This is part of a series of posts examining the methods by which malicious Python code gains execution.
This technique is more about avoiding detection by hiding in plain sight and leveraging other techniques already discussed to gain execution. Think of it as reducing the signal-to-noise ratio for the good guys looking to root out malware.
Monitoring for new package publications (i.e., new versions) on the Python Package Index (PyPI) is common. Static analysis of source files to find malicious behavior is also common, but usually only for the files present at the time of publication. Less common is continuous monitoring of existing package publications for artifacts added after the fact.
Most specific artifact wins
It is possible to create a benign package and upload it to PyPI such that the source distribution and all wheels would pass inspection. Then, a malicious wheel could be custom-built and uploaded separately. Package installers will select the wheel most specific to the installation environment. It could be a platform wheel, tailored for a specific target (e.g., black-24.4.0-cp312-cp312-macosx_11_0_arm64.whl) to limit exposure and infect only the system type matching that of the desired victim. The camouflaged wheel could differ from the benign artifacts in only a minor way, such as the addition of a single malicious dependency entry.
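To make the "most specific wins" behavior concrete, here is a minimal sketch of how an installer might rank candidate wheels. It is not pip's actual code: the `SUPPORTED` list is a hypothetical, heavily shortened stand-in for the ordered tag list a real installer computes for its environment, and the helper names `wheel_tags` and `pick_wheel` are made up for illustration. The filenames are illustrative as well.

```python
from itertools import product


def wheel_tags(filename):
    """Expand a wheel filename into its set of (python, abi, platform) tags."""
    stem = filename[: -len(".whl")]
    # The last three dash-separated fields are the compatibility tags.
    # Each may be a "."-joined compressed set, such as "py2.py3".
    py, abi, plat = stem.split("-")[-3:]
    return set(product(py.split("."), abi.split("."), plat.split(".")))


def pick_wheel(filenames, supported):
    """Return the wheel whose best tag appears earliest in `supported`."""
    rank = {tag: index for index, tag in enumerate(supported)}
    best = None
    for name in filenames:
        ranks = [rank[tag] for tag in wheel_tags(name) if tag in rank]
        if ranks and (best is None or min(ranks) < best[0]):
            best = (min(ranks), name)
    return best[1] if best else None


# Assumption: a tiny preference list for CPython 3.12 on Apple silicon,
# ordered from most specific to least specific.
SUPPORTED = [
    ("cp312", "cp312", "macosx_11_0_arm64"),
    ("cp312", "abi3", "macosx_11_0_arm64"),
    ("py3", "none", "any"),
]

candidates = [
    "black-24.4.0-py3-none-any.whl",
    "black-24.4.0-cp312-cp312-macosx_11_0_arm64.whl",
    "black-24.4.0-cp312-cp312-win_amd64.whl",
]
chosen = pick_wheel(candidates, SUPPORTED)
print(chosen)  # black-24.4.0-cp312-cp312-macosx_11_0_arm64.whl
```

The pure-Python wheel is compatible too, but it ranks lower, so a platform wheel added later silently takes its place on matching machines only.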
Examining release timelines
Warehouse, the web application that implements the canonical Python package index, offers RSS feeds to get the newest packages, latest updates, and project releases. It also provides legacy XML-RPC methods that can be used to find all the artifacts added to a given release. Here is an example of what a common package, black, looks like when it publishes a new version to PyPI:
That is one source distribution and 21 built distributions, released over a span of almost 38 minutes! That is a lot of files to analyze for a single release. The black project is known to be good, but imagine a different project with even more platform wheels added, spread out over a longer period of time, perhaps even months later:
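One way to surface that kind of straggler is to compare each file's upload time with the first upload in the release. The sketch below uses PyPI's JSON API rather than the XML-RPC methods mentioned above, and assumes the `urls` list it returns carries `filename` and `upload_time_iso_8601` for each file. The one-hour grace period, the helper names, and the `example_pkg` sample data are all made up for illustration.

```python
import json
from datetime import datetime, timedelta
from urllib.request import urlopen


def fetch_release_files(project, version):
    """Fetch the file list for one release from PyPI's JSON API."""
    url = f"https://pypi.org/pypi/{project}/{version}/json"
    with urlopen(url) as response:
        return json.load(response)["urls"]


def upload_timeline(files):
    """Return (offset from first upload, filename) pairs, oldest first."""
    stamped = sorted(
        (
            # Normalize a trailing "Z" so older Python versions can parse it.
            datetime.fromisoformat(f["upload_time_iso_8601"].replace("Z", "+00:00")),
            f["filename"],
        )
        for f in files
    )
    if not stamped:
        return []
    first = stamped[0][0]
    return [(when - first, name) for when, name in stamped]


def late_uploads(files, grace=timedelta(hours=1)):
    """List files uploaded more than `grace` after the release's first file."""
    return [name for offset, name in upload_timeline(files) if offset > grace]


# Made-up sample data for a hypothetical package, not a real release.
sample = [
    {"filename": "example_pkg-1.0.0.tar.gz",
     "upload_time_iso_8601": "2024-01-10T12:00:00.000000Z"},
    {"filename": "example_pkg-1.0.0-py3-none-any.whl",
     "upload_time_iso_8601": "2024-01-10T12:00:05.000000Z"},
    {"filename": "example_pkg-1.0.0-cp312-cp312-macosx_11_0_arm64.whl",
     "upload_time_iso_8601": "2024-03-22T03:14:00.000000Z"},
]
print(late_uploads(sample))

# Against the live index (requires network access):
# print(late_uploads(fetch_release_files("black", "24.4.0")))
```

A late upload is not malicious on its own, since projects do add wheels for new platforms after the fact. It is simply a cheap signal for deciding which files deserve a second look.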
PyPI is immutable in that you cannot republish a different file under the name of an existing release artifact. It is possible, however, to publish new artifacts that are more specific for a given release, which gives rise to the class of attack outlined in this post. To prevent the possibility of using a malicious artifact, use hashes.
pip and other package installers allow for specifying dependencies with a matching hash. pip calls it “Hash-checking mode” and pipenv makes use of it in Pipfile.lock as a default security feature. The trick is to ensure any lockfile created with hashes from known good dependencies is guarded against updates that may add new hashes. Of course, Phylum is here to help with that.
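Guarding a lockfile against quietly added hashes can be as simple as diffing two versions of it. The following is a rough sketch, assuming the Pipfile.lock layout of `default` and `develop` sections whose entries hold `version` and `hashes` keys. The function names are hypothetical and the hash values are shortened placeholders, not real digests.

```python
import json


def load_lock(path):
    """Read a Pipfile.lock (JSON) from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def pinned_hashes(lock):
    """Map (section, name, version) to the set of hashes pinned for it."""
    pins = {}
    for section in ("default", "develop"):
        for name, entry in lock.get(section, {}).items():
            key = (section, name, entry.get("version"))
            pins[key] = set(entry.get("hashes", []))
    return pins


def added_hashes(old_lock, new_lock):
    """Report hashes that appeared for pins whose version did not change."""
    old, new = pinned_hashes(old_lock), pinned_hashes(new_lock)
    report = {}
    for key in old.keys() & new.keys():
        extra = new[key] - old[key]
        if extra:
            report[key] = sorted(extra)
    return report


# Placeholder lockfile fragments; the hash strings are not real digests.
old_lock = {"default": {"black": {"version": "==24.4.0",
                                  "hashes": ["sha256:aaaa", "sha256:bbbb"]}}}
new_lock = {"default": {"black": {"version": "==24.4.0",
                                  "hashes": ["sha256:aaaa", "sha256:bbbb",
                                             "sha256:cccc"]}}}
print(added_hashes(old_lock, new_lock))
# {('default', 'black', '==24.4.0'): ['sha256:cccc']}

# With real files: added_hashes(load_lock("old/Pipfile.lock"), load_lock("Pipfile.lock"))
```

A new hash for an unchanged version means a file was added to an existing release after the lockfile was first generated, which is exactly the situation worth reviewing before the updated lockfile is accepted.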
Senior Software Engineer, responsible for integrations and author of the "phylum" Python package. Documentation and quality champion, runner, baseball and scout dad, pod-faster, and lover of outdoors.