This is part of a series of posts examining the methods by which malicious Python code gains execution.
This blog series has already shown that the infection method common to most Python malware is allowing source distributions when installing packages. We've also seen most malware gain execution by running from the setup.py file. That's old news and more likely to be noticed. Attackers want to be modern, too. Isn't there a way they can gain arbitrary code execution with the pyproject.toml file? It turns out there is.
Some build backends provide build hooks for modifying the inputs and outputs of source and built distribution creation. Hatch and PDM are two examples of package managers with related build backends offering build hook functionality. There may be others (see the addendum on Poetry at the end of this post). Hatch provides a popular build backend in hatchling, so we'll start with it to demonstrate the process.
Spoofing pyproject.toml projects
We'll start with the same certifi package that has been used throughout this series. To show just how useful Hatch is for package management, the process will be shown from the beginning. The process for spoofing pyproject.toml projects will mimic the one from the post showing how to spoof setup.py projects.
First, be sure Hatch is installed. I prefer the pipx method. Then, clone the repository for the legitimate certifi package and switch into the new directory:
Next, use Hatch to initialize the existing project. This is a nice feature since it will automatically convert the old setuptools configuration in setup.py to the modern PEP 518 pyproject.toml format, adhering to PEP 621 to boot!
That was easy. A few commands and we were able to create source and built distributions from an automagically generated, standards-compliant pyproject.toml file. However, we didn't set out to rebuild the wheel. We want to create a spoofed package that doesn't rely on setup.py to run our bad code. To do that, we need to make some modifications to pyproject.toml. For reference, here is the original that Hatch generated:
We'll add it to source control (git add pyproject.toml) so changes will be more obvious. The first change is to rename the project from certifi to certify. Hatch's logic for automatically detecting which files to include when building a wheel relies on the project name matching the file structure. Just like in the previous post about package spoofing, we want a fake distribution package while retaining the import package naming structure. To do that, we need to override the default logic and tell Hatch where to find the files. Here are the changes made in pyproject.toml to meet those goals:
These changes are enough to build a spoofed package with both source and built distributions.
Notice that the distribution package is now certify. After installation, the import package is still certifi. That's great, but it doesn't provide much in the way of new techniques. Sure, Hatch can be used to convert setup.py projects to pyproject.toml ones, but let's crack some eggs and get messy.
Hatching evil plans
Hatch is very much a modern tool, adhering to the latest packaging standards. It eschews the legacy method of defining metadata and package configuration in a setup.py file, which is essentially an arbitrary code execution vector. Instead, it adopts pyproject.toml as the PEP 518 compliant plaintext configuration file. There's no way arbitrary code can run from a config file, right? Right!? You may be asking, "What about my custom package generation steps? How am I supposed to compile extensions for all the various platforms my project supports?"
Hatch has you covered. It provides build hooks to get around this limitation. These build hooks are offered in the form of plugins, either third-party (i.e., more build requirements) or first-party by adhering to a reference plugin interface. The interface allows for writing code that will be executed at one or more points in the build process:
clean: occurs before the build process if the -c/--clean flag was passed to the build command, or when invoking the clean command
initialize: occurs immediately before each build
finalize: occurs immediately after each build
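The ordering described in that list can be modeled in a few lines of plain Python. This is a simplified, hypothetical model for illustration only: `RecordingHook` and `simulate_build` are not part of Hatch, and only the method names and signatures (`clean(versions)`, `initialize(version, build_data)`, `finalize(version, build_data, artifact_path)`) follow Hatch's build hook interface. For the wheel target, versions are names such as `standard` and `editable`.

```python
from typing import Any


class RecordingHook:
    """Hypothetical stand-in for a Hatch build hook that records each call."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def clean(self, versions: list[str]) -> None:
        self.calls.append("clean")

    def initialize(self, version: str, build_data: dict[str, Any]) -> None:
        self.calls.append("initialize")

    def finalize(self, version: str, build_data: dict[str, Any], artifact_path: str) -> None:
        self.calls.append("finalize")


def simulate_build(hook: RecordingHook, versions: list[str], clean: bool = False) -> list[str]:
    """Simplified model of one build: optional clean, then initialize/finalize per version."""
    if clean:  # mirrors passing -c/--clean to the build command
        hook.clean(versions)
    for version in versions:
        build_data: dict[str, Any] = {}
        hook.initialize(version, build_data)
        # The real builder would assemble the artifact at this point.
        hook.finalize(version, build_data, artifact_path=f"dist/example-{version}.whl")
    return hook.calls


print(simulate_build(RecordingHook(), ["standard"], clean=True))
# ['clean', 'initialize', 'finalize']
```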
Additionally, Hatch provides one built-in build hook named custom. It is this hook that will be used to deliver our malicious payload since it requires the least amount of additional code. It is enabled with a single line added to the pyproject.toml configuration:
[tool.hatch.build.targets.wheel.hooks.custom]
The default configuration for this custom build hook specifies hatch_build.py as the file containing the custom implementation of the interface. The file name and/or path can be different, but we'll use the default for this example to avoid adding another line to pyproject.toml. For reference, this is now the full difference in the configuration file compared to the one initially generated by Hatch:
Next, we'll write a minimal plugin implementation in hatch_build.py. We use the initialize entry point since it is more likely to run for every build (the finalize entry point may not be called if the build fails). The "malicious" code simply adds a file to a temporary directory to serve as a flag indicating when the code runs. Real malware would be more complex and likely obfuscated.
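A minimal hatch_build.py along those lines might look like the sketch below. It subclasses `BuildHookInterface` from `hatchling.builders.hooks.plugin.interface` and overrides only `initialize`. The flag file name is a hypothetical choice, and the `ImportError` fallback exists only so the snippet can be run outside a build environment; during a real build, hatchling is always importable.

```python
import tempfile
from pathlib import Path
from typing import Any

try:
    from hatchling.builders.hooks.plugin.interface import BuildHookInterface
except ImportError:  # only so this sketch runs without hatchling installed

    class BuildHookInterface:  # minimal stand-in, NOT the real hatchling class
        pass


# Hypothetical flag file location used to prove the hook ran.
FLAG_FILE = Path(tempfile.gettempdir()) / "hatch_hook_ran.txt"


class CustomBuildHook(BuildHookInterface):
    def initialize(self, version: str, build_data: dict[str, Any]) -> None:
        # Runs immediately before each build of the target, including the
        # wheel a frontend such as pip builds when installing from the sdist.
        FLAG_FILE.write_text(f"hook ran for version: {version}\n")
```

Nothing in the package imports this module; Hatch loads it and calls `initialize` on its own whenever the wheel is built.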
Building the source distribution with this change and using it to install the package shows that the newly added code does indeed run:
PDM FTW
The same process works with the build hooks offered by the pdm-backend package from the PDM project. Here is what that looks like, using the same certifi package as a starting point.
At this point, the pyproject.toml file looks like the one from before, when Hatch was used. It needs a few adjustments to be used with PDM. For starters, the build backend has to change. Plus, some PEP 621 metadata fields need to be updated to match the expected format even though the content is fine. After those modifications, the pyproject.toml file looks like this:
Here is the difference from the one generated by Hatch:
Enabling PDM build hooks is as easy as creating a pdm_build.py module in the root of the project directory and populating it with one or more of the defined functions from the build hook interface API. The module name can be different but requires an additional entry in the pyproject.toml configuration. This is a minimal pdm_build.py implementation used to plant our flag for wheel builds:
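A minimal pdm_build.py in that spirit is sketched below. `pdm_build_hook_enabled` and `pdm_build_initialize` are hook names from pdm-backend's build hook interface, and `context.target` reports what is being built (`"sdist"`, `"wheel"`, or `"editable"`). The flag file name is hypothetical, and the `SimpleNamespace` in the `__main__` block is only a stand-in for pdm-backend's real context object.

```python
import tempfile
from pathlib import Path

# Hypothetical flag file location used to prove the hook ran.
FLAG_FILE = Path(tempfile.gettempdir()) / "pdm_hook_ran.txt"


def pdm_build_hook_enabled(context) -> bool:
    # pdm-backend consults this before calling the other hooks in this module;
    # here the hooks are limited to wheel builds.
    return context.target == "wheel"


def pdm_build_initialize(context) -> None:
    # Called before the build starts, so this runs whenever a wheel is built
    # from the sdist, including during `pip install`.
    FLAG_FILE.write_text(f"hook ran for target: {context.target}\n")


if __name__ == "__main__":
    # Manual check with a stand-in for pdm-backend's context object.
    from types import SimpleNamespace

    ctx = SimpleNamespace(target="wheel")
    if pdm_build_hook_enabled(ctx):
        pdm_build_initialize(ctx)
    print(FLAG_FILE.read_text(), end="")  # hook ran for target: wheel
```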
Building the source distribution with this change and using it to install the package shows that the newly added code does indeed run. The demonstration steps are essentially the same as for Hatch, but this time the virtual environment is created before building the source distribution so PDM will use it instead of creating one.
Modern hard hats and safety goggles
Once again, it has been shown that allowing source distributions to be installed anywhere in the fully resolved dependency chain introduces risk. The risk of executing arbitrary code contained within one of those source distributions is not limited to legacy setup.py structures. Modern package managers like Hatch and PDM allow for build hooks in modern pyproject.toml projects. Installing a source distribution means building the wheel that is ultimately installed, which means the hook code in hatch_build.py, pdm_build.py, or any custom-configured module will run.
If possible, disallow all source distributions during install. The pip documentation provides a guide for secure installs that recommends passing --only-binary :all: to meet the goal. It appears that Poetry allows configurations to specify no binaries (i.e., wheels) but no such option for source distributions exists. Other package installers likely suffer the same limitation.
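Applied to every install, that recommendation looks like the sketch below (helper names are hypothetical). With `--only-binary :all:`, pip does not fall back to an sdist for any package in the resolved set; if no compatible wheel exists, the install fails instead of building one.

```python
import subprocess
import sys


def secure_install_command(*requirements: str) -> list[str]:
    """Build a pip command that accepts only wheels, never source distributions."""
    return [
        sys.executable, "-m", "pip", "install",
        "--only-binary", ":all:",  # applies to every package, including dependencies
        *requirements,
    ]


def secure_install(*requirements: str) -> None:
    # check=True raises CalledProcessError if pip fails, e.g. when only an sdist exists.
    subprocess.run(secure_install_command(*requirements), check=True)


if __name__ == "__main__":
    print(secure_install_command("certifi"))
```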
One countermeasure to this class of attack is to run all package installation actions through an application sandbox. This restricts the actions available to only those filesystem and network operations deemed legitimate and effectively neuters malicious code. Phylum offers this protection in the form of the open-source Birdcage sandbox, which is baked into the Phylum CLI and can be used by Python developers using pip or Poetry with the matching official extensions.
Addendum: Undocumented Poetry build hooks
Since the feature is undocumented and likely to change at any time, this proof of concept is included only as an addendum.
It is indeed possible to include build hooks with Poetry using either the legacy poetry build backend or the modern poetry-core one. This is an undocumented feature but has existed since the earliest days of the project. For some background and insights about the inner workings of this feature, reference these GitHub issues from the Poetry project:
The eleventh issue ever, first talking about this feature
An issue requesting the feature to be stabilized and documented
An issue showing an alternate configuration for the newer poetry-core build backend
To demonstrate the use of Poetry for malicious purposes, we alter our very own phylum project and spoof it as phylum-ci, but with an added Python module disguised as a Markdown file.
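As a hedged sketch only, assuming the convention commonly seen in Poetry build scripts where the file defines a `build(setup_kwargs)` function: any code at the top level of that script runs as soon as the build imports or executes it, before `build` is ever called. The flag file name is hypothetical, and because the feature is undocumented, exactly how and when Poetry loads the script may differ between versions.

```python
"""Sketch of a Poetry build script (the file named by the `build` setting)."""
import tempfile
from pathlib import Path

# Hypothetical flag file location used to prove the script ran.
FLAG_FILE = Path(tempfile.gettempdir()) / "poetry_build_script_ran.txt"

# Module-level code runs as soon as the script is imported or executed,
# before any `build` function is called.
FLAG_FILE.write_text("build script ran\n")


def build(setup_kwargs: dict) -> None:
    """Entry point under the common `build(setup_kwargs)` convention.

    A legitimate script would update setup_kwargs here (for example, to add
    extension modules); this sketch leaves it untouched.
    """
```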
Senior Software Engineer, responsible for integrations and author of the "phylum" Python package. Documentation and quality champion, runner, baseball and scout dad, pod-faster, and lover of outdoors.