Devious Python Build Requirements
The previous installment of this series demonstrated the weakness in allowing source distributions as dependencies. They lead to executing arbitrary code from setup.py files tucked away in the dependency hierarchy. A best practice is to enumerate the complete set of dependencies, in the form of a lockfile, and monitor for malicious entries. What if there were a way to rely on packages as dependencies but ensure they don't show up in lockfiles? This is precisely what Python build requirements are, and they represent a dark spot in the fabric of dependency management.
Build requirements in pyproject.toml
Installable source distributions of projects that make use of the modern pyproject.toml file can specify the build system to use with a build-system table and the requires key. This key holds a list of PEP 508 requirement specifiers.
It is a surprising side effect that the entries from this key will be installed in an isolated build environment. Arbitrary code execution is possible if any of the fully resolved set of build requirements are installed as source distributions. Let's see an example of this in action with the popular cryptography package, which currently has a pyproject.toml file that starts like this:
Notice the list of build requirements. We'll see them again during package installation. This package is offered on PyPI with a source distribution and built distributions:
This is not uncommon, as only one of the top 360 most-downloaded packages on PyPI is not offered as a wheel, according to the "Python Wheels" site at the time of writing:
This is great news for the health of the Python ecosystem! It means package installers are more likely to be configured to disallow source distributions, which eliminates an entire class of arbitrary code execution attacks. However, there are still less popular, older, and unmaintained packages that are only offered as source distributions. It is this subclass of packages, and developers' ongoing reliance on them, where threat actors can operate.
They can modify existing packages with expired author or maintainer domain takeovers and compromised accounts. They can slip their own malicious packages into the dependency chain as build requirements through typosquatting, starjacking, dependency confusion, and lockfile injection attacks.
To understand what happens next, we continue with the cryptography example, coercing pip into using the source distribution by specifying the --no-binary option when installing the package:
There it is. Blink and you'd miss it. The build requirements were installed in an isolated environment, which itself was removed after use. That is a nice feature for malicious actors who want to cover their tracks. Arbitrary code execution would have been the result had any of those build requirements been offered solely as source distributions.
Build requirements for setuptools
The same behavior is possible with the legacy method of building packages using setuptools. Packages using a setup.py file will have a setup() call, which allows for a setup_requires keyword to list build-time dependency requirements. Similarly, the setup.cfg file has an [options] config section with a setup_requires key that serves the same purpose.
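Both locations can be inspected without executing anything from the package. The following is a minimal sketch using only the standard library. The manifests and helper names are hypothetical, and the setup.py reader only handles setup_requires values written as literals, so anything computed at runtime is skipped.

```python
import ast
import configparser

# Hypothetical legacy manifests, for illustration only.
EXAMPLE_SETUP_PY = '''
from setuptools import setup

setup(
    name="example",
    setup_requires=["cython>=0.29", "setuptools_scm"],
)
'''

EXAMPLE_SETUP_CFG = """
[options]
setup_requires =
    cython>=0.29
    setuptools_scm
"""


def setup_requires_from_setup_py(source: str) -> list[str]:
    """Statically collect literal setup_requires entries without running the file."""
    found: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        for keyword in node.keywords:
            if keyword.arg != "setup_requires":
                continue
            try:
                value = ast.literal_eval(keyword.value)
            except ValueError:
                continue  # value is computed at runtime; not readable statically
            if isinstance(value, str):
                found.append(value)
            elif isinstance(value, (list, tuple)):
                found.extend(str(item) for item in value)
    return found


def setup_requires_from_setup_cfg(text: str) -> list[str]:
    """Read a newline-separated setup_requires list from the [options] section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    raw = parser.get("options", "setup_requires", fallback="")
    return [line.strip() for line in raw.splitlines() if line.strip()]


print(setup_requires_from_setup_py(EXAMPLE_SETUP_PY))
print(setup_requires_from_setup_cfg(EXAMPLE_SETUP_CFG))
```

Parsing with ast rather than importing the file is a deliberate choice here: importing a setup.py would run the very code under suspicion.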
Using setup.py is discouraged because it allows for arbitrary code execution, and allowing source distributions opens the door to strangers on the internet. The pip documentation for secure and repeatable installs further drives home the point with this warning:
Beware of the setup_requires keyword arg in setup.py. The (rare) packages that use it will cause those dependencies to be downloaded by setuptools directly, skipping pip's hash-checking. If you need to use such a package, see controlling setup_requires.
A survey of Python lockfiles
It is perhaps more surprising that most Python lockfile generators do not account for these build requirements. We've previously written about the benefits of using a lockfile and covered the landscape of popular options in the Python ecosystem. The question now is which lockfiles fall victim to this attack?
Pip
The ability of pip to create a lockfile is limited to the pip freeze command. The output will only contain the packages from the environment in which the command was run. Since build requirements are installed in an ephemeral environment, they will not be included. Therefore, using pip as a lockfile generator is not recommended since it does not capture all requirements.
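The gap can be made visible by comparing a project's build requirements against the names pinned in a frozen requirements list. This is a simplified sketch with hypothetical inputs and helper names. It extracts project names with a regular expression rather than a full PEP 508 parser, and it normalizes them in the manner described by PEP 503.

```python
import re


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(specifier: str) -> str:
    """Extract the project name from the front of a requirement specifier."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", specifier)
    if match is None:
        raise ValueError(f"unrecognized requirement: {specifier!r}")
    return normalize(match.group(1))


def missing_from_lockfile(build_requires: list[str], frozen_lines: list[str]) -> list[str]:
    """Return build requirement names that have no entry in the frozen list."""
    locked = {
        requirement_name(line)
        for line in frozen_lines
        # Skip blanks, comments, and option lines such as "-e ...".
        if line.strip() and not line.lstrip().startswith(("#", "-"))
    }
    wanted = {requirement_name(req) for req in build_requires}
    return sorted(wanted - locked)


# Hypothetical inputs, for illustration only.
FROZEN = ["example-app==1.0.0", "requests==2.31.0", "# a comment"]
BUILD_REQUIRES = ["setuptools>=61.0", "Cython>=0.29", "requests"]

print(missing_from_lockfile(BUILD_REQUIRES, FROZEN))
```

Any name reported here is a package that would be installed at build time without ever appearing in the frozen output.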
The --no-build-isolation option can be specified when installing packages to indicate that an ephemeral environment will not be used. Instead, build requirements must already exist in the installation environment. Keeping that environment stocked with the right build requirements is likely a manual process, and so not a viable one for most developers.
The pip documentation includes a "Secure installs" guide. It recommends disallowing source distributions entirely by passing the --only-binary :all: option during install. Consider following this advice on your projects until it proves unworkable. Even then, attempt to modernize any requirements that depend on source distributions.
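For scripted installs, that option can be baked into the command that gets run. The sketch below only assembles and prints the command. The helper name and the requirements.txt file name are assumptions made for illustration.

```python
import shlex
import sys


def binary_only_install_command(requirements_file: str) -> list[str]:
    """Build a pip command that refuses to install from source distributions."""
    return [
        sys.executable, "-m", "pip", "install",
        "--only-binary", ":all:",  # never fall back to a source distribution
        "-r", requirements_file,
    ]


# "requirements.txt" is a hypothetical file name.
command = binary_only_install_command("requirements.txt")
print(shlex.join(command))
```

Passing the resulting list to subprocess.run(command, check=True) would perform the install and fail loudly if any requirement is only available as a source distribution.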
Poetry
The Poetry packaging and dependency management tool uses the pyproject.toml file for specifying metadata. It also offers a build backend in the form of the poetry-core package, which must be specified in the requires key of the build-system table. What is interesting is that Poetry does not have a mechanism for allowing the build requirements specified here to be included in the poetry.lock lockfile. This can be seen in our very own phylum-ci project, which makes use of Poetry and its build backend:
Poetry has an open issue requesting the ability to include build requirements in the lockfile, but it has stalled due to concerns over performance.
pip-tools
The pip-compile command from the pip-tools suite is not a victim, at least when the --all-build-deps option is used. This option was added earlier this year, in the v7.4.0 release from February 16, 2024. However, there is a limitation: build dependencies of build dependencies are not included in the output.
Here is an example using the phylum-ci project again:
The lockfile even includes comments showing where each requirement originates! This tool can also handle the input manifests setup.py and setup.cfg. Here is an example building off the spoofed certify project from an earlier post in this series:
Pipenv
Pipenv falls victim to this class of lockfile omission, failing to account for build requirements specified in standard locations. It appears the tool is designed to make use of its own custom Pipfile manifest instead of using the existing options provided by pyproject.toml or setup.py. The documentation has this to say regarding the observation that Pipenv does not respect dependencies in setup.py:
No, it does not, intentionally. Pipfile and setup.py serve different purposes, and should not consider each other by default.
Pipenv has an open issue requesting the ability to specify build dependencies, but it stalled out almost five years ago. Plus, it doesn't quite get to the root of the issue discussed in this post.
PDM
The PDM package and dependency manager bills itself as supporting the latest PEP standards. Indeed, it is PEP 621 compliant for storing project metadata in pyproject.toml. However, it does not account for the build-system.requires key there when generating its pdm.lock lockfile. An open issue exists to add this feature, but it was deferred by the project's creator as too big to take on without additional support from the PyPI API (e.g., .metadata files a la PEP 658) and standardization in the form of PEPs (e.g., core metadata for source distributions a la PEP 643).
Other languages do it right
Including build dependencies in lockfiles is not an intractable problem. The Rust language ecosystem provides only one officially supported package management tool in Cargo, which generates the canonical Cargo.lock dependency lockfile.
Rust projects may also need to run code before building their package. Cargo allows for this in the form of build scripts, which may have their own set of build dependencies. Critically, these build requirements show up in the lockfile, making it much easier for analysis tools like Phylum to warn of bad packages. Here is an example for the curious:
Conclusion
Allowing source distributions at any point in the package dependency chain opens the door for arbitrary code execution. Allowing them for build requirements is especially troublesome because the default behavior is to install those dependencies (i.e., execute arbitrary code from strangers on the internet) in an isolated build environment (away from prying eyes) that is automatically deleted after use (cover tracks).
It would be one thing if these build requirements were included in lockfiles. That way, modifications and additions could go through the normal review process and be automatically scanned for malicious behavior by one of the integrations offered by Phylum.
Unfortunately, most common Python lockfile generators do not support this feature due to the performance cost incurred by additional dependency resolver iterations. The one tool that does support it, pip-compile from the pip-tools suite, only resolves build-time dependencies one level deep.
Be wary of any packages that are only offered as a source distribution. If possible in your environment, disallow source distributions entirely. If not possible, at least guard against arbitrary changes to build requirements by pinning them to exact versions and analyzing new versions and new entries before use.
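One way to enforce the pinning advice is a small check that flags any build requirement lacking an exact version. This is a minimal sketch, not a full PEP 508 or PEP 440 parser: the regular expression, the helper name, and the example requirements (including their version numbers) are all hypothetical, and wildcard pins such as ==1.* are deliberately treated as unpinned.

```python
import re

# Accepts "name==version" with optional extras and an optional environment
# marker. Wildcards ("==1.*") and ranges (">=1.0") do not match.
EXACT_PIN = re.compile(
    r"^\s*[A-Za-z0-9][A-Za-z0-9._-]*\s*(\[[^\]]*\])?\s*"
    r"==\s*[A-Za-z0-9.!+_-]+\s*(;.*)?$"
)


def unpinned(build_requires: list[str]) -> list[str]:
    """Return the build requirements that are not pinned to an exact version."""
    return [req for req in build_requires if not EXACT_PIN.match(req)]


# Hypothetical build requirements, for illustration only.
BUILD_REQUIRES = [
    "setuptools==69.1.0",
    "wheel",
    "cffi>=1.12",
    "maturin==1.*",
    "cython==3.0.8; python_version >= '3.8'",
]

for requirement in unpinned(BUILD_REQUIRES):
    print(f"not pinned: {requirement}")
```

A check like this could run in continuous integration so that a loosened or newly added build requirement is surfaced for review before it is ever installed.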