Python Package Spoofing

Python Package Spoofing | Phylum
This is part of a series of posts examining the methods by which malicious Python code gains execution.

Creating a functional package and hosting it on the Python Package Index (PyPI) is the foundation of most malicious Python packages. Making one that developers will actually want is hard. Malware authors know that proper R&D is essential to their success, but rather than doing that research and development themselves, it is much easier to rip off and duplicate someone else's work. It is trivial to take a known good package, clone it, modify the distribution name and maybe some other metadata, and republish it. This post shows an example of that process using the known good certifi package.


Candidate Selection

Thinking like an attacker, we chose the certifi package because it has some desirable attributes:

Notably, the certifi package uses the legacy setup.py file for specifying package metadata and building. Even though half of the top 20 most downloaded PyPI packages still use this file as of this writing, the trend is toward the more modern pyproject.toml file introduced by PEP 518. Perhaps more relevant, many of the malicious packages Phylum discovers still use setup.py. Even so, the process is similar for pyproject.toml projects.

Spoofing setup.py projects

The first step is to clone the repository for the legitimate certifi package and switch into the new directory:

❯ git clone git@github.com:certifi/python-certifi.git python-certify
Cloning into 'python-certify'...
remote: Enumerating objects: 934, done.
remote: Counting objects: 100% (278/278), done.
remote: Compressing objects: 100% (127/127), done.
remote: Total 934 (delta 207), reused 152 (delta 151), pack-reused 656
Receiving objects: 100% (934/934), 1.34 MiB | 668.00 KiB/s, done.
Resolving deltas: 100% (473/473), done.

❯ cd python-certify

Then, change the distribution name from certifi to certify:

❯ git diff
diff --git a/setup.py b/setup.py
index 4313c16..719d095 100755
--- a/setup.py
+++ b/setup.py
@@ -24,7 +24,7 @@ with open("certifi/__init__.py") as f:
         raise RuntimeError("No version number found!")

 setup(
-    name="certifi",
+    name="certify",
     version=VERSION,
     description="Python package for providing Mozilla's CA Bundle.",
     long_description=open("README.rst").read(),
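Because setup.py is ordinary Python, running it to learn what it declares also runs whatever else it contains. A safer way to review a cloned or downloaded project is to parse the file without executing it. The following is a minimal sketch using the standard-library `ast` module; `setup_keywords` is a hypothetical helper that only recovers keywords whose values are literals (so `version=VERSION` from the real file would be skipped). The sample source is a trimmed, illustrative stand-in for the renamed setup.py, not the actual file.

```python
import ast

# Illustrative, trimmed stand-in for the modified setup.py (not the real file).
SETUP_PY = '''
from setuptools import setup

setup(
    name="certify",
    version="2024.2.2",
    packages=["certifi"],
)
'''


def setup_keywords(source: str) -> dict[str, object]:
    """Statically extract literal keyword arguments passed to setup().

    The source is parsed, never executed. Non-literal values such as
    variables or function calls are skipped.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Match both setup(...) and setuptools.setup(...).
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", None)
        if name != "setup":
            continue
        found: dict[str, object] = {}
        for kw in node.keywords:
            if kw.arg is None:
                continue  # **kwargs expansion cannot be resolved statically
            try:
                found[kw.arg] = ast.literal_eval(kw.value)
            except (ValueError, TypeError):
                pass  # not a plain literal
        return found
    return {}


keywords = setup_keywords(SETUP_PY)
print(keywords["name"], keywords["packages"])
```

A `name` that does not match any entry in `packages` is not proof of malice (many legitimate projects differ), but it is exactly the gap this technique relies on.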

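Next, create a virtual environment and use it to build the new source distribution. Python 3.12 no longer installs setuptools into new virtual environments, so it is installed explicitly before running setup.py: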

❯ python -m venv .venv

# For brevity, assume every command shown after this one is run from within
# the virtual environment even though the prompt does not show it explicitly
❯ source .venv/bin/activate

❯ python -m pip install setuptools
Collecting setuptools
  Using cached setuptools-69.2.0-py3-none-any.whl.metadata (6.3 kB)
Using cached setuptools-69.2.0-py3-none-any.whl (821 kB)
Installing collected packages: setuptools
Successfully installed setuptools-69.2.0

❯ python setup.py sdist
/Users/maxrake/dev/phylum/python-certify/.venv/lib/python3.12/site-packages/setuptools/dist.py:318: InformationOnly: Normalizing '2024.02.02' to '2024.2.2'
  self.metadata.version = self._normalize_version(self.metadata.version)
running sdist
running egg_info
creating certify.egg-info
writing certify.egg-info/PKG-INFO
writing dependency_links to certify.egg-info/dependency_links.txt
writing top-level names to certify.egg-info/top_level.txt
writing manifest file 'certify.egg-info/SOURCES.txt'
reading manifest file 'certify.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files found matching '.github/'
warning: manifest_maker: MANIFEST.in, line 4: 'recursive-exclude' expects <dir> <pattern1> <pattern2> ...
adding license file 'LICENSE'
writing manifest file 'certify.egg-info/SOURCES.txt'
running check
creating certify-2024.2.2
creating certify-2024.2.2/certifi
creating certify-2024.2.2/certify.egg-info
copying files to certify-2024.2.2...
copying LICENSE -> certify-2024.2.2
copying MANIFEST.in -> certify-2024.2.2
copying README.rst -> certify-2024.2.2
copying setup.py -> certify-2024.2.2
copying certifi/__init__.py -> certify-2024.2.2/certifi
copying certifi/__main__.py -> certify-2024.2.2/certifi
copying certifi/cacert.pem -> certify-2024.2.2/certifi
copying certifi/core.py -> certify-2024.2.2/certifi
copying certifi/py.typed -> certify-2024.2.2/certifi
copying certify.egg-info/PKG-INFO -> certify-2024.2.2/certify.egg-info
copying certify.egg-info/SOURCES.txt -> certify-2024.2.2/certify.egg-info
copying certify.egg-info/dependency_links.txt -> certify-2024.2.2/certify.egg-info
copying certify.egg-info/not-zip-safe -> certify-2024.2.2/certify.egg-info
copying certify.egg-info/top_level.txt -> certify-2024.2.2/certify.egg-info
copying certify.egg-info/SOURCES.txt -> certify-2024.2.2/certify.egg-info
Writing certify-2024.2.2/setup.cfg
creating dist
Creating tar archive
removing 'certify-2024.2.2' (and everything under it)

❯ ls -alh dist
total 328
drwxr-xr-x   3 maxrake  staff    96B Apr  9 10:30 .
drwxr-xr-x  16 maxrake  staff   512B Apr  9 10:30 ..
-rw-r--r--   1 maxrake  staff   161K Apr  9 10:30 certify-2024.2.2.tar.gz

The final step would be to upload the new package to PyPI. That step was skipped here to avoid polluting the registry in the name of research or demonstration. Besides, certify is suspected to be a prohibited name on PyPI, likely reserved to prevent exactly this kind of attack. There is no way to know for sure without trying, because the pull request that would add an API for listing prohibited names remains unmerged. The suspicion is reinforced by the fact that a certify package exists on the test instance of PyPI but not on the production instance.
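Project names are compared after PEP 503 normalization (lowercase, with runs of `-`, `_`, and `.` collapsed to a single `-`), so `Certifi` and `CERTIFI` both normalize to `certifi` and cannot pose as a different project. A one-letter swap such as certify survives normalization, which is why lookalike detection needs fuzzy matching. The sketch below pairs the PEP 503 rule with `difflib.get_close_matches`; the `POPULAR` tuple and the 0.8 cutoff are assumptions for illustration, and this is not how PyPI itself decides which names are prohibited.

```python
import difflib
import re

# Hypothetical list; a real check would use a maintained set of popular projects.
POPULAR = ("certifi", "requests", "urllib3", "setuptools", "charset-normalizer")


def normalize(name: str) -> str:
    """PEP 503 name normalization."""
    return re.sub(r"[-_.]+", "-", name).lower()


def lookalikes(candidate: str, known=POPULAR, cutoff: float = 0.8) -> list[str]:
    """Return known names the candidate closely resembles without matching exactly."""
    cand = normalize(candidate)
    normalized = [normalize(k) for k in known]
    if cand in normalized:
        return []  # it *is* a known project, not a lookalike
    return difflib.get_close_matches(cand, normalized, n=3, cutoff=cutoff)


for name in ("certify", "Certifi", "requets"):
    print(name, "->", lookalikes(name))
```

The cutoff is a trade-off: lower values catch more typosquats but also flag unrelated short names.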

Deception is only skin deep

This package hijacking vector is both simple and sneaky. It exploits the difference between distribution packages (what pip installs) and import packages (what Python code imports). An attacker can keep the import package name identical to the expected one while changing only the distribution name. Here, the malicious distribution would be named certify on PyPI and installed with pip install certify, but the internal package directory structure and naming were left untouched, so code uses it with the expected import certifi statement:

# Install the "certify" package, locally sourced in this case
# but it could be `pip install certify` if uploaded to PyPI.
❯ python -m pip install dist/certify-2024.2.2.tar.gz
Processing ./dist/certify-2024.2.2.tar.gz
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: certify
  Building wheel for certify (pyproject.toml) ... done
  Created wheel for certify: filename=certify-2024.2.2-py3-none-any.whl size=163779 sha256=93f1b35d857fb33b8146e1c5c6cfb5b10ecab2b4b69b2f7eda4c99a5626148d6
  Stored in directory: /Users/maxrake/Library/Caches/pip/wheels/43/8d/c9/91f4cd154b7df7fbc77d07b6d2012a4f0b9a289da49d46706d
Successfully built certify
Installing collected packages: certify
Successfully installed certify-2024.2.2

# See that the installed package is named "certify"
❯ python -m pip list
Package    Version
---------- --------
certify    2024.2.2
pip        24.0
setuptools 69.2.0

# ...which matches the "dist-info" while the importable package from the
# virtual environment's `site-packages` directory is still named "certifi"
❯ ls -alh .venv/lib/python3.12/site-packages
total 8
drwxr-xr-x  11 maxrake  staff   352B Apr  9 10:36 .
drwxr-xr-x   3 maxrake  staff    96B Apr  9 10:29 ..
drwxr-xr-x   5 maxrake  staff   160B Apr  9 10:29 _distutils_hack
drwxr-xr-x   8 maxrake  staff   256B Apr  9 10:36 certifi
drwxr-xr-x  10 maxrake  staff   320B Apr  9 10:36 certify-2024.2.2.dist-info
-rw-r--r--   1 maxrake  staff   151B Apr  9 10:29 distutils-precedence.pth
drwxr-xr-x   9 maxrake  staff   288B Apr  9 10:29 pip
drwxr-xr-x  11 maxrake  staff   352B Apr  9 10:29 pip-24.0.dist-info
drwxr-xr-x   6 maxrake  staff   192B Apr  9 10:29 pkg_resources
drwxr-xr-x  51 maxrake  staff   1.6K Apr  9 10:29 setuptools
drwxr-xr-x  10 maxrake  staff   320B Apr  9 10:29 setuptools-69.2.0.dist-info

# ...which means that the package is imported as "certifi"
❯ python
Python 3.12.2 (main, Feb 14 2024, 10:56:22) [Clang 15.0.0 (clang-1500.1.0.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import certify
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'certify'
>>> import certifi
>>> dir(certifi)
['__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', 'contents', 'core', 'where']

StarJacking

This is great (for the attacker)! A one-line name change produced a package that can be installed (through trickery, if needed) and used just like the original. Just as conveniently for the attacker, the metadata on the PyPI listing will be the same as the original's:

PyPI certifi listing metadata: the listing for the certifi package, showing the metadata that can and can't be spoofed.

The project links, statistics, meta, and classifiers sections on the left side of the page come straight from the setup() keywords in the setup.py file; the GitHub statistics, for example, are shown for whatever repository the declared project links point to. Notably, the owner and maintainers cannot be spoofed, since they are tied directly to PyPI accounts. Of course, an account could be created to look like the real one, with a similar name and the same avatar.
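In the built distribution, those fields are plain headers in the core metadata file (PKG-INFO in an sdist, METADATA in a wheel), written by whoever ran the build. The sketch below parses such a file with `email.parser` and pulls out the declared links. The sample METADATA is illustrative, with values assumed for the spoofed build, and `claimed_links` is a hypothetical helper. It should also accept the object returned by `importlib.metadata.metadata("certify")` for an installed distribution, since that object offers the same `get` and `get_all` methods.

```python
from email.parser import Parser

# Illustrative core metadata for the spoofed build; values are assumptions.
METADATA = """\
Metadata-Version: 2.1
Name: certify
Version: 2024.2.2
Home-page: https://github.com/certifi/python-certifi
Project-URL: Source, https://github.com/certifi/python-certifi
"""


def claimed_links(msg) -> dict[str, str]:
    """Collect the self-declared Home-page and Project-URL entries.

    Project-URL values use the "label, url" form from the core metadata spec.
    """
    links: dict[str, str] = {}
    home = msg.get("Home-page")
    if home:
        links["Home-page"] = home
    for entry in msg.get_all("Project-URL") or []:
        label, _, url = entry.partition(",")
        links[label.strip()] = url.strip()
    return links


print(claimed_links(Parser().parsestr(METADATA)))
```

Nothing in the file itself ties these URLs to the account that uploaded it.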

The practice of hijacking a legitimate package’s metadata and popularity metrics like this is known as StarJacking and is very common. However, a convincing package is not enough. Successful attackers need you to execute their malicious code. How they do that will be covered in a series of posts that build on this one.
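One heuristic for spotting StarJacking is to look for several distributions that claim the same source repository: two differently named projects both pointing at github.com/certifi/python-certifi deserve a closer look, although monorepos that publish several packages legitimately trip it too. The sketch below groups distributions by a normalized host/owner/repo key. The `claims` data and the `repo_key` normalization rules are assumptions for illustration; in practice the URLs could come from each distribution's metadata.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def repo_key(url: str) -> str:
    """Reduce a repository URL to host/owner/repo (heuristic, case-insensitive)."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    segments = [s.lower() for s in path.split("/") if s]
    return "/".join([parts.netloc.lower(), *segments[:2]])


def shared_repositories(claims: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return repositories claimed by more than one distribution."""
    owners: dict[str, set[str]] = defaultdict(set)
    for dist, urls in claims.items():
        for url in urls:
            owners[repo_key(url)].add(dist)
    return {repo: sorted(dists) for repo, dists in owners.items() if len(dists) > 1}


# Hypothetical claims gathered from distribution metadata.
claims = {
    "certifi": ["https://github.com/certifi/python-certifi"],
    "certify": ["https://github.com/certifi/python-certifi.git/"],
    "requests": ["https://github.com/psf/requests"],
}
print(shared_repositories(claims))
```

A collision only says that two names point at one repository; deciding which one is legitimate still requires outside information, such as who owns the PyPI project and its upload history.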

Charles Coggins

Senior Software Engineer, responsible for integrations and author of the "phylum" Python package. Documentation and quality champion, runner, baseball and scout dad, podcaster, and lover of the outdoors.