Python Package Spoofing
Creating a functional package and hosting it on the Python Package Index (PyPI) is the foundation of most malicious Python packages. Making one that developers will actually want is hard. Malware authors know that proper R&D is essential to their success, but instead of research and development, it is much easier to rip off and duplicate. It is trivial to take a known good package, clone it, modify the distribution name and maybe some other metadata, and republish it. This post shows an example of that process using the known good certifi package.
Candidate Selection
Thinking like an attacker, we chose the certifi package because it has some desirable attributes:
- It is in the top 10 most downloaded packages on PyPI
- It has a name that already looks like a typo
- It has no direct dependencies, making it easier to inject in lockfiles
- It has no platform wheels
- It is a small package, with only two functions: where() and contents()
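From the defender's side, the "looks like a typo" attribute can be checked mechanically. The following is a minimal sketch, not a production detector: it computes the Levenshtein distance between a requested name and a list of known good names and reports near misses. The function names and the one-edit threshold are illustrative assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Return the Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete a character from a
                    current[j - 1] + 1,      # insert a character into a
                    previous[j - 1] + cost,  # substitute (or keep) a character
                )
            )
        previous = current
    return previous[-1]


def near_misses(name: str, known: list[str], max_distance: int = 1) -> list[str]:
    """Return known names within max_distance edits of name, skipping exact matches."""
    return [k for k in known if k != name and edit_distance(name, k) <= max_distance]
```

For example, `near_misses("certify", ["certifi", "requests"])` flags `certifi`, because the two names differ by a single substituted character.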
Notably, the certifi package uses the legacy setup.py file for specifying package metadata and building. Even though half of the top 20 most downloaded PyPI packages as of this writing still use this file, the trend is towards using the more modern PEP 518 pyproject.toml file instead. Perhaps more relevant is the fact that a lot of malicious packages Phylum discovers are still making use of setup.py. Even so, the process is similar for pyproject.toml projects.
Spoofing setup.py projects
The first step is to clone the repository for the legitimate certifi package and switch into the new directory:
❯ git clone git@github.com:certifi/python-certifi.git python-certify
Cloning into 'python-certify'...
remote: Enumerating objects: 934, done.
remote: Counting objects: 100% (278/278), done.
remote: Compressing objects: 100% (127/127), done.
remote: Total 934 (delta 207), reused 152 (delta 151), pack-reused 656
Receiving objects: 100% (934/934), 1.34 MiB | 668.00 KiB/s, done.
Resolving deltas: 100% (473/473), done.
Then, change the distribution name from certifi to certify:
❯ git diff
diff --git a/setup.py b/setup.py
index 4313c16..719d095 100755
--- a/setup.py
+++ b/setup.py
@@ -24,7 +24,7 @@ with open("certifi/__init__.py") as f:
         raise RuntimeError("No version number found!")
 
 setup(
-    name="certifi",
+    name="certify",
     version=VERSION,
     description="Python package for providing Mozilla's CA Bundle.",
     long_description=open("README.rst").read(),
Next, create a virtual environment and use it to build the new source distribution:
❯ python -m venv .venv
# For brevity, assume every command shown after this one is run from within
# the virtual environment even though the prompt does not show it explicitly
❯ source .venv/bin/activate
❯ python -m pip install setuptools
Collecting setuptools
Using cached setuptools-69.2.0-py3-none-any.whl.metadata (6.3 kB)
Using cached setuptools-69.2.0-py3-none-any.whl (821 kB)
Installing collected packages: setuptools
Successfully installed setuptools-69.2.0
❯ python setup.py sdist
/Users/maxrake/dev/phylum/python-certify/.venv/lib/python3.12/site-packages/setuptools/dist.py:318: InformationOnly: Normalizing '2024.02.02' to '2024.2.2'
self.metadata.version = self._normalize_version(self.metadata.version)
running sdist
running egg_info
creating certify.egg-info
writing certify.egg-info/PKG-INFO
writing dependency_links to certify.egg-info/dependency_links.txt
writing top-level names to certify.egg-info/top_level.txt
writing manifest file 'certify.egg-info/SOURCES.txt'
reading manifest file 'certify.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files found matching '.github/'
warning: manifest_maker: MANIFEST.in, line 4: 'recursive-exclude' expects <dir> <pattern1> <pattern2> ...
adding license file 'LICENSE'
writing manifest file 'certify.egg-info/SOURCES.txt'
running check
creating certify-2024.2.2
creating certify-2024.2.2/certifi
creating certify-2024.2.2/certify.egg-info
copying files to certify-2024.2.2...
copying LICENSE -> certify-2024.2.2
copying MANIFEST.in -> certify-2024.2.2
copying README.rst -> certify-2024.2.2
copying setup.py -> certify-2024.2.2
copying certifi/__init__.py -> certify-2024.2.2/certifi
copying certifi/__main__.py -> certify-2024.2.2/certifi
copying certifi/cacert.pem -> certify-2024.2.2/certifi
copying certifi/core.py -> certify-2024.2.2/certifi
copying certifi/py.typed -> certify-2024.2.2/certifi
copying certify.egg-info/PKG-INFO -> certify-2024.2.2/certify.egg-info
copying certify.egg-info/SOURCES.txt -> certify-2024.2.2/certify.egg-info
copying certify.egg-info/dependency_links.txt -> certify-2024.2.2/certify.egg-info
copying certify.egg-info/not-zip-safe -> certify-2024.2.2/certify.egg-info
copying certify.egg-info/top_level.txt -> certify-2024.2.2/certify.egg-info
copying certify.egg-info/SOURCES.txt -> certify-2024.2.2/certify.egg-info
Writing certify-2024.2.2/setup.cfg
creating dist
Creating tar archive
removing 'certify-2024.2.2' (and everything under it)
❯ ls -alh dist
total 328
drwxr-xr-x 3 maxrake staff 96B Apr 9 10:30 .
drwxr-xr-x 16 maxrake staff 512B Apr 9 10:30 ..
-rw-r--r-- 1 maxrake staff 161K Apr 9 10:30 certify-2024.2.2.tar.gz
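The build log above already hints at the mismatch: the archive root is certify-2024.2.2, but the package directory inside it is still certifi. A reviewer could confirm that from the archive itself. The sketch below is a rough illustration using only the standard library `tarfile` module. The function name is hypothetical, and it assumes a flat layout where import packages sit directly under the archive root, so it would miss projects that use a `src/` layout.

```python
import tarfile


def sdist_import_packages(sdist_path: str) -> tuple[str, set[str]]:
    """Return (archive root directory, top-level import package names) for an sdist."""
    roots: set[str] = set()
    packages: set[str] = set()
    with tarfile.open(sdist_path, "r:gz") as archive:
        for member in archive.getmembers():
            parts = member.name.split("/")
            roots.add(parts[0])
            # <root>/<package>/__init__.py marks a top-level import package
            if len(parts) == 3 and parts[2] == "__init__.py":
                packages.add(parts[1])
    if len(roots) != 1:
        raise ValueError(f"expected one root directory, found {sorted(roots)}")
    return roots.pop(), packages
```

Run against `dist/certify-2024.2.2.tar.gz`, this should report a root named after the `certify` distribution alongside an import package named `certifi`.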
The final step is to upload the new package to PyPI. This step was skipped here so as not to pollute the registry in the name of research or demonstration. Plus, it is suspected that certify is a reserved name on PyPI, likely to prevent exactly this kind of attack. We can’t know for sure without trying, but there is an unmerged pull request to provide an API for obtaining such prohibited names. The suspicion is further reinforced by the fact that there is a certify package on the test instance of PyPI but not a matching one on the production instance of PyPI.
Deception is only skin deep
This package hijacking vector is both simple and sneaky. It takes advantage of the differences between distribution and import packages. An attacker can ensure the import package name is the same as the expected package even though the distribution package name is different. That is, even though the bad distribution package would be named certify on PyPI and installed with pip install certify, the internal package directory structure and naming were left untouched, so the code to use it is still the expected import certifi statement:
# Install the "certify" package, locally sourced in this case
# but it could be `pip install certify` if uploaded to PyPI.
❯ python -m pip install dist/certify-2024.2.2.tar.gz
Processing ./dist/certify-2024.2.2.tar.gz
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: certify
Building wheel for certify (pyproject.toml) ... done
Created wheel for certify: filename=certify-2024.2.2-py3-none-any.whl size=163779 sha256=93f1b35d857fb33b8146e1c5c6cfb5b10ecab2b4b69b2f7eda4c99a5626148d6
Stored in directory: /Users/maxrake/Library/Caches/pip/wheels/43/8d/c9/91f4cd154b7df7fbc77d07b6d2012a4f0b9a289da49d46706d
Successfully built certify
Installing collected packages: certify
Successfully installed certify-2024.2.2
# See that the installed package is named "certify"
❯ python -m pip list
Package Version
---------- --------
certify 2024.2.2
pip 24.0
setuptools 69.2.0
# ...which matches the "dist-info" while the importable package from the
# virtual environment's `site-packages` directory is still named "certifi"
❯ ls -alh .venv/lib/python3.12/site-packages
total 8
drwxr-xr-x 11 maxrake staff 352B Apr 9 10:36 .
drwxr-xr-x 3 maxrake staff 96B Apr 9 10:29 ..
drwxr-xr-x 5 maxrake staff 160B Apr 9 10:29 _distutils_hack
drwxr-xr-x 8 maxrake staff 256B Apr 9 10:36 certifi
drwxr-xr-x 10 maxrake staff 320B Apr 9 10:36 certify-2024.2.2.dist-info
-rw-r--r-- 1 maxrake staff 151B Apr 9 10:29 distutils-precedence.pth
drwxr-xr-x 9 maxrake staff 288B Apr 9 10:29 pip
drwxr-xr-x 11 maxrake staff 352B Apr 9 10:29 pip-24.0.dist-info
drwxr-xr-x 6 maxrake staff 192B Apr 9 10:29 pkg_resources
drwxr-xr-x 51 maxrake staff 1.6K Apr 9 10:29 setuptools
drwxr-xr-x 10 maxrake staff 320B Apr 9 10:29 setuptools-69.2.0.dist-info
# ...which means that the package is imported as "certifi"
❯ python
Python 3.12.2 (main, Feb 14 2024, 10:56:22) [Clang 15.0.0 (clang-1500.1.0.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import certify
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'certify'
>>> import certifi
>>> dir(certifi)
['__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '__version__', 'contents', 'core', 'where']
StarJacking
This is great! A one-line name change resulted in a package that can be installed (through trickery, if needed) and used just like the original. It is also super convenient, for the attacker that is, that the metadata from the PyPI listing will be the same as the original.
The project links, statistics, meta, and classifiers sections on the left side of the PyPI project page come straight from the setup() keywords provided in the setup.py file. Notably, the owner and maintainers cannot be spoofed since they are directly tied to PyPI accounts. Of course, an account could be created to look like the real one, with a similar name and the same avatar.
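Because those project links are copied verbatim, one crude signal is a repository URL that does not mention the distribution it supposedly belongs to. The following is a heuristic sketch only: the function name is hypothetical, it looks at GitHub URLs alone, and many legitimate projects live in repositories named nothing like their distribution, so a hit is a prompt for manual review and nothing more.

```python
from urllib.parse import urlparse


def unrelated_repo_urls(dist_name: str, urls: list[str]) -> list[str]:
    """Return GitHub URLs whose owner/repo path does not contain the distribution name."""
    needle = dist_name.lower().replace("_", "-")
    flagged = []
    for url in urls:
        parsed = urlparse(url)
        # Only GitHub links are considered in this sketch
        if parsed.netloc.lower() not in {"github.com", "www.github.com"}:
            continue
        path = parsed.path.lower().replace("_", "-")
        if needle not in path:
            flagged.append(url)
    return flagged
```

With the repository cloned earlier in this post, `unrelated_repo_urls("certify", ["https://github.com/certifi/python-certifi"])` flags the URL, while the same call with `"certifi"` does not. For an installed distribution, the URLs could be gathered from `importlib.metadata.metadata(name).get_all("Project-URL")`, whose values would first need to be split into label and URL.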
The practice of hijacking a legitimate package’s metadata and popularity metrics like this is known as StarJacking and is very common. However, it is not enough to have a convincing package. Successful attackers need you to execute their malicious code. How they do that will be covered in a series of posts that build off of this one.