Series: How Malicious Python Code Gains Execution

Python logo surrounded by concentric circles
Photo by Brecht Corbeel / Unsplash

The primary vector for malicious code running in software developer environments (e.g., local systems, CI/CD runners, production servers) is software dependencies. This is third-party code, which often means open-source software, also known as running code from strangers on the internet.

The prized goal for attackers is arbitrary code execution. It’s the stuff high CVE scores are made of and a frequent topic in discussions of how vulnerabilities turn into exploits. It’s the foothold needed to run cryptominers, steal secrets, or encrypt data for ransom. It’s no wonder threat actors want it, but how do they get it? Sutton’s Law makes it obvious why they go after open-source software: executing arbitrary code is easy there.

This is a series examining the methods by which malicious Python code gains execution. Some of the methods are obvious, and some are potentially undiscovered or at least not yet found in the wild. What most of them have in common is a reliance on a software dependency in the form of a Python package, which is where we begin.

Threat modeling is a useful defensive exercise to predict and prevent future attacks. By thinking like a malicious actor, we can identify the attack surface, enumerate possible compromise vectors, and neutralize them with considered countermeasures. As security researchers, we’ll don a hat of a darker color to put ourselves in the right mindset. Certainly not a black hat, but maybe more of a gray thinking cap. The remainder of the series documents the findings from that exercise.

Putting our white hat back on, there are countermeasures to protect developers from these attacks. First, use a lockfile every time an environment is created to ensure reproducibility. Then, guard against any changes to that lockfile by automatically monitoring the health of the lockfile and the dependencies contained therein. Finally, don’t allow arbitrary code to run anywhere in your development process.
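As a rough illustration of the lockfile advice, here is a minimal sketch that flags entries in a pip-style requirements file that are not pinned to an exact version or that lack a `--hash` option. It assumes the lockfile is a plain requirements file; the helper names (`logical_lines`, `unlocked_requirements`) and the sample contents are hypothetical, and the hash value is a placeholder rather than a real digest. It is not how Phylum or pip performs these checks. For actual enforcement at install time, pip offers a hash-checking mode (`pip install --require-hashes -r requirements.txt`).

```python
import re

# Matches an exactly pinned requirement such as "requests==2.31.0".
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^\s;]+")


def logical_lines(text):
    """Yield non-blank, non-comment lines with backslash continuations joined."""
    buffer = ""
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("\\"):
            # Continuation: hold this piece and keep reading.
            buffer += line[:-1] + " "
            continue
        line = (buffer + line).strip()
        buffer = ""
        if line and not line.startswith("#"):
            yield line
    if buffer.strip():
        # The file ended in the middle of a continuation.
        yield buffer.strip()


def unlocked_requirements(text):
    """Return requirement lines that are not pinned with == or have no --hash."""
    problems = []
    for line in logical_lines(text):
        if line.startswith("-"):
            # Option lines (e.g. "--index-url ...") are skipped in this sketch.
            continue
        if not PINNED.match(line) or "--hash=" not in line:
            problems.append(line)
    return problems


if __name__ == "__main__":
    # Hypothetical lockfile contents; the hash is a placeholder.
    example = (
        "requests==2.31.0 \\\n"
        "    --hash=sha256:0000\n"
        "urllib3>=1.26\n"
        "idna==3.4\n"
    )
    for entry in unlocked_requirements(example):
        print("not fully locked:", entry)
```

Run against the sample, the sketch reports `urllib3>=1.26` (a version range rather than a pin) and `idna==3.4` (pinned but without a hash). A check like this could run in CI or a pre-commit hook so that a loosened lockfile is noticed before it is installed.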

Phylum can detect, report, and block malicious packages. Solutions that look only for known vulnerabilities will miss this entire risk domain. Use Phylum to analyze dependencies. Integrations exist to guard PRs with a free GitHub App or a GitHub Action. There is also a CLI and pre-commit hook for local development, as well as a phylum Python package that can be installed with pip or pipx. Additional supported CI platforms include GitLab CI, Azure Pipelines, and Bitbucket Pipelines, with more coming.

At the time of this writing, Phylum offers Python lockfile and manifest support for pip, pipenv, and poetry. A free community edition is available for everyone to automate software supply chain security to block new risks, prioritize existing issues, and only use trusted open-source code.


Charles Coggins


Senior Software Engineer, responsible for integrations and author of the "phylum" Python package. Documentation and quality champion, runner, baseball and scout dad, pod-faster, and lover of outdoors.