Software supply chains are unique among the broader supply chain family. Logistics-based supply chain risks can be contained or limited by industry or region. However, all software applications, everywhere, rely on the same open-source ecosystem, creating a targeted attack surface with massive reach.
Think about it: the supply chain for medical devices is not often impacted by the supply chain for cars. Each depends on different components and relies on building processes and techniques that have little in common. That's not true for the software supply chain. It's hard to think of modern software that is not dependent on open-source packages and containers. A single package containing a vulnerability or OSS malware affects everyone and has the potential to compromise the developers who build applications, the organizations that rely on them, and the consumers who use them.
Q1 2023 Analysis
Phylum analyzes open-source packages, including source code and metadata, as they are published into several popular ecosystems: NPM, PyPI, RubyGems, Nuget, Golang, Cargo, and Maven. In Q1 2023, Phylum’s Software Supply Chain Security Platform analyzed roughly 237.6M files across 2.8M package publications (with an average of 31.5K packages/day). Findings include:
- 2,189 packages targeted specific groups or organizations
- 6,099 packages referenced known malicious URLs
- 34,253 packages contained pre-compiled binaries
- 18,016 packages executed suspicious code during installation
- 12,122 packages made requests to servers by IP address
- 2,543 packages attempted to obfuscate underlying code
- 807 packages enumerated system environment variables
- 103 packages imported dependencies in a non-standard way
- 1,660 packages surreptitiously downloaded and executed code from a remote source
- 2,177 were identified as typosquat packages
- 2,834 packages were registered by authors with throwaway email accounts
- Over 800,000 spam packages were published across ecosystems
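To make a couple of these categories concrete, here is a minimal sketch of how two of the signals above (requests to servers by IP address, and enumeration of environment variables) might be flagged with simple static checks. The patterns and the `find_signals` helper are illustrative assumptions, not Phylum's detection logic, and a real scanner would need far more than regular expressions.

```python
import re

# Illustrative patterns only; these are assumptions, not Phylum's rules.
# A URL whose host is a raw IPv4 address rather than a domain name.
IP_URL = re.compile(r"https?://\d{1,3}(?:\.\d{1,3}){3}(?![\w.-])")

# Access to the whole environment rather than a single named variable.
ENV_ENUMERATION = re.compile(
    r"\bos\.environ\b(?!\s*\[|\.get\b)"   # Python: e.g. dict(os.environ)
    r"|\bprocess\.env\b(?![.\[])"         # JavaScript: e.g. JSON.stringify(process.env)
)


def find_signals(source: str) -> set:
    """Return the names of the heuristic signals found in a source string."""
    signals = set()
    if IP_URL.search(source):
        signals.add("ip_url")
    if ENV_ENUMERATION.search(source):
        signals.add("env_enumeration")
    return signals
```

Looking up a single variable, such as `os.environ["HOME"]`, is deliberately not flagged here; only whole-environment access is treated as suspicious in this sketch.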
The first quarter of 2023 saw a marked increase in spam package publications. Although these packages don’t carry any code with them and thus are not software supply chain attacks in the traditional sense, the volume of packages is impacting the ability of open-source ecosystems to adequately triage and remove legitimate malware threats. This leaves malicious packages up far longer than necessary, providing attackers additional time to potentially infect developers.
All told, Phylum saw 800,024 spam packages across all ecosystems in Q1.
Looking at the trend of NPM package publications over this quarter, we note a striking characteristic: private (i.e., scoped) package publication rates remained mostly steady throughout the quarter; however, public packages exploded in volume around March. The sudden rise in publication count can be almost entirely attributed to various forms of spam.
NPM: The Bulk of the Spam Packages
NPM is the largest and most active ecosystem in terms of published packages. It makes sense, then, that it would also be the largest contributor to the overall spam package totals. Most of these packages consisted of a single package.json and no executable code.
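The "manifest only, no code" shape described above is easy to check for. The following is a minimal sketch, assuming we already have the list of file paths inside a package archive; the `CODE_SUFFIXES` set and the `looks_like_empty_package` helper are hypothetical names chosen for illustration, not part of any real tool.

```python
from pathlib import PurePosixPath

# Assumed set of suffixes that count as "executable code" for this sketch.
CODE_SUFFIXES = {".js", ".cjs", ".mjs", ".ts", ".node", ".sh"}


def looks_like_empty_package(file_paths):
    """True if the archive has a package.json but no files with a code suffix."""
    paths = [PurePosixPath(p) for p in file_paths]
    has_manifest = any(p.name == "package.json" for p in paths)
    has_code = any(p.suffix.lower() in CODE_SUFFIXES for p in paths)
    return has_manifest and not has_code


# Example: a package containing nothing but its manifest and a README.
print(looks_like_empty_package(["package/package.json", "package/README.md"]))
```

A check like this says nothing about intent on its own; it only narrows down which packages are worth a closer look at their description and metadata.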
A significant number of these packages were financial/investment spam, generally promising users millions in earnings, e.g.
Забудьте о финансовых проблемах навсегда: новый метод заработка позволит вам зарабатывать миллионы, не выходя из дома!
Which loosely translates to:
Forget about financial problems forever: a new method of earning will allow you to earn millions without leaving your home!
Stunningly, 357K of the roughly 800K spam packages published (about 45%) can be tied to just 17 authors on NPM.
|Author Name||Packages Published|
eBook piracy spam accounted for approximately 111K packages published during the latter part of March 2023 alone.
Q1 2023 Ecosystem Statistics
Similar to previous quarters, NPM dominates in terms of package publication activity. Nearly 2.5M packages were published to NPM during the first quarter of 2023. PyPI was a distant second with nearly 324K packages published during this time. All told, Phylum analyzed over 2TB of packages during Q1.
Popular Software Supply Chain Attacks
As one might expect, the trend of malware publications tracks closely with the activity of the given ecosystem. Across all ecosystems, NPM and PyPI made up the bulk of the software supply chain attacks.
This quarter was similar to the last, with typosquats and dependency confusion accounting for the vast majority of the attacks targeting developers.
Typosquats Continue to Dominate
Typosquats continue to be the dominant initial infection vector selected by threat actors. In early February, Phylum identified a large typosquat campaign targeting PyPI users. A deep dive into the threat actors’ behavior showed a highly targeted and efficient attack against software developers.
In just over an hour, the attacker published 451 packages targeting popular Python libraries like
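One common way to spot candidate typosquats is to compare a newly published name against a list of popular package names and flag near misses. Below is a minimal sketch using Levenshtein edit distance. The `POPULAR` list, the `likely_target` helper, and the distance threshold are assumptions made for illustration; they are not taken from the campaign described above or from Phylum's tooling.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character inserts, deletes, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


# Hypothetical list of popular names to protect.
POPULAR = ["requests", "numpy", "pandas"]


def likely_target(name, popular=POPULAR, max_distance=2):
    """Return the popular package this name most resembles, or None."""
    if name in popular:
        return None  # the real package, not a squat
    distance, target = min((edit_distance(name, p), p) for p in popular)
    return target if distance <= max_distance else None


print(likely_target("reqeusts"))  # a swapped pair of letters in "requests"
```

A fixed threshold produces false positives for short names, so a check like this is best treated as a prompt for review, not a verdict.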
Dependency Confusion Targets Organizations
Phylum continues to see routine attacks targeting specific companies and organizations. It has been over two years since the initial research on dependency confusion was released. Since then, this technique has become a favorite of software supply chain attackers.
Organizations often leverage internal repositories to maintain private packages they develop in-house. Issues arise when package managers are configured to prefer pulling packages from public open-source registries over internal ones. When this occurs, it leaves the door open for an attacker.
For example, if a hypothetical company, Acme, Inc., publishes a package acme-auth to its internal package registry (i.e., a registry inside of the company network), an attacker could publish an identically named package to PyPI. If a developer’s package manager then consults the public registry before the internal registry, the developer will install the malicious package instead of the legitimate one.
These packages are often published to the ecosystems with high version numbers, using naming schemes referencing popular organizations or companies. In the first quarter of 2023, the Phylum platform identified just over 2,000 packages of this sort, targeting a variety of organizations ranging from financial institutions to large technology companies.
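The role of the high version number is easiest to see in a toy model. The sketch below is a deliberately simplified resolver, not the behavior of any particular package manager: `resolve_highest` picks the highest version it can find across every registry, while `resolve_pinned` forces names known to be internal to resolve only from the internal registry. The registry contents, version numbers, and function names are all hypothetical.

```python
def parse_version(version: str):
    """Parse a dotted numeric version such as '1.3.0' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def resolve_highest(name, registries):
    """Naive resolution: the highest version wins, whichever registry serves it."""
    candidates = [
        (parse_version(version), registry)
        for registry, packages in registries.items()
        for version in packages.get(name, [])
    ]
    if not candidates:
        return None
    version, registry = max(candidates)
    return registry, ".".join(map(str, version))


def resolve_pinned(name, registries, internal_names, internal="internal"):
    """Safer resolution: internal package names never fall through to public."""
    if name in internal_names:
        return resolve_highest(name, {internal: registries.get(internal, {})})
    return resolve_highest(name, registries)


# Hypothetical state: Acme's real package internally, an attacker's copy publicly.
registries = {
    "internal": {"acme-auth": ["1.2.0", "1.3.0"]},
    "public": {"acme-auth": ["99.0.0"], "requests": ["2.31.0"]},
}

print(resolve_highest("acme-auth", registries))
print(resolve_pinned("acme-auth", registries, {"acme-auth"}))
```

In this model the naive resolver selects the attacker's 99.0.0 from the public registry, while the pinned resolver stays on the internal 1.3.0. Real package managers offer their own mechanisms for this kind of pinning, such as scoped names or per-package index configuration.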
Attackers Experimenting With New Evasion Techniques
Attackers continue to evolve their Tactics, Techniques, and Procedures (TTPs) as their packages are removed from the various open-source ecosystems. In late March 2023, Phylum identified a user experimenting with encoding to evade detection. By taking advantage of some native behaviors of the Python interpreter, the user was able to produce readable, but obfuscated source code that would evade naive detection tooling.
Just a week later, Phylum identified another package, mathjs-min, delivering malware to NPM developers. In this particular instance, the attacker skipped the common obfuscation we see most attackers using and instead hid the malicious credential stealer directly in the minified source code. They then published this package to NPM under the guise of being a legitimate minified version of the popular math library.
Even Well-Intentioned Pollutants Damage Ecosystems
The health of the open-source ecosystem is delicate. Maintaining its health is the collective effort of the developer community along with the small cohort of individuals tasked with keeping these registries clean.
The likes of NPM, PyPI, and others provide tremendous value to the community. They are also the perfect conduit for the transmission of malicious packages. Many of these are publications by security researchers/organizations hoping to score a bug bounty - or worse, security companies typosquatting packages as marketing. This is a problem.
The value provided by open-source package registries relative to the number of people responsible for cleaning them up is seriously out of whack. For something as significant as PyPI, you might assume that there are dozens of people working to triage malicious package reports. The reality is that there are roughly two individuals tackling this on a day-to-day basis.
Clearly, there is very little bandwidth available to handle reported packages. The addition of malicious-like but otherwise harmless packages to their stack means that it takes longer to deal with legitimate threats. We are all collectively worse off with this state of affairs. As software developers and security practitioners, we owe it to one another to be good stewards of the open-source ecosystem.
Bug Bounty Pollutants
Purported bug bounty packages are pervasive in the deluge of packages published into each ecosystem every day. If we limit our focus to just the most popular ecosystem (NPM) and isolate our view to packages that contain preinstall scripts, we find that nearly 16% of these packages are purported to be for security research purposes. These packages frequently contain malware.
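For readers who want to check their own dependencies, install-time hooks are declared in the scripts section of a package's package.json. Here is a minimal sketch that lists them. The `install_time_scripts` helper and the sample manifest (including its URL) are hypothetical; the hook names `preinstall`, `install`, and `postinstall` are the standard npm lifecycle scripts that run during installation.

```python
import json

# npm lifecycle scripts that run automatically when a package is installed.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def install_time_scripts(package_json_text: str) -> dict:
    """Return the install-time lifecycle scripts declared in a package.json."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts")
    if not isinstance(scripts, dict):
        return {}  # no scripts section, or a malformed one
    return {hook: scripts[hook] for hook in INSTALL_HOOKS if hook in scripts}


# Hypothetical manifest for illustration.
sample = """
{
  "name": "example-package",
  "version": "1.0.0",
  "scripts": {
    "preinstall": "node collect.js",
    "test": "node test.js"
  }
}
"""

print(install_time_scripts(sample))
```

The presence of a hook is not proof of malice, since many legitimate packages compile native code at install time. It does mark the code that will run before a developer has had any chance to inspect the package.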
It’s important to note that publishing packages of this sort to public registries often violates the TOS for these registries. See the “Acceptable Content” policy for NPM:
NPM’s TOS indicates that security research packages are prohibited
“But can’t we just ignore the bug bounties?” you might ask. The answer would be a resounding, “No.”
While many of these packages note that they are for “security research,” we lack sufficient context to determine the intent of the package author. In the absence of this context, we must assume the worst case and treat the package as malicious. This means that registry maintainers must take time out of their day to deal with these packages - leaving them less time to deal with legitimate threats - and we are all worse off for it.
Typosquats as Billboards
In 2019 the maintainer of
As if things couldn’t get worse, we’ve begun seeing publications from security companies typosquatting popular packages, directing users to their services. Is this a creative marketing tactic? Sure. Is it an appropriate use of the open-source ecosystems we all share? Absolutely not. It wastes the time of individuals trying to keep these ecosystems clean and it is confusing for developers who accidentally install these packages. Regardless of the underlying motivations of the company, can we all agree that treating typosquats as billboards is an all-around bad idea?
Novel Research Has Value
Let’s be clear: There is value in novel research. When Alex Birsan published his write-up on dependency confusion, a vast amount was collectively gained by the community. Our collective security posture was improved as more people became aware of this sort of attack.
Since then, there has been a surge of copycats poorly attempting to replicate Alex’s success. The shotgun approach to package publication these individuals take, with little concern for the possible collateral damage, does nothing to improve the security posture of open source. It’s noisy, it’s unlikely to be effective as a bug bounty, and it has a good chance of impacting an unrelated party.
Follow these best practices if you do decide to conduct research in the open-source ecosystem:
- Minimize the publication of bug bounty packages to open-source repositories
- If you must publish a package for security research, make sure it conforms with the TOS of the open-source repository
- Security research packages should contain a minimal payload to demonstrate feasibility; don’t send off more data than is necessary
- Be overly cautious about how your package publication might impact random third parties
- When possible, prefer private registries for proofs-of-concept or CTFs
Risks and Impacts
The consequences of software supply chain attacks from the open-source ecosystem can be devastating. One mistake made by any developer in your supply chain could cause your entire software organization to crumble. In many ways, the open-source supply chain is more akin to a watering hole. Nearly every developer writing software in a particular language uses open-source packages in that language, and they all come from the same sources.
Developers are able to use any open-source package they choose. Each package can have an arbitrary number of dependencies, and the permutations of those dependencies can change erratically, unbeknownst to the software developer. To make matters worse, this is a recursive problem. Organizations do not just need to worry about the security hygiene of developers within their software organizations; they must also consider the developers of the packages their developers rely on, and the developers of the packages those developers rely on, ad infinitum.
The odds of a specific developer being compromised might be low, but the odds of some developer in your supply chain being compromised are much higher. The number of developers in an organization multiplied by the number of packages those developers use, compounded by each package’s own transitive dependencies, creates a rapidly growing attack surface that is difficult to reason about or manage. The compromise of just one individual, one hop outside of your organization, could lead to the loss of sensitive credentials or the installation of backdoors inside of your organization. The scale of this problem is one that no one can afford to ignore.
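The gap between "a specific developer" and "some developer" can be made concrete with a back-of-the-envelope calculation. If each of n developers in a supply chain has an independent probability p of being compromised, the chance that at least one is compromised is 1 - (1 - p)^n. Independence and a uniform p are simplifying assumptions, and the numbers below are purely illustrative, not measurements.

```python
def prob_any_compromised(p: float, n: int) -> float:
    """Probability that at least one of n developers is compromised,
    assuming each is compromised independently with probability p."""
    return 1 - (1 - p) ** n


# Illustrative numbers only: a 0.1% individual risk across growing supply chains.
for n in (1, 100, 1000):
    print(n, round(prob_any_compromised(0.001, n), 3))
```

With these assumed inputs, a risk that is negligible for any one person approaches even odds once the chain reaches several hundred developers, and exceeds them at a thousand.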
- Dustin Ingram and Ee Durbin have been wonderful to work with and have removed reported packages extremely quickly.
- We have seen instances of authors publishing packages under the guise of bug bounties, only to turn around and start delivering much more malicious payloads in a subsequent package version.
- Not compensating open-source developers is a software supply chain threat. We should consider ways to effectively compensate these developers.