Typosquatting and Other Attacks Against Open Source Dependencies

In November 2018, a malicious JavaScript package was identified and subsequently removed from the NPM ecosystem. The package, flatmap-stream, carried a nefarious modification and was added as a direct dependency of the popular event-stream package, where it was downloaded nearly 8 million times.

Unfortunately, this is hardly an isolated event. In 2019 another package, bb-builder, was identified as malicious and removed from NPM. In 2020 over 700 Ruby gems were taken offline. So what can we learn from these types of attacks against the open source community? Software supply chain attacks are both common and extremely effective against developers and their end users.

A recently published paper titled The Backstabber's Knife Collection has made waves in the security community. It provides a great analysis of a broad spectrum of software supply chain attacks from the last few years. Not only have we seen a dramatic rise in these sorts of attacks (including some rather high-profile breaches, such as the Magecart attacks targeting British Airways, among others), but the industry's response to the threat has also been lackluster. This is startling considering that, according to the Synopsys 2020 Open Source Security and Risk Analysis Report, the majority of commercial code bases across a broad spectrum of industries not only contained open source software, but were composed mostly of open source software. In effect, we have given tens of thousands of unknown developers indirect write access to our source repositories with no checks, balances or governance, and simply hoped for the best.

So what does a software supply chain attack look like? The incidents examined in the paper fall into several classes: typosquatting, injection of malicious dependencies upstream, and repository compromise.

What is typosquatting?

Typosquatting made up the bulk of the malware found. The idea behind this sort of attack is to take an existing package, add a backdoor to it and re-upload the package under a slightly different name (e.g. raect instead of react). Attackers used a number of tricks to get their packages downloaded by developers, including: registering the names of legitimate packages in ecosystems where those packages had not yet been published, writing blog posts encouraging the use of the imposter package, or simply hoping developers would mistype the package name.

Typosquatting has been a prevalent attack against users of NPM. In August of 2017 NPM identified a typosquatted package with a name very close to that of the popular cross-env package. NPM worked to remove this package, and in doing so, identified an additional 40 malicious packages that needed to be taken down.
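To make the idea concrete, here is a minimal sketch of how a near-miss name could be flagged automatically. It compares a candidate name against a watch list using an edit distance that counts a swap of two adjacent characters as a single edit. The watch list, the function names and the distance threshold are all assumptions made for illustration; this is not a description of how NPM or any particular tool detects typosquats.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between two strings.

    Allowed edits: insert, delete, substitute, and swap of two
    adjacent characters (so "raect" is one edit away from "react").
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a
    for j in range(cols):
        d[0][j] = j  # insert every character of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Swap of two adjacent characters counts as one edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


# Hypothetical watch list of popular package names (assumption).
POPULAR = ["react", "cross-env", "event-stream"]


def possible_typosquats(name, popular=POPULAR, max_distance=1):
    """Return the popular names that `name` is suspiciously close to."""
    return [
        p for p in popular
        if p != name and osa_distance(name, p) <= max_distance
    ]


if __name__ == "__main__":
    for candidate in ["raect", "crossenv", "react"]:
        print(candidate, possible_typosquats(candidate))
```

A check like this only catches names that are close in spelling. It says nothing about whether the package is actually malicious, and it would miss a legitimate name re-registered in a different ecosystem.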

Injecting software dependencies upstream

In these cases, malicious packages were added upstream of a major project and were able to infect a wide variety of packages downstream. This attack takes advantage of the fact that developers rarely investigate their dependencies beyond the few top-level ones that they rely on directly.
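The gap between what a developer declares and what actually gets installed is easy to see with a small sketch. The dependency map below is hypothetical and heavily simplified (my-app and helper-a are invented names, and the edges are illustrative only). The function walks the map breadth-first and collects everything reachable from a root package.

```python
from collections import deque

# Hypothetical, simplified dependency map (assumption for illustration):
# package name -> packages it depends on directly.
DEPENDENCIES = {
    "my-app": ["event-stream", "helper-a"],
    "event-stream": ["flatmap-stream"],
    "helper-a": [],
    "flatmap-stream": [],
}


def transitive_dependencies(root, deps):
    """Return every package reachable from `root`, excluding `root` itself."""
    seen = set()
    queue = deque(deps.get(root, []))
    while queue:
        pkg = queue.popleft()
        if pkg in seen or pkg == root:
            continue  # already visited, or a cycle back to the root
        seen.add(pkg)
        queue.extend(deps.get(pkg, []))
    return seen


if __name__ == "__main__":
    direct = set(DEPENDENCIES["my-app"])
    everything = transitive_dependencies("my-app", DEPENDENCIES)
    # Packages the developer never named but still ends up installing.
    print(sorted(everything - direct))
```

In this toy map the only package the developer never asked for by name is flatmap-stream, which is exactly the kind of dependency that rarely gets reviewed.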

Turning packages malicious through repository compromise

Finally, there were a number of cases where package repositories themselves were compromised. This can happen for several reasons. In one case, the maintainer of a major package used by thousands of downstream projects quit and passed ownership of the package to a new maintainer. The new maintainer, having gained control of a very influential piece of the ecosystem, added a backdoor to the package.

Our independent analysis at Phylum has shown that large centers of gravity exist within every package ecosystem we examined, where the compromise of a single package may affect many thousands of projects downstream.
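One simple way to picture a center of gravity is to count, for each package, how many other packages depend on it directly or transitively. The sketch below does this on a tiny hypothetical graph with invented package names. It is an illustration of the concept under those assumptions, not a description of the analysis we actually run.

```python
from collections import defaultdict

# Hypothetical dependency graph (assumption for illustration):
# package name -> packages it depends on directly.
GRAPH = {
    "app-one": ["lib-core"],
    "app-two": ["lib-util"],
    "app-three": ["lib-util"],
    "lib-util": ["lib-core"],
    "lib-core": [],
}


def downstream_reach(deps):
    """Map each package to the number of packages that depend on it,
    directly or transitively."""
    # Invert the graph: dependency -> packages that require it.
    dependents = defaultdict(set)
    for pkg, requires in deps.items():
        for dep in requires:
            dependents[dep].add(pkg)

    reach = {}
    for pkg in set(deps) | set(dependents):
        seen, stack = set(), [pkg]
        while stack:
            for parent in dependents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        seen.discard(pkg)  # ignore a cycle back to the package itself
        reach[pkg] = len(seen)
    return reach


if __name__ == "__main__":
    reach = downstream_reach(GRAPH)
    for name in sorted(reach, key=reach.get, reverse=True):
        print(name, reach[name])
```

Here lib-core sits underneath every other package in the graph, so compromising it would touch all four of them, while compromising any single application would touch nothing else.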

How do we mitigate software supply chain risks?

Since this is clearly a huge issue, what is currently being done about it? Shockingly, the answer is nearly nothing. This is especially interesting because many products now advertise threat detection and prevention for the software supply chain. In reality, their focus is on simple threat intelligence feeds: data derived from painstaking manual analysis. This approach is not only reactive rather than proactive, but also doomed to fail. The scope and scale of the data involved is simply too massive to analyze manually, let alone keep up with as new packages are published and existing ones change. For reference, NPM (the largest repository of JavaScript packages) contains over a million published packages, with nearly a thousand more published daily. It has grown from around 350,000 published packages to over 1.5 million packages in merely three years! Considering that NPM is just one of dozens of such repositories, it is easy to see that manually analyzing even a fraction of the packages in existence is untenable.

The authors of The Backstabber's Knife Collection noted that discovered malicious packages were able to stay in distribution for long periods, in many cases more than a year before discovery. While each ecosystem makes some attempt at curation, these efforts do not seem to be terribly effective at preventing future attacks. They are restricted to small silos of data, and so fail to address some of the major issues the paper uncovered (such as cross-ecosystem package listing). Additionally, the products currently on the market that do any sort of package analysis essentially boil down to manual analysis. Even if, as Pitchbook's industry analysis suggests, there are 6 million infosec professionals worldwide, it is flatly impossible to hire enough staff to police an ecosystem of 60 million or more packages that are constantly evolving, with tens of thousands more being published every day. Clearly, this needs to change. Here at Phylum we are working to bring modern analysis and data mining to bear, providing a path to unite these ecosystems and help protect the open source community.

Phylum Research Team

Hackers, Data Scientists, and Engineers responsible for the identification and takedown of software supply chain attackers.