Mar 15, 2022 4 min read Phylum Research

Phylum’s Monthly Malware Report: March 2022 – Unknown Unknowns

Relying on security research to manually discover open-source packages that exhibit supply chain issues is no longer enough. To truly mitigate the risk of using open-source software written by strangers on the Internet, we must analyze all packages published into the various ecosystems, in real time and at scale.

Phylum was purpose-built to analyze the risk of all package releases with the necessary scale, depth, and automation in mind. In the last 29 days, Phylum has processed a total of 545,777 package releases across three ecosystems (NPM, PyPI, and RubyGems), for an average of roughly 18,818 releases processed each day. We have analyzed the metadata and source code of each of these releases, which meant processing 38,798,991 individual source files.

Package Registry	# of Packages	# of Packages/day
NPM	467,818	16,131
PyPI	69,966	2,412
RubyGems	7,993	275.6
Total	545,777	Average: 18,818
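The per-day column can be reproduced from the release counts and the 29-day window. The sketch below is only a worked check of that arithmetic; note that dividing the total directly gives a figure slightly above 18,818, which suggests (as an assumption, not something the report states) that the table's average is the sum of the rounded-down per-registry rates.

```python
# Release counts from the table above; the 29-day window is from the text.
DAYS = 29
releases = {"NPM": 467_818, "PyPI": 69_966, "RubyGems": 7_993}


def per_day(count: int, days: int = DAYS) -> float:
    """Average number of releases per day over the reporting window."""
    return count / days


total = sum(releases.values())

# Print one row per registry, then the overall total.
for registry, count in releases.items():
    print(f"{registry:9s} {count:>8,d} {per_day(count):>10,.1f}")
print(f"{'Total':9s} {total:>8,d} {per_day(total):>10,.1f}")
```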

Once this data has been collected, Phylum’s analytics, heuristics and ML models comb through the data to identify risk indicators. We look for a myriad of these risk indicators but are uniquely positioned to identify and convict malware. Phylum’s analytics seek to make determinations on the maliciousness of a package before a developer adds it to a production release. Examples of the analytics that identify elements of malware include:

high entropy strings used as arguments to evaluation functions
risky function calls
author/maintainer transition
package name similarity (for typosquatting detection)
package code similarity (for typosquatting detection)
pre- and post-installation hooks
calls to system binaries frequently used for host compromise
presence of, and changes to, URIs and IP addresses
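To make the first item in this list concrete, here is a minimal sketch of an entropy check on string literals passed to evaluation functions. It is an illustration, not Phylum's implementation: it targets Python source via the standard `ast` module, and the function set `EVAL_FUNCS` and the cutoff `ENTROPY_THRESHOLD` are assumed values chosen for the example.

```python
import ast
import math
from collections import Counter

EVAL_FUNCS = {"eval", "exec"}  # assumed set of "evaluation functions"
ENTROPY_THRESHOLD = 4.5        # assumed cutoff, in bits per character


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return abs(-sum((c / total) * math.log2(c / total) for c in counts.values()))


def suspicious_eval_args(source: str) -> list[str]:
    """Return string literals passed directly to eval/exec with high entropy."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        is_eval_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in EVAL_FUNCS
        )
        if not is_eval_call:
            continue
        for arg in node.args:
            if (
                isinstance(arg, ast.Constant)
                and isinstance(arg.value, str)
                and shannon_entropy(arg.value) > ENTROPY_THRESHOLD
            ):
                hits.append(arg.value)
    return hits
```

A check like this only sees literals passed directly to the call; strings assembled at runtime or routed through variables would need data-flow analysis, which is outside the scope of this sketch.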

Our completely automated processing and analytics pipeline identified 72 package versions of interest over the past month. This includes genuine malware, packages built for reconnaissance and network enumeration, and packages published by individuals attempting to conduct security research. On average, these packages were identified within 11.2 minutes of publication.

Next, Phylum researchers validate the results for the identified packages, use the resulting data to improve our analytics, and report the malicious packages to the respective package registry. This amounted to 41 packages reported to two package registries; the count is lower than the 72 versions above because several packages had multiple affected versions.

In doing so, Phylum has simultaneously reduced the window of opportunity for an attacker to infect a victim and made it more difficult for a published malicious package to remain undetected for months or years.

Preview: Malware Spotlight

Going forward, Phylum will select interesting packages identified during this process for deeper review on our blog. We’ll use adblock-lists as an example for a preview.

*Figure 1 - Phylum identifies a malicious package attempting dependency confusion against an unknown target within minutes of publication*

This package was identified as a package version of interest, then verified and reported, in a total of 14 minutes. In the image above, we can see three artifacts that the system identified; a fourth is explained below:

The package is missing a README. This isn’t common for legitimate software libraries and is a minor contributor to the classification.
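The package has only 2 released versions yet has a version number of 99.99.0. This is almost certainly an attempt to target victims with a dependency confusion attack, in which a resolver that prefers the highest available version picks the public package over an internal one of the same name. This is a large contributor to the classification.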
The package is missing a link to a version control system such as GitHub or GitLab. This is uncommon for legitimate software libraries, but very common for packages released by attackers. This is a minor contributor to the classification.
Via automated static analysis of the source code, Phylum identified the package as employing an install hook that executes curl to send an HTTP request to a URI. This is almost certainly illegitimate and is a large contributor to the classification.
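The four indicators above can be sketched as simple checks over an NPM-style manifest. This is a rough illustration under stated assumptions, not Phylum's classifier: the thresholds `HIGH_MAJOR` and `FEW_RELEASES`, the function name `risk_indicators`, and the sample manifest are all hypothetical, and the relative weighting of minor and large contributors is not modeled. The `scripts` and `repository` fields and the `preinstall`, `install`, and `postinstall` hook names are standard `package.json` conventions.

```python
import re

HIGH_MAJOR = 90    # assumed: major versions at or above this look inflated
FEW_RELEASES = 5   # assumed: "few" published versions
INSTALL_HOOKS = ("preinstall", "install", "postinstall")
NETWORK_TOOLS = re.compile(r"\b(curl|wget)\b")  # assumed list of tools


def risk_indicators(manifest: dict, release_count: int, has_readme: bool) -> list[str]:
    """Return the names of the indicators that fire for one package version."""
    found = []

    if not has_readme:
        found.append("missing README")

    # Pull the leading integer out of the version string, e.g. 99 from "99.99.0".
    match = re.match(r"\d+", str(manifest.get("version", "")))
    major = int(match.group()) if match else 0
    if major >= HIGH_MAJOR and release_count <= FEW_RELEASES:
        found.append("high version number with few releases")

    if not manifest.get("repository"):
        found.append("no repository link")

    scripts = manifest.get("scripts", {})
    for hook in INSTALL_HOOKS:
        if NETWORK_TOOLS.search(scripts.get(hook, "")):
            found.append(f"{hook} hook runs a network tool")

    return found


# Hypothetical manifest shaped like the case described above.
example = {
    "name": "example-pkg",
    "version": "99.99.0",
    "scripts": {"preinstall": "curl https://example.invalid/"},
}
print(risk_indicators(example, release_count=2, has_readme=False))
```

Matching on the command name alone is deliberately crude; a hook that downloads a script and runs it through an interpreter would need deeper static analysis of the kind the report describes.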

Packages of Interest

Phylum identified 41 packages of interest, with several triggering on multiple versions of the package.

NPM Packages:

adblock-lists	testcucmanh2
rif-marketplace-engine	eslint-plugin-cypress-rules
pstagger	custom-colour
rif-marketplace-engine-sdk	cusom-colours
who.fhir.template	test4dc
openhie.fhir.template	testcucmanh3
cqf.fhir.template	testcucmanh4
node-callstats	testcucmanh5
wasm-thumbnail-js	@madflys/reactVersion
autoplay-whitelist	gapminder-offline
extension-whitelist	dhparadox_teste
anonymous-credentials	superlegal
star-wasm	retrofit2
nsovo-pkg	desugar_jdk_libs
sj-rs	viewlioconfig
nsovo-pkg-2	huutokaupat
cschandragirisample	okhttp-tls
winlocker	forge-locale
testcucmanh2	redis-v3
custom-color	@roku-web-core/ajax
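Several names in this list, such as custom-color, custom-colour, and cusom-colours, sit within a few character edits of one another. A name-similarity analytic of the kind listed earlier could surface such clusters. The sketch below uses Levenshtein edit distance, which is an assumption made for illustration; the report does not say which similarity measure Phylum uses, and `near_names` and its `max_distance` default are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep when equal
            ))
        prev = curr
    return prev[-1]


def near_names(candidate: str, known: list[str], max_distance: int = 2) -> list[str]:
    """Names in `known` within max_distance edits of candidate, excluding itself."""
    return [
        name for name in known
        if name != candidate and levenshtein(candidate, name) <= max_distance
    ]


names = ["custom-color", "custom-colour", "cusom-colours", "winlocker"]
for name in names:
    print(name, "->", near_names(name, names))
```

In practice the comparison set would be the registry's popular package names rather than the flagged packages themselves, so that a new upload resembling a well-known name stands out.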

Why Phylum & What’s Coming Next...

Phylum’s capabilities extend beyond pure source code analysis. We have constructed authorship models that, in combination with other metrics, allow us to identify odd behaviors around commits and activity. We analyze maintainer information for a package, allowing us to spot packages that have recently changed ownership and may therefore be at risk of having malware introduced (as was the case with event-stream in 2018).
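As a rough illustration of the ownership-change signal, the sketch below compares the maintainer set of each release with that of the previous one. It is a simplification under stated assumptions: the function name, the input shape, and the maintainer names are hypothetical, and a real system would combine this with the authorship models and other metrics mentioned above.

```python
def maintainer_changes(history: list[tuple[str, set[str]]]) -> list[dict]:
    """Report releases whose maintainer set differs from the previous release.

    `history` is a list of (version, maintainers) pairs in publication order.
    """
    changes = []
    for (_, previous), (version, current) in zip(history, history[1:]):
        added = current - previous
        removed = previous - current
        if added or removed:
            changes.append({
                "version": version,
                "added": added,
                "removed": removed,
                # True when no earlier maintainer remains on the package.
                "full_handover": not (current & previous),
            })
    return changes


# Hypothetical release history for illustration.
history = [
    ("1.0.0", {"alice"}),
    ("1.0.1", {"alice"}),
    ("1.1.0", {"mallory"}),
]
for change in maintainer_changes(history):
    print(change)
```

A complete handover is not malicious in itself, so a signal like this is better treated as a reason to look more closely at the releases that follow it.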

Looking forward, we are preparing the imminent release of C#/NuGet and Java/Maven support. In addition, we are pushing hard to increase both the sophistication and the number of our heuristics and analytics.

Phylum, at its core, is a risk detection system focusing on the software supply chain. Unlike other SCA products that focus nearly exclusively on well-known issues, we are looking for the unknown unknowns - the subtle modifications to a software package that will surreptitiously exfiltrate keys to your critical infrastructure. We do this at the scale of open source, tackling the problem in an automated fashion, to make software supply chain security proactive instead of merely reactive.

To learn more about Phylum’s automated malware identification capability and how we support secure and efficient use of open-source software, contact us for a conversation.

Phylum Research Team

Hackers, Data Scientists, and Engineers responsible for the identification and takedown of software supply chain attackers.
