Phylum’s Monthly Malware Report: March 2022 – Unknown Unknowns

Relying on security research to manually discover open-source packages that exhibit supply chain issues is no longer enough. To truly mitigate the risk of using open-source software written by strangers on the Internet, we must analyze all packages published into the various ecosystems, in real time and at scale.

Phylum was purpose-built to analyze the risk of all package releases, with the necessary scale, depth, and automation in mind. In the last 29 days, Phylum has processed a total of 545,777 package releases across three ecosystems (NPM, PyPI, and RubyGems), for an average of 18,818 packages processed each day. We analyzed the metadata and source code of each of these packages, which meant processing 38,798,991 individual source files.

Package Registry    # of Packages    # of Packages/day
NPM                 467,818          16,131
PyPI                69,966           2,412
RubyGems            7,993            275.6
Total               545,777          Average: 18,818
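
The per-day column follows from dividing each release count by the 29-day window. A quick check of that arithmetic, as a minimal sketch (the variable names are illustrative and not part of Phylum's tooling):

```python
# Release counts as reported in the table above.
releases = {"NPM": 467_818, "PyPI": 69_966, "RubyGems": 7_993}
DAYS = 29  # reporting window stated in the text

total = sum(releases.values())
per_day = {name: count / DAYS for name, count in releases.items()}

for name, rate in per_day.items():
    print(f"{name:<9} {rate:>10.1f} per day")
print(f"{'Total':<9} {total / DAYS:>10.1f} per day ({total:,} releases)")
```

Dividing the total directly gives roughly 18,820 releases per day. The reported 18,818 equals the sum of the per-registry figures with their fractions dropped (16,131 + 2,412 + 275).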

Once this data has been collected, Phylum’s analytics, heuristics, and ML models comb through it to identify risk indicators. We look for myriad risk indicators, but we are uniquely positioned to identify and convict malware. Phylum’s analytics seek to determine whether a package is malicious before a developer adds it to a production release. Examples of the analytics that identify elements of malware include:

  • high entropy strings used as arguments to evaluation functions
  • risky function calls
  • author/maintainer transition
  • package name similarity (for typosquatting detection)
  • package code similarity (for typosquatting detection)
  • pre- and post-installation hooks
  • calls to system binaries frequently used for host compromise
  • presence of, and changes to, URIs and IP addresses
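
To make the first indicator concrete, here is a minimal sketch of one way to flag high-entropy strings passed to evaluation functions in Python source. It is not Phylum's implementation: the function names, the watch list, and the 4.5 bits-per-character threshold are all assumptions chosen for illustration.

```python
import ast
import math
from collections import Counter

EVAL_FUNCS = {"eval", "exec"}  # hypothetical watch list of evaluation functions
ENTROPY_THRESHOLD = 4.5        # hypothetical cutoff, in bits per character


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def suspicious_eval_calls(source: str) -> list[tuple[int, float]]:
    """Return (line, entropy) for eval/exec calls given a high-entropy string literal."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Only direct calls by name, such as eval("..."), are considered here.
        if not (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)):
            continue
        if node.func.id not in EVAL_FUNCS:
            continue
        for arg in node.args:
            if isinstance(arg, ast.Constant) and isinstance(arg.value, str):
                score = shannon_entropy(arg.value)
                if score >= ENTROPY_THRESHOLD:
                    findings.append((node.lineno, score))
    return findings
```

Ordinary code snippets tend to score well below the threshold, while encoded or packed payloads approach the maximum for their alphabet (6 bits per character for uniformly distributed Base64). A production system would also need to follow strings through variables and decoding calls, which this sketch does not attempt.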

Our completely automated processing and analytics pipeline identified 72 package versions of interest over the past month. These include genuine malware, packages built for reconnaissance and network enumeration, and packages published by individuals attempting to conduct security research. On average, these packages were identified within 11.2 minutes of publication.

Next, Phylum researchers validate the results for the identified packages, use the resulting data to improve our analytics, and report the malicious packages to the respective package registry. This amounted to 41 packages reported to two package registries; the count is lower than the number of flagged versions because several packages had multiple affected versions.

In doing so, Phylum has simultaneously reduced the window of opportunity for an attacker to infect a victim and made it more difficult for a published malicious package to remain undetected for months or years.

Preview: Malware Spotlight

Going forward, Phylum will select interesting packages identified during this process for deeper review on our blog. We’ll use adblock-lists as an example for a preview.

Figure 1 - Phylum identifies a malicious package attempting dependency confusion against an unknown target within minutes of publication

This package was quickly identified as a package version of interest, then verified and reported in a total of 14 minutes. The image above shows three artifacts that the system identified; a fourth is explained below:

  1. The package is missing a README. This isn’t common for legitimate software libraries and is a minor contributor to the classification.
  2. The package has only two released versions yet has a version number of 99.99.0. This is almost assuredly an attempt to target victims with a dependency confusion attack and is a large contributor to the classification.
  3. The package is missing a link to a version control system such as GitHub or GitLab. This is uncommon for legitimate software libraries, but very common for packages released by attackers. This is a minor contributor to the classification.
  4. Via automated static analysis of the source code, Phylum identified the package as employing an install hook that executes curl to send an HTTP request to a URI. This is almost assuredly illegitimate and is a large contributor to the classification.
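
The four artifacts above can be approximated with simple checks against an npm manifest. The following is a minimal sketch under stated assumptions, not Phylum's classifier: the function name, the cutoffs, and the example manifest are hypothetical, and the real system weighs these signals rather than merely listing them.

```python
import re

HIGH_MAJOR_VERSION = 90  # hypothetical cutoff for a suspiciously high major version
FEW_RELEASES = 3         # hypothetical cutoff for "few" published versions
INSTALL_HOOKS = ("preinstall", "install", "postinstall")  # npm lifecycle scripts
NETWORK_TOOLS = re.compile(r"\b(curl|wget)\b")


def manifest_signals(manifest: dict, release_count: int, has_readme: bool) -> list[str]:
    """Return human-readable risk signals for a parsed package.json."""
    signals = []
    if not has_readme:
        signals.append("missing README")

    # A very high version on a package with almost no history suggests an
    # attempt to outrank an internal package of the same name.
    match = re.match(r"\d+", str(manifest.get("version", "")))
    major = int(match.group()) if match else 0
    if major >= HIGH_MAJOR_VERSION and release_count <= FEW_RELEASES:
        signals.append("very high version number with few releases")

    if not manifest.get("repository"):
        signals.append("no version control link")

    scripts = manifest.get("scripts") or {}
    for hook in INSTALL_HOOKS:
        if NETWORK_TOOLS.search(str(scripts.get(hook, ""))):
            signals.append(f"{hook} hook runs a network tool")
    return signals


# Hypothetical manifest shaped like the case described above.
example = {
    "name": "example-pkg",
    "version": "99.99.0",
    "scripts": {"preinstall": "curl https://example.invalid/ping"},
}
for signal in manifest_signals(example, release_count=2, has_readme=False):
    print(signal)
```

Each check is weak on its own, since plenty of legitimate packages lack a README or a repository link. The combination is what makes a classification credible, which matches the minor and large contributor weighting described above.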

Packages of Interest

Phylum identified 41 packages of interest, several of which triggered on multiple versions.

NPM Packages:

adblock-lists

testcucmanh2

rif-marketplace-engine

eslint-plugin-cypress-rules

pstagger

custom-colour

rif-marketplace-engine-sdk

cusom-colours

who.fhir.template

test4dc

openhie.fhir.template

testcucmanh3

cqf.fhir.template

testcucmanh4

node-callstats

testcucmanh5

wasm-thumbnail-js

@madflys/reactVersion

autoplay-whitelist

gapminder-offline

extension-whitelist

dhparadox_teste

anonymous-credentials

superlegal

star-wasm

retrofit2

nsovo-pkg

desugar_jdk_libs

sj-rs

viewlioconfig

nsovo-pkg-2

huutokaupat

cschandragirisample

okhttp-tls

winlocker

forge-locale

testcucmanh2

redis-v3

custom-color

@roku-web-core/ajax
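
Several names in this list differ from one another by only a character or two (custom-colour, cusom-colours, custom-color), which is the pattern that name-similarity analytics look for. A minimal sketch of such a check using Levenshtein edit distance follows. The function names, the distance cutoff, and the comparison list are assumptions for illustration; a real check would compare new names against established package names, which this report does not enumerate.

```python
def edit_distance(a: str, b: str) -> int:
    """Return the Levenshtein distance between a and b using two rolling rows."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def near_matches(candidate: str, known_names: list[str], max_distance: int = 2) -> list[str]:
    """Return known names within max_distance edits of candidate, excluding exact matches."""
    return [
        name for name in known_names
        if name != candidate and edit_distance(candidate, name) <= max_distance
    ]


# Names reused from the list above purely to show the distance calculation.
names = ["custom-colour", "custom-color", "redis-v3"]
print(near_matches("cusom-colours", names))
```

Edit distance alone produces false positives for short names, so it is usually paired with other signals such as download counts or code similarity.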

Why Phylum & What’s Coming Next

Phylum’s capabilities extend beyond pure source code analysis. We have constructed authorship models that, in combination with other metrics, allow us to identify odd behaviors around commits and activity. We analyze maintainer information for a package, allowing us to spot packages that have recently changed ownership and may be at risk for the introduction of malware (as was the case with event-stream in 2018).
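
As a rough illustration of the ownership signal, the sketch below compares the maintainer sets of two consecutive releases. It assumes maintainer identifiers are already available as sets of strings; the function names are hypothetical and this is not Phylum's authorship model.

```python
def maintainer_changes(previous: set[str], current: set[str]) -> dict[str, set[str]]:
    """Return which maintainers were added or removed between two releases."""
    return {"added": current - previous, "removed": previous - current}


def is_ownership_transfer(previous: set[str], current: set[str]) -> bool:
    """Return True when no maintainer of the previous release remains on the current one."""
    return bool(previous) and bool(current) and previous.isdisjoint(current)


# Hypothetical maintainer names for two consecutive releases.
before = {"alice"}
after = {"mallory"}
print(maintainer_changes(before, after))
print(is_ownership_transfer(before, after))
```

A complete handover is the strongest form of the signal, but a newly added maintainer on a long-stable package can also merit review, so the added set is worth inspecting even when the transfer check is false.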

Looking forward, we are preparing the imminent release of C#/NuGet and Java/Maven support. In addition, we are pushing hard to increase both the sophistication and the number of our heuristics and analytics.

Phylum, at its core, is a risk detection system focusing on the software supply chain. Unlike other SCA products that focus nearly exclusively on well-known issues, we are looking for the unknown unknowns: the subtle modifications to a software package that will surreptitiously exfiltrate keys to your critical infrastructure. We do this at the scale of open source, tackling the problem in an automated fashion, to make software supply chain security proactive instead of merely reactive.

To learn more about Phylum’s automated malware identification capability and how we support secure and efficient use of open-source software, contact us for a conversation.

Phylum Research Team

Hackers, Data Scientists, and Engineers responsible for the identification and takedown of software supply chain attackers.