Phylum’s Monthly Malware Report: March 2022 – Unknown Unknowns


Published on: Mar 15, 2022
Written by: Louis Lang, CTO
Category: Research

Relying on security research to manually discover open-source packages that exhibit supply chain issues is no longer enough. To truly mitigate the risk of using open-source software written by strangers on the Internet, we must analyze all packages published into the various ecosystems, in real time and at scale.

Phylum was purpose-built to analyze the risk of all package releases with the necessary scale, depth, and automation in mind. In the last 29 days, Phylum has processed a total of 545,777 package releases across three ecosystems (NPM, PyPI, and RubyGems), for an average of 18,818 packages processed each day. We have analyzed the metadata and source code of each of these packages, which meant processing 38,798,991 individual source files.

Package Registry | # of Packages | # of Packages/day
NPM              | 467,818       | 16,131
PyPI             | 69,966        | 2,412
RubyGems         | 7,993         | 275.6
Total            | 545,777       | Average: 18,818

Once this data has been collected, Phylum’s analytics, heuristics, and ML models comb through it to identify risk indicators. We look for a myriad of these indicators, and we are uniquely positioned to identify and convict malware. Phylum’s analytics aim to determine whether a package is malicious before a developer adds it to a production release. Examples of the analytics that identify elements of malware include:

  • high entropy strings used as arguments to evaluation functions
  • risky function calls
  • author/maintainer transition
  • package name similarity (for typosquatting detection)
  • package code similarity (for typosquatting detection)
  • pre- and post-installation hooks
  • calls to system binaries frequently used for host compromise
  • presence of, and changes to, URIs and IP addresses
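To make the first indicator in the list above concrete, here is a minimal Python sketch of flagging high-entropy string literals passed to evaluation functions. It is an illustration under stated assumptions, not Phylum’s implementation: the threshold, the minimum length, the regular expression, and the function names are all hypothetical, and a production system would more plausibly walk a parsed syntax tree than pattern-match raw source.

```python
import math
import re
from collections import Counter

# Hypothetical cut-offs, chosen only for this illustration.
ENTROPY_THRESHOLD = 4.5  # bits per character
MIN_LENGTH = 32          # ignore short literals

# Simplified pattern: eval/exec/Function called with one quoted literal.
# It does not handle escaped quotes or string concatenation.
EVAL_CALL = re.compile(r"""\b(?:eval|exec|Function)\s*\(\s*(["'])(.*?)\1""", re.DOTALL)


def shannon_entropy(text: str) -> float:
    """Shannon entropy in bits per character, from character frequencies."""
    if not text:
        return 0.0
    total = len(text)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(text).values())


def suspicious_eval_arguments(source: str) -> list[str]:
    """Return literals passed to eval-like calls that are long and high entropy."""
    hits = []
    for match in EVAL_CALL.finditer(source):
        literal = match.group(2)
        if len(literal) >= MIN_LENGTH and shannon_entropy(literal) >= ENTROPY_THRESHOLD:
            hits.append(literal)
    return hits
```

Encoded or packed payloads tend to use many distinct characters with near-uniform frequency, which pushes the per-character entropy well above that of ordinary code or prose. On its own this is a weak signal, which is why it is listed alongside other indicators rather than used in isolation.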

Our completely automated processing and analytics pipeline identified 72 package versions of interest over the past month. These include genuine malware, packages built for reconnaissance and network enumeration, and packages published by individuals attempting to conduct security research. On average, these packages were identified within 11.2 minutes of publication.

Next, Phylum researchers validate the results of the identified packages, use the resulting data to improve our analytics, and report the malicious packages to the respective package registry. Because several packages had multiple affected versions, this amounted to 41 packages reported to two package registries.

In doing so, Phylum has both reduced the window of opportunity for an attacker to infect a victim and made it more difficult for a published malicious package to remain undetected for months or years.

Preview: Malware Spotlight

Going forward, Phylum will select interesting packages identified during this process for deeper review on our blog. We’ll use adblock-lists as an example for a preview.

Figure 1 - Phylum identifies a malicious package attempting dependency confusion against an unknown target within minutes of publication

This package was flagged as a package version of interest, then verified and reported, all within 14 minutes. The image above shows three artifacts that the system identified; the list below explains those three, plus a fourth found through static analysis:

  1. The package is missing a README. This isn’t common for legitimate software libraries and is a minor contributor to the classification.
  2. The package only has 2 released versions yet has a version number of 99.99.0. This is almost assuredly an attempt to target victims with a Dependency Confusion attack and is a large contributor to the classification.
  3. The package is missing a link to a version control system such as GitHub or GitLab. This is uncommon for legitimate software libraries, but very common for packages released by attackers. This is a minor contributor to the classification.
  4. Via automated static analysis of the source code, Phylum identified the package as employing an install hook that executes curl to send an HTTP request to a URI. This is almost assuredly illegitimate and is a large contributor to the classification.
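The four artifacts above can be sketched as simple checks over an npm package manifest. The Python below is a hedged illustration only. The `version`, `repository`, and `scripts` fields and the `preinstall`/`install`/`postinstall` script names are standard npm manifest features, but the function names, the thresholds, the "minor"/"major" labels, and the list of binaries are assumptions made for this example, not Phylum’s actual rules.

```python
import re

# Hypothetical names and thresholds, chosen only for this illustration.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")  # npm lifecycle scripts
SYSTEM_BINARIES = re.compile(r"\b(curl|wget|nc|bash|powershell)\b")


def leading_version_number(version: str) -> int:
    """Return the major component of a version string, or 0 if it has none."""
    match = re.match(r"\d+", version)
    return int(match.group()) if match else 0


def findings_for(manifest: dict, release_count: int, has_readme: bool) -> list[tuple[str, str]]:
    """Return (description, weight) pairs for the four artifacts described above."""
    findings = []
    if not has_readme:
        findings.append(("missing README", "minor"))
    major = leading_version_number(str(manifest.get("version", "")))
    if release_count <= 3 and major >= 90:  # assumed cut-offs
        findings.append(("very high version number with few releases", "major"))
    if not manifest.get("repository"):
        findings.append(("no link to a version control system", "minor"))
    scripts = manifest.get("scripts") or {}
    for hook in INSTALL_HOOKS:
        if SYSTEM_BINARIES.search(scripts.get(hook, "")):
            findings.append((f"{hook} hook runs a system binary", "major"))
    return findings


# A made-up manifest resembling the pattern described above (not the real package).
example = {
    "name": "example-internal-lib",
    "version": "99.99.0",
    "scripts": {"preinstall": "curl https://example.invalid/collect"},
}
for description, weight in findings_for(example, release_count=2, has_readme=False):
    print(f"{weight}: {description}")
```

Keeping each check separate and labeled mirrors the way the classification is explained above: no single minor artifact is damning, while the combination of an implausible version number and a network-calling install hook carries most of the weight.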

Packages of Interest

Phylum identified 41 packages of interest, with several triggering on multiple versions of the package.

NPM Packages:

adblock-lists

testcucmanh2

rif-marketplace-engine

eslint-plugin-cypress-rules

pstagger

custom-colour

rif-marketplace-engine-sdk

cusom-colours

who.fhir.template

test4dc

openhie.fhir.template

testcucmanh3

cqf.fhir.template

testcucmanh4

node-callstats

testcucmanh5

wasm-thumbnail-js

@madflys/reactVersion

autoplay-whitelist

gapminder-offline

extension-whitelist

dhparadox_teste

anonymous-credentials

superlegal

star-wasm

retrofit2

nsovo-pkg

desugar_jdk_libs

sj-rs

viewlioconfig

nsovo-pkg-2

huutokaupat

cschandragirisample

okhttp-tls

winlocker

forge-locale

testcucmanh2

redis-v3

custom-color

@roku-web-core/ajax
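Several names in this list differ from one another by only a character or two, which is the kind of pattern that package name similarity analytics are meant to surface. As a minimal sketch, assuming plain Levenshtein edit distance as the similarity measure (the source does not say which measure Phylum uses), the Python below compares a candidate name against a set of reference names. The function names and the distance cut-off are hypothetical, and the names taken from the list serve only as sample strings; nothing here asserts which package any of them imitates.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


def near_matches(candidate: str, reference_names: list[str], max_distance: int = 2) -> list[str]:
    """Return reference names within max_distance edits of the candidate."""
    return [
        name for name in reference_names
        if name != candidate and edit_distance(candidate, name) <= max_distance
    ]


# Sample strings only; in practice the reference set would be popular package names.
print(near_matches("cusom-colours", ["custom-colour", "custom-color", "redis-v3"]))
```

A small distance threshold keeps false positives manageable, since many legitimate packages share prefixes or suffixes. Combining name similarity with code similarity, as the analytics list earlier describes, narrows the results further.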

Why Phylum & What’s Coming Next...

Phylum’s capabilities extend beyond pure source code analysis. We have constructed authorship models that, in combination with other metrics, allow us to identify odd behaviors around commits and activity. We also analyze maintainer information for each package, which lets us spot packages that have recently changed ownership and may therefore be at risk for the introduction of malware (as was the case with event-stream in 2018).
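A change of ownership can be detected, in its simplest form, by comparing the maintainer set of each release against everyone seen on earlier releases. The Python below is a rough sketch under that assumption; the `Release` type, its fields, and the maintainer names are hypothetical and do not reflect Phylum’s data model or its authorship models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    """Hypothetical record of one published version and who could publish it."""
    version: str
    maintainers: frozenset[str]


def maintainer_transitions(releases: list[Release]) -> list[tuple[str, frozenset[str]]]:
    """Return (version, newcomers) for releases that add previously unseen maintainers.

    Releases are assumed to be ordered oldest first.
    """
    seen: set[str] = set()
    transitions = []
    for index, release in enumerate(releases):
        newcomers = release.maintainers - seen
        if index > 0 and newcomers:  # the first release has no history to compare
            transitions.append((release.version, frozenset(newcomers)))
        seen |= release.maintainers
    return transitions


history = [
    Release("1.0.0", frozenset({"alice"})),
    Release("1.1.0", frozenset({"alice"})),
    Release("2.0.0", frozenset({"alice", "mallory"})),
]
for version, newcomers in maintainer_transitions(history):
    print(version, sorted(newcomers))
```

A new maintainer is not suspicious by itself, so a signal like this is best read together with what changed in the release that followed, such as a new dependency or install hook.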

Looking forward, we are preparing the imminent release of C#/NuGet and Java/Maven support. In addition, we are pushing hard to increase both the sophistication and the number of our heuristics and analytics.

Phylum, at its core, is a risk detection system focused on the software supply chain. Unlike other SCA products that focus nearly exclusively on well-known issues, we are looking for the unknown unknowns: the subtle modifications to a software package that will surreptitiously exfiltrate keys to your critical infrastructure. We do this at the scale of open source, tackling the problem in an automated fashion, to make software supply chain security proactive instead of merely reactive.

To learn more about Phylum’s automated malware identification capability and how we support secure and efficient use of open-source software, contact us for a conversation.
