Digital Detritus: Unintended Consequences of Open Source Sustainability Platforms

A perverse incentive is one that makes a situation worse by rewarding the wrong behavior. Real-world examples abound, like the Cobra effect or the Great Hanoi Rat Massacre, and now the phenomenon has come to open source software. Right now, open source repositories are being polluted with thousands of dubious packages published by opportunistic actors exploiting a protocol with the noble intent of compensating open source developers for their contributions to the greater good. Join us as we take a look at this developing situation.

Buy Me a Coffee Tea

Open Source Software (OSS) ecosystems—platforms like PyPI and npm—are undeniably the backbone of modern software development, enabling the easy distribution and discovery of countless tools and applications. Powering these ecosystems are remarkable and often unpaid developers who contribute their time and expertise to develop and freely distribute software and source code that powers everything from small personal projects to large-scale enterprise solutions. Not only does this collaborative model democratize software development, but it also fosters innovation and increases development velocity.

Given a system so heavily dependent on unpaid volunteers, one might ask: without financial benefit, what keeps the maintainers working and interested in their projects? The answer to this question is vital, especially when a package is a critical component of other important software.

To address this issue, initiatives like Buy Me a Coffee and GitHub Sponsors are a step in the right direction but often fall significantly short of most OSS developers’ financial needs. Other platforms like Tidelift partner with open source maintainers, offering more significant compensation provided they adopt and maintain high-security development practices. The money comes from Tidelift enterprise customers who expect such partnerships to provide more trust, safety, and reliability in the open source packages they depend upon.

There are also other burgeoning attempts to address this problem, bringing us to our topic today. The Tea protocol is a web3 platform whose stated goal is compensating open source package maintainers, but instead of cash rewards, they are rewarded with TEA tokens, a cryptocurrency.

The Tea protocol white paper outlines a decentralized system that rewards open source developers for their contributions in a way that’s proportional to the value they provide—or in their words they allow developers “to capture the value they create”. At a high level, the system works by having developers register their projects with Tea. The Tea protocol then analyzes the entire dependency graph of open source projects to dynamically determine each project’s “Proof of Contribution”—its “teaRank” or overall value to the ecosystem. It then distributes rewards as TEA tokens to a project’s “treasury”.

Other parties besides developers participate in the Tea protocol via different mechanisms. Project supporters stake TEA tokens on projects they want to support. “Tea tasters”, assumed expert code reviewers, audit projects, “staking their tea” as a show of support and vouching for the validity and correctness of the project’s claims, e.g. functionality, security, license accuracy, and so on. The end goal here is the creation of a robust economy around open source software that accurately and proportionately rewards developers based on the value of their work through complex web3 mechanisms, programmable incentives, and decentralized governance.

The project is undeniably noble in heart and novel in approach. However, it raises the question: will bringing web3 and cryptocurrency dynamics to the heart of open source lead to its intended outcomes - just compensation for open source software developers who meaningfully contribute to the greater good, and thus increased trust in open source - or will unintended consequences follow?

A Sharp Uptick in npm Publications

Let’s rewind for a minute to share how we learned about the Tea protocol in the first place. One of our core missions here at Phylum is to defend developers from malware. We ingest and analyze every package published to open source ecosystems, including npm. Over the past few months, we’ve seen a significant uptick in new package publications to npm. By new packages we mean completely new packages not previously seen in the ecosystem, not new versions of existing packages. The following graph shows that starting in mid-to-late February 2024, new package publications increased steadily toward the beginning of March 2024, then skyrocketed, peaking at over 7x the number of new daily packages typically seen.

New package publications to npm per day.
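
A jump like the one in the graph can be surfaced with a simple baseline comparison. The following is a minimal sketch, not a description of Phylum's detection pipeline: the function name `flag_spikes`, the 7-day window, the 3x threshold, and the daily counts are all illustrative assumptions.

```python
from statistics import median

def flag_spikes(daily_counts, window=7, factor=3.0):
    """Return indices of days whose count exceeds `factor` times
    the median of the preceding `window` days."""
    flagged = []
    for i in range(window, len(daily_counts)):
        baseline = median(daily_counts[i - window:i])
        if baseline > 0 and daily_counts[i] > factor * baseline:
            flagged.append(i)
    return flagged

# Made-up daily counts for illustration only (not measured npm data).
example_counts = [700, 800, 750, 900, 650, 700, 800, 820, 5600, 6000]
print(flag_spikes(example_counts))
```

Because the baseline is a trailing median, a spike that lasts longer than about half the window becomes the new normal and stops being flagged. A longer window or a fixed historical baseline would be needed to track a campaign that runs for months.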

We started investigating this anomaly and found that coincident with this increase in new package publications, our systems were also flagging significant amounts of what we consider spam publications to npm. In the context of open source software, by “spam” we mean a non-malicious package published by a user with intentions other than providing utility or meaningful functionality to other users. Numerous varieties of spam abound in open source ecosystems, and one way that we classify spam is by its apparent intentions.

As we started looking into these thousands of spam packages, we found they fit into several campaigns that followed a particular pattern. The package names were either two to four random words joined with hyphens or an un-hyphenated portmanteau of two Latin words chosen from the Lorem Ipsum corpus. For example, stay-including and far-web3-neck, both of which appear later in this post, follow the random-words pattern.

These packages, and several thousand more like them, are based on one of a handful of existing packages that the actors have cloned, provided a new random name for, changed only necessary metadata, and republished. There were a few interesting things we noticed early on in the investigation.
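
The naming pattern described above can be expressed as a rough filter. This is a sketch under stated assumptions: `LOREM_WORDS` is a small hypothetical sample rather than a full Lorem Ipsum word list, and the function names are ours. Plenty of legitimate packages also use hyphenated names, so a match on its own says nothing about intent and would need to be combined with other signals such as cloned contents or shared metadata.

```python
import re

# Hypothetical sample of Lorem Ipsum vocabulary; a real check would
# need a much fuller word list.
LOREM_WORDS = {
    "lorem", "ipsum", "dolor", "sit", "amet", "consectetur",
    "adipisci", "elit", "tempora", "quia", "voluptatem",
}

# Two to four lowercase alphanumeric tokens joined by hyphens.
HYPHENATED = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+){1,3}$")

def is_lorem_portmanteau(name, words=LOREM_WORDS):
    """True if `name` splits into exactly two words from `words`."""
    return any(
        name[:i] in words and name[i:] in words
        for i in range(1, len(name))
    )

def matches_campaign_pattern(name):
    """True if `name` fits either naming scheme (a weak signal only)."""
    return bool(HYPHENATED.match(name)) or is_lorem_portmanteau(name)

for candidate in ["far-web3-neck", "stay-including", "doloramet", "express"]:
    print(candidate, matches_campaign_pattern(candidate))
```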

One was the vast number of dependencies many of these packages had that were clearly from the same campaign--we're talking on the order of thousands of dependencies. In other words, they were building a huge interconnected network of packages. An artificially created network like this introduces a dangerous risk from a malware defense perspective because of the transitive dependency problem. A single poison pill in this rat’s nest of package dependencies could spell disaster for any external legitimate package that accidentally reaches into the nest and names one of those packages as a dependency. However, we have yet to find anything overtly malicious in the network.
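
To make the transitive dependency problem concrete, here is a minimal sketch that walks a dependency map and reports which flagged packages a given package can reach. The map, the package names, and the helper names are hypothetical. A real check would resolve dependencies from registry metadata.

```python
from collections import deque

def transitive_dependencies(package, dependency_map):
    """Return every package reachable from `package` by following
    dependency edges (breadth-first, tolerant of cycles)."""
    seen = set()
    queue = deque(dependency_map.get(package, ()))
    while queue:
        dep = queue.popleft()
        if dep in seen:
            continue
        seen.add(dep)
        queue.extend(dependency_map.get(dep, ()))
    return seen

def exposure(package, dependency_map, flagged):
    """Return the flagged packages that `package` pulls in, directly
    or transitively."""
    return transitive_dependencies(package, dependency_map) & set(flagged)

# Hypothetical graph: one legitimate app reaches the nest through a
# single indirect dependency.
graph = {
    "legit-app": ["helper"],
    "helper": ["spam-a"],
    "spam-a": ["spam-b", "spam-c"],
    "spam-b": ["spam-a"],  # the spam packages depend on each other
    "spam-c": [],
}
print(sorted(exposure("legit-app", graph, {"spam-a", "spam-b", "spam-c"})))
```

One edge into the network is enough to inherit all of it, which is why a single malicious package inside the network would matter so much.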

Another strange thing we noticed was that nearly every package had an associated and unique GitHub account with a like-named repo. Strangely, many of the repositories were effectively empty with nothing more than a minimal readme and a file named tea.yaml. Here is an example tea.yaml from the stay-including repository:

# https://tea.xyz/what-is-this-file
---
version: 1.0.0
codeOwners:
  - '0x16f50--REDACTED--'
quorum: 1

At first, it seemed innocuous. But this tea.yaml pattern persisted, and we began investigating the Tea protocol on tea.xyz. Further digging on GitHub revealed a script in a commit by a user named ylmin (who lists their name as Jack--remember that name) that appears to be tied to the publication of the random words and Latin campaigns. In it you can see they have a collection of npm API keys they cycle through and publish these packages through a proxy agent. We also found a file containing a map of npm API keys to published package names. We then began to suspect that the Tea protocol had incentivized a massive automated crypto farming campaign.

Spam, spam, spam, beans, spam

Spam, even at high volumes, is not new to open source ecosystems, especially npm. However, sustained, automated spamming at this volume for months on end is rare. It does nothing but strain the ecosystem, degrading its performance for genuine users and burdening open source security researchers.

We then took a step back from one particular campaign and looked at Tea-related packages as a whole. As of publication, there are over 14 thousand packages registered with the Tea protocol across all open source ecosystems, the majority of which are in npm. Manually checking every package is infeasible, but based on the naming schemes and patterns of the registered packages and some spot-checking, it’s clear that while some are legitimate, nearly all of them are spam. Many separate campaigns are also attempting to do the same thing.

The following plot shows how significant these campaigns are in the npm ecosystem.

New package publications to npm, including Tea and transitive Tea packages.

Under normal conditions, npm sees approximately 500-1000 new packages published daily. At the height of this crypto farming effort, around the start of March, we can see the number of Tea-registered packages published to npm exceeded the normal daily volume of total new packages published!

Since then, we have seen the spammers releasing packages at an even higher rate, each listing their own Tea-registered package as a dependency. These packages are not registered on Tea but are released on npm in an effort to increase their Tea package’s “teaRank”, thus making it more valuable in the Tea ecosystem.
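
Why would unregistered dependents help? The sketch below assumes, purely for illustration, that a dependency-graph score behaves like a basic PageRank in which each package passes score to the packages it depends on. It is not the teaRank implementation, and the package names are hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Basic PageRank by power iteration. `edges` maps each package to
    the packages it depends on; score flows from dependent to dependency."""
    nodes = set(edges)
    for deps in edges.values():
        nodes.update(deps)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            deps = edges.get(node, [])
            if deps:
                share = damping * rank[node] / len(deps)
                for dep in deps:
                    new_rank[dep] += share
            else:
                # A package with no dependencies spreads its score evenly.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Before: a genuinely used library and a farmed package nobody depends on.
before = pagerank({"app": ["lib"], "lib": [], "farmed": []})

# After: fifty throwaway packages all name the farmed package as a dependency.
spam_edges = {"app": ["lib"], "lib": [], "farmed": []}
for i in range(50):
    spam_edges[f"spam-{i}"] = ["farmed"]
after = pagerank(spam_edges)

print("before:", round(before["farmed"], 3), "vs lib", round(before["lib"], 3))
print("after: ", round(after["farmed"], 3), "vs lib", round(after["lib"], 3))
```

In this toy model the farmed package starts below the genuinely used library and ends well above it, with no change in real-world usage. Any ranking that counts dependents without weighing their legitimacy is open to this kind of inflation.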

The Total Counts

These numbers are still increasing but as of our last count, we see 13,375 npm packages have been registered in Tea. We’ve also seen 53,513 transitive tea packages. For comparison, we count 238 packages on PyPI registered to tea, 301 on rubygems, and 141 on crates. Currently, the spam problem appears limited to npm.
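
The counts above can be put in proportion with a few lines of arithmetic. The sketch assumes that the four ecosystems listed account for all Tea-registered packages and that the transitive packages are all on npm. Both are our reading of the figures, not additional measurements.

```python
# Counts quoted in the paragraph above.
registered = {"npm": 13375, "PyPI": 238, "rubygems": 301, "crates": 141}
transitive_npm = 53513

total_registered = sum(registered.values())
npm_share = registered["npm"] / total_registered
transitive_per_registered = transitive_npm / registered["npm"]

print(f"total registered: {total_registered}")
print(f"npm share of registered packages: {npm_share:.1%}")
print(f"transitive packages per registered npm package: {transitive_per_registered:.1f}")
```

That works out to 14,055 registered packages in total, with npm holding about 95.2% of them and roughly four transitive packages for every registered npm package.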

Gaming the System

We joined the Discord server to see what was going on there and it’s pretty clear the challenge/reward structure is having some unintended consequences. For example:

Screenshot 2024-04-08 at 10.27.30 AM.png

Users who “don’t know jack” about software development are probably not very good candidates to build well-designed open source software projects.

Screenshot 2024-04-08 at 10.25.26 AM.png

Fair enough, some of the challenges, such as “signing up for tea” or “verifying your email address”, do not require development skills, but the challenges with the largest point rewards definitely do. Here’s a snapshot of some of the challenges.

Screenshot 2024-04-08 at 10.50.22 AM.png
Rewards for tasks in the "Earl Grey" challenge.
Screenshot 2024-04-08 at 10.54.54 AM.png
Rewards for tasks in the "Jasmine" challenge.
Screenshot 2024-04-08 at 10.35.19 AM.png
Screenshot 2024-04-08 at 10.39.39 AM.png

Finally a voice of reason! Programming knowledge is indeed required to create an OSS project. But why let something like that deter you?

Screenshot 2024-04-08 at 11.33.56 AM.png

Or better yet…

Screenshot 2024-04-08 at 11.35.48 AM.png

One user, named Jack, even asked the same question we're asking!

About a week prior to that, we also saw Jack complaining that they were unable to claim their reward from a particular project. In the discussion with the moderators, we noticed them cropping out the names of the projects they were trying to claim rewards for.

A screenshot from Jack during the reward claiming discussion. Notice the top is cropped where the project name usually is.

Curious about this strange behavior, we dug a bit and were able to track the Tea address provided to the project associated with that address.

The Tea address provided in Jack's discussion leads to the npm package "far-web3-neck"!

We took a look at the package on npm.

far-web3-neck's npm landing page. Notice it has 7,721 dependents. Must be some package!

This is clearly part of the "random-words" campaign. We then followed that package to GitHub.

far-web3-neck's GitHub repo.

Jack! You're ylmin, and you're behind the random-words and Latin campaigns! Are you asking about the spam packages in the Tea Discord server because you're worried they're catching on? Or are you trying to call out the other campaigns while silently running your own?

As an aside, Jack's GitHub network is populated by other GitHub users who also contribute to the publication of packages belonging to the random-words and Latin campaigns. This appears to be a fake account like all the rest and it's highly likely whoever is actually behind this is using hundreds, possibly even thousands of fake GitHub and npm accounts to accomplish this high volume publication of spam.

The point here isn't necessarily to call out Jack and the names of actors perpetrating this spam campaign--but it was quite a coincidence to stumble into the very actor creating the packages that originally sent us down this rabbit hole. Instead, the point here is to show that there are strong incentives created by the Tea protocol and their reward system to heavily pollute the open source ecosystems, with npm taking the brunt of it.

Some Considerations

Before we wrap up, there are a few other things worth considering in light of this massive spam campaign.

  1. This is not the first time tea has had to deal with abuse. In late February, Connor Tumbleson wrote about a different type of abuse open source maintainers were having to deal with as a result of the Tea protocol. In their defense, tea responded the following day with a list of safeguards they were going to implement to prevent that particular kind of abuse.
  2. The Tea protocol is not even live yet. These users are farming points from the “Incentivized Testnet,” apparently with the expectation that having more points in the Testnet will increase their odds of receiving a later airdrop. It’s also worth noting that while Testnet TEA tokens have no value, Testnet points “are planned to become redeemable by eligible persons for blockchain tokens and/or other benefits at a later time.” If we are seeing this volume of spam for tokens of no value and points that may be worth something later, imagine the lengths bad actors will go to when these are actually worth something.
  3. As different challenges conclude and new ones begin, it is likely that the same actors, along with others, will seek novel ways to exploit the system in order to maximize their token and point counts. The specific manifestations of these exploits in future challenges within the open source ecosystems remain to be seen. However, if history is any indication, we strongly suspect that these actions will ultimately prove more detrimental than beneficial.

Final Thoughts

As we stated earlier, the Tea protocol outlined in their white paper is an innovative attempt to address the long-standing problem of open source developer sustainability and compensation. It acknowledges the limitations of relying on voluntary contributions and proposes a reward system to motivate developers. However, the protocol’s safeguards against spam, abuse, and gaming appear underdeveloped in its current state.

The countermeasures discussed, such as manual review by "tea tasters" and slashing rewards for bad actors, may not be sufficient to address the full scope of the problem. The reliance on human reviewers introduces scalability concerns and potential misalignment of incentives, and the teaRank algorithm itself seems vulnerable to external manipulation. As far as we can see, nothing prevents a single actor who has automated spam production from also exploiting the protocol as a tea taster, harvesting rewards for auditing their own spam under a different account. It is the 21st-century version of raising cobras or rats.

Moreover, short-term challenges designed to increase user engagement may unintentionally encourage the very behaviors the system aims to prevent. Rewarding the most active users based on Incentivized Testnet points has already incentivized spam and the creation of fake packages, completely undermining the goal of fostering diverse, stable, well-maintained projects. After all, developing robust software is a gradual process that requires careful planning, steady maintenance, and continual improvement.

It's also important to recognize that wherever there is potential for financial gain, especially if it can be achieved fraudulently with low risk and potentially high reward, bad actors will seek to exploit the system. As long as there are incentives to behave badly, malicious actors will find ways to circumvent safeguards, creating a constant “whack-a-mole” battle between security measures and those seeking to exploit vulnerabilities.

As the Tea protocol develops, it is crucial to enhance its anti-abuse measures and continuously refine its incentive structures to ensure they drive the desired behaviors without encouraging gaming or spam. Otherwise, spam farmers will dilute the compensation that was originally intended for software developers.

Phylum Research Team

Hackers, Data Scientists, and Engineers responsible for the identification and takedown of software supply chain attackers.