Repo Jacking: Hidden Danger in Broken Links
When contemplating the dangers of third-party libraries, there are a lot of things you can't control. While issues related to direct contribution or account compromise are certainly things to look out for, it turns out that these sorts of problems are just the tip of the iceberg.
Package managers are very powerful tools. They are built to be adaptive, friendly to automation, and as flexible as possible for developers. This extreme flexibility, however, has some sharp edges. As it turns out, many modern package management systems, including NPM and Go's module system, allow you to link directly to Version Control System (VCS) repositories in order to download software packages as part of a build.
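To make the "link directly to a repository" mechanism concrete, here is a minimal sketch that scans an NPM manifest for dependencies whose version specifier points at a Git host instead of a registry version. The prefixes are the Git and hosted-shorthand forms described in the npm documentation; the function name, the example manifest, and every package and owner name in it are hypothetical and only there for illustration.

```python
import json
import re

# Specifier prefixes that npm documents for Git-hosted dependencies.
GIT_PREFIXES = (
    "git://", "git+ssh://", "git+http://", "git+https://", "git+file://",
    "github:", "gitlab:", "bitbucket:", "gist:",
)
# The bare "owner/repo" GitHub shorthand, optionally followed by "#ref".
SHORTHAND = re.compile(r"^[A-Za-z0-9][A-Za-z0-9-]*/[A-Za-z0-9._-]+(#.+)?$")


def vcs_dependencies(manifest_text):
    """Return {name: specifier} for dependencies that resolve to a Git host."""
    manifest = json.loads(manifest_text)
    found = {}
    for section in ("dependencies", "devDependencies", "optionalDependencies"):
        for name, spec in manifest.get(section, {}).items():
            if spec.startswith(GIT_PREFIXES) or SHORTHAND.match(spec):
                found[name] = spec
    return found


# Hypothetical manifest: all names below are made up for illustration.
EXAMPLE_MANIFEST = """
{
  "dependencies": {
    "package-c-helper": "^1.4.0",
    "package-e": "github:example-owner/package-e"
  },
  "devDependencies": {
    "package-f": "git+https://github.com/example-owner/package-f.git#main"
  }
}
"""

if __name__ == "__main__":
    for name, spec in vcs_dependencies(EXAMPLE_MANIFEST).items():
        print(f"{name}: {spec}")
```

Note that this only looks at one manifest. It says nothing about what the listed packages pull in themselves.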
A Simple Example
While it sounds innocent enough, there are actually a number of significant problems with this approach. How exactly does this sort of mechanism work? Consider the following scenario: a software developer begins work on a new project. This project could be anything from a personal blog to a security-critical financial services or healthcare dashboard. The threat is largely the same regardless of how security-critical the project is. For the purposes of our example, we'll keep the discussion high-level and generic. While "breaking ground" on this new project, the developer scopes out what functionality needs to go into it. She needs a few modules to do the work, some components to let users interact with it, and some methods to display results.
At some point, this developer will likely pull in some external packages to help build out the various pieces of the application, and this is where our story really begins. Our developer adds a single library from an internet source: Package A. What Package A actually does is largely immaterial to the story. We'll say, for simplicity, that it provides some functions to produce Widgets that software developers will use to build their products. More important, however, is that Package A imports a few other packages, one of which is Package B, a package that prints out Widgets to the command line in different colors. Package B is a pretty useless thing. The maintainer of Package A thought it was sort of cool to let developers debug their code with colors and added it while writing some tests. Package B doesn't even ship with "production" builds. It exists to help developers work with Package A. Package B also imports a few more packages, including Package C, which in turn imports Package E (for evil, very subtle!). To make this easier to follow, the chain looks like this:
Original Project -> Package A -> Package B -> Package C -> Package E(vil)
One problem, however, is that Package E isn't really a package at all, at least not in the same sense as Package A. Package A is a simple JavaScript library that was published to NPM with a well-defined author/maintainer and some limits on what can be done to it after it has been published. Package E is actually just a bundle of files hosted on GitHub.
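A dependency like Package E never appears in the project's own manifest, so finding it means looking at the fully resolved tree. The sketch below assumes an NPM lockfile in the newer layout (a top-level "packages" map whose entries carry a "resolved" URL and an optional "dev" flag). That layout, the function name, and the example data are assumptions for illustration, and older lockfiles are organized differently.

```python
import json


def git_resolved_packages(lockfile_text):
    """List (install path, resolved URL, dev-only flag) for Git-resolved entries."""
    lock = json.loads(lockfile_text)
    hits = []
    for path, entry in lock.get("packages", {}).items():
        resolved = entry.get("resolved", "")
        # Registry packages resolve to https tarball URLs; Git ones use git schemes.
        if resolved.startswith(("git+", "git://")):
            hits.append((path, resolved, bool(entry.get("dev", False))))
    return sorted(hits)


# Hypothetical lockfile fragment: names, URLs, and the commit hash are made up.
EXAMPLE_LOCK = """
{
  "lockfileVersion": 3,
  "packages": {
    "": {"name": "original-project"},
    "node_modules/package-a": {
      "version": "2.0.1",
      "resolved": "https://registry.npmjs.org/package-a/-/package-a-2.0.1.tgz"
    },
    "node_modules/package-e": {
      "version": "0.3.0",
      "resolved": "git+ssh://git@github.com/example-owner/package-e.git#0123456789abcdef0123456789abcdef01234567",
      "dev": true
    }
  }
}
"""

if __name__ == "__main__":
    for path, url, dev_only in git_resolved_packages(EXAMPLE_LOCK):
        print(path, url, "(dev only)" if dev_only else "")
```

The dev-only flag matters for the story here: a dependency that never ships to production still runs on developer workstations and build machines.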
What does this mean? Well, for starters, being a bundle of files on GitHub means that there are no real guard rails around any behavior from its author. Package E's maintainer is free to change the code in the branches associated with any release used by any package linking to it. Just as bad, Package E's author can also delete Package E from GitHub. A similar disappearance broke builds across the internet in 2016, when the left-pad package was unpublished from NPM. All of this makes Package E, which our stalwart developer never even realized she was using, a huge risk.
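How much freedom the maintainer has over what a link delivers depends on the reference after the "#" in the specifier. The following sketch sorts a specifier into three rough buckets; the bucket names and the function are hypothetical. A full commit hash fixes the content that can be fetched, while a branch, a tag, or no reference at all can be repointed by whoever controls the repository. Pinning does nothing about deletion, though: the download simply fails.

```python
import re

# A full 40-character hexadecimal Git commit hash.
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def ref_kind(spec):
    """Classify the part of a Git dependency specifier after '#'."""
    _, sep, ref = spec.partition("#")
    if not sep or not ref:
        return "default-branch"   # whatever the repository's default branch holds
    if FULL_SHA.match(ref):
        return "pinned-commit"    # content-addressed, cannot be silently changed
    return "movable-ref"          # branch, tag, or semver range over tags


if __name__ == "__main__":
    # Hypothetical specifiers for illustration.
    for spec in (
        "github:example-owner/package-e",
        "github:example-owner/package-e#main",
        "github:example-owner/package-e#0123456789abcdef0123456789abcdef01234567",
    ):
        print(spec, "->", ref_kind(spec))
```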
Now some time passes. The software developer finishes the project, it goes into production, and perhaps she moves on to a new job. A new developer takes over, new features get added, and new builds go out to production. At some point, however, Package E's owner gets tired of managing Package E. Instead of finding a new maintainer, he just deletes his GitHub account.
What happens now? In our scenario, Package E is a relatively inconsequential dependency. Nothing the new developer working on the project uses really breaks, and nobody notices that Package E just vanished from the internet. This doesn't mean, however, that references to Package E are gone. Every single build that the new development team makes will still try to download and install Package E. As the repository is now gone, that step will just fail.
Now suppose, after all of this time, someone realizes that Package E no longer exists and registers a new account with the same username as Package E's old owner. This new account holder then publishes a new Package E at the same repository path, but this Package E doesn't behave like the old library. Instead, it installs a backdoor in any software packages it finds on the local system, one that lets the new owner of Package E access anything that looks like billing information on any website that uses Package E under the hood. At this point, every time a build runs, whether on a developer workstation or while running unit tests, the new Package E will be downloaded and installed. It will run the code written by the new author and compromise every build that pulls it in.
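The takeover condition in this scenario is that the old owner's account name has become available again. As a rough sketch, one could pull the owner out of a GitHub-style link and ask the public GitHub REST API whether that account still exists. The helper names are hypothetical, the regular expression covers only common GitHub link shapes, and a 404 response is treated here as a hint that deserves investigation, not as proof that the name can be registered. Unauthenticated API requests are also rate limited.

```python
import re
import urllib.error
import urllib.request

# Matches "github:owner/repo" and "...github.com/owner/repo(.git)(#ref)" forms.
GITHUB_REF = re.compile(
    r"(?:github:|github\.com[/:])([A-Za-z0-9-]+)/([A-Za-z0-9._-]+?)(?:\.git)?(?:#.*)?$"
)


def github_owner_repo(spec):
    """Return (owner, repo) for a GitHub-style specifier, or None."""
    match = GITHUB_REF.search(spec)
    return (match.group(1), match.group(2)) if match else None


def http_status(url):
    """Fetch a URL and return its HTTP status code (performs a network call)."""
    request = urllib.request.Request(url, headers={"User-Agent": "repo-jacking-sketch"})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def owner_is_claimable(spec, status_of=http_status):
    """True if the owner account appears to be missing, None if not a GitHub link."""
    parsed = github_owner_repo(spec)
    if parsed is None:
        return None
    owner, _repo = parsed
    return status_of(f"https://api.github.com/users/{owner}") == 404


# Usage (hypothetical owner name; this line would perform a real request):
# print(owner_is_claimable("github:example-owner/package-e"))
```

The status lookup is passed in as a parameter so the check can be exercised without network access, for example with a stub that always returns 404.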
What This Means
This isn't actually a hypothetical example. In fact, there is a term for this sort of attack: repo jacking. As it turns out, this is a large, ongoing problem. Repo jacking not only affects existing packages (for example, in organizations running older versions of software); it is also something organizations need to monitor for continuously, because there is no telling when a package maintainer may delete their account or move a package to a new organization. In many ways, it mimics other well-understood attacks such as subdomain takeovers. Repo jacking, however, is even more insidious because it only requires that an organization import a package that has a vulnerable link somewhere in its dependency chain. And unlike with subdomain takeovers, finding the vulnerable spots is much harder than simply running a scanner to look for unused domains.