The Evolution of Open Source
If you are familiar with the application security space, you might feel a bit confused. Many Software Composition Analysis (SCA) products exist, after all. With so many products, a renewed focus on Software Bill of Materials generation, and more open-source usage than ever before, how can SCA be dead?
While it is true that open source is more prevalent than ever before and that generating a list of packages incorporated within a project is essential, such a list does not go far enough. What is lacking in this scenario is machinery to address the broader problem: supply chain risk management.
To understand why SCA is no longer sufficient, it is vital first to understand what SCA is. SCA typically provides the following features:
- Bill of Materials generation.
- Vulnerability Information (where known).
- License Information (in some cases - not all products provide this).
What does SCA miss? In short, almost everything. SCA gives a shallow summary view of some known issues in upstream dependencies but stops there. Any issue that is not already cataloged in a public database will not show up in an SCA report, so the report misses a substantial portion of the actual supply chain. Nowhere in this report of well-cataloged issues is there any commentary on the authors or any information about the tools used to build the product.
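To make that limitation concrete, here is a minimal sketch of an SCA-style report, not any particular product's implementation. The `KNOWN_VULNS` table stands in for a public vulnerability database, and the package names, advisory ID, and the `Package` and `sca_report` helpers are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a public vulnerability database,
# keyed by (package name, version). The entries are made up.
KNOWN_VULNS: Dict[Tuple[str, str], List[str]] = {
    ("example-http", "1.2.0"): ["EXAMPLE-2021-0001"],
}


@dataclass(frozen=True)
class Package:
    name: str
    version: str
    license: Optional[str] = None  # not every SCA product reports this


def sca_report(packages: List[Package]) -> List[dict]:
    """Build a bill of materials annotated with known issues only."""
    report = []
    for pkg in packages:
        report.append({
            "package": f"{pkg.name}=={pkg.version}",
            "license": pkg.license or "unknown",
            # A package absent from the database gets an empty list,
            # whether it is actually safe or simply not yet cataloged.
            "vulnerabilities": KNOWN_VULNS.get((pkg.name, pkg.version), []),
        })
    return report


if __name__ == "__main__":
    deps = [
        Package("example-http", "1.2.0", "MIT"),
        # Imagine this package is malicious but has not been reported yet.
        Package("example-utils", "0.9.1"),
    ]
    for row in sca_report(deps):
        print(row)
```

In this sketch the second package comes back with no findings simply because nothing about it has been cataloged. The report has no fields for who wrote a package or how it was built.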
SCA takes a simplistic view of "dependency issues." Everything is listed as a vulnerability or license issue (where applicable). While these classes of issues are indeed problems, they miss the bigger picture.
Why SCA No Longer Works
The biggest problem is that most modern organizations develop software using open-source software. In fact, most projects contain more open-source code than code developed in-house. As a result, even moderately complex commercial applications now include thousands to tens of thousands of dependencies, with no real visibility into them and no mechanism for supply chain risk management.
SCA Is Not Supply Chain Risk Management
SCA is not the same as Supply Chain Risk Management. Weighing in on vulnerabilities and license problems in the final deliverable (the thing produced after running through CI/CD) is more like running a diagnostic check on a car after it is manufactured than like providing a real view of the software supply chain.
It provides only a simple glimpse into known, well-documented issues and says nothing about the extended risks that may be present upstream. There is no mention of the authors involved in producing the software and no analysis of the programs themselves, and any findings that exist outside of public databases must be discovered and cataloged manually by security researchers.
SCA Is Incomplete
Thousands of packages, written by tens of thousands of authors and third-party contributors using a myriad of build and automation tools and platforms, go into the software supply chain. It is a multi-dimensional attack surface with many weak points. Many authors are unaffiliated individual contributors who have massive access to everything downstream and are not subject to security policies, background checks, or best practices. A vast number of components, from the packages themselves to infrastructure and build tooling, are effectively crowdsourced and written by a pool of unknown, untrusted, and often anonymous contributors.
With this multi-faceted, multi-dimensional view of the landscape, SCA looks at a small, one-dimensional component of the supply chain: known issues in upstream packages. Unfortunately, SCA does not provide proactive discovery and analysis. This means that significant findings, such as malicious packages and compromised build tools, will often lead to major breaches that take months or even years to discover.
Why This Matters
While the issues surfaced by SCA products represent some risk to organizations and are things they should absolutely be concerned about, those issues provide at best an incomplete picture.
The open-source ecosystem has evolved substantially. Four to five years ago, there were only tens of thousands of packages available. Most projects had something in the neighborhood of tens to hundreds of upstream dependencies. Today, the number of dependencies has exploded. Most real-world projects that are even slightly complex have thousands of dependencies, and the ecosystems they reside in have hundreds of thousands to millions of packages.
To add even greater complexity, dependencies are in constant motion. They are updated individually on schedules dictated by their respective maintainers. Even the maintainers themselves frequently transition, handing total ownership, management, and maintenance responsibility to other random contributors within the open-source ecosystem.
What this means for individual organizations is that rather than having a set of packages that can be audited, tracked, and managed periodically, we now have a massive collection of fundamentally untrusted code written by tens of thousands of anonymous authors. This code executes not just on production systems and in CI/CD pipelines but also on the computer of every developer working on a project that uses a given package.
A New Solution is Needed
We can no longer rely on SCA to ensure that our software supply chain is safe. Relying on manual audits from security teams worked to a reasonable degree five to ten years ago, but it has failed to keep up as the software ecosystems have grown. Even when backed by light data science, reliance on manual audits can no longer keep pace with the sheer volume of software involved.
An innovative approach is needed: one which is entirely proactive and able to scale with the volume of software and contributions across the entire ecosystem. Enter Phylum: our entire solution is architected to address major shortfalls and capability gaps in commodity solutions and to scale with the ecosystems that we seek to defend.