blog: Dependency Confusion and Typosquatting Attacks #1109

Merged
merged 9 commits into from
Sep 4, 2024
84 changes: 84 additions & 0 deletions docs/_posts/2024-08-13-dep-confusion-and-typosquatting.md
@@ -0,0 +1,84 @@
---
title: "Defender's Perspective: Dependency Confusion and Typosquatting Attacks"
author: "Meder Kydyraliev (Google)"
is_guest_post: false
---

Dependency confusion and typosquatting attacks are very similar in nature. Both exploit the same weakness: many package managers identify packages by name alone. Successfully exploiting this weak form of package identity lets the attacker run arbitrary code at install time or at the application's run time. These attacks are scalable, portable, and extremely cost-effective to carry out, which makes them very appealing to malicious actors.

This blog post explores the attacks from the defender's perspective and highlights how SLSA can be used to help defend against them. It also describes some additional capabilities which might be required to mitigate this and other supply chain risks more robustly.

## Dependency Confusion

Developers and organizations worldwide have recognized the convenience that package managers bring to dependency management, and many reuse the same tooling to distribute internal dependencies. One common approach involves running an internal, private instance of a package registry to distribute internal dependencies and configuring **_all_** build processes to look for a package in the private registry first and to fall back to the public instance of the package registry only if the package is not found there[^1].

This works but is fragile. If someone in the organization attempts to build software that uses internal packages but doesn't correctly configure the build to use the private registry instance, the package installer will attempt to fetch the internal packages from the public registry instance. Under normal circumstances this returns an error, because the internal package name is not present in the public instance.

### The Attack

The attacker begins by performing reconnaissance to acquire the names of internal packages. This can be done using a number of techniques, e.g. trawling through an organization's open source repositories, inspecting shipped software, or simply guessing the names. Once the attacker has the names of internal packages, they register them with the public registry and release a new version with a malicious payload. At this point the attacker only has to wait for one of the misconfigured builds to run and use the attacker's package, resulting in compromise. Effective use of this technique is described in detail in blog posts such as [Dependency Confusion: How I Hacked Into Apple, Microsoft and Dozens of Other Companies](https://medium.com/@alex.birsan/dependency-confusion-4a5d60fec610).
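
To make the failure mode concrete, here is a minimal sketch of name-only resolution, assuming a simplified resolver that tries registries in order. The registry contents, the package names, and the `resolve` helper are all hypothetical and do not model any particular package manager.

```python
from typing import Optional

# Hypothetical registry contents: package name -> latest version.
PRIVATE_REGISTRY = {"acme-billing": "1.4.0"}
PUBLIC_REGISTRY = {"some-oss-lib": "2.0.0"}


def resolve(
    name: str, registries: list[tuple[str, dict[str, str]]]
) -> Optional[tuple[str, str]]:
    """Return (registry label, version) from the first registry that has the name."""
    for label, index in registries:
        if name in index:
            return label, index[name]
    return None  # not found anywhere: the build fails with an error


correct_build = [("private", PRIVATE_REGISTRY), ("public", PUBLIC_REGISTRY)]
# The private registry was never configured for this build.
misconfigured_build = [("public", PUBLIC_REGISTRY)]

print(resolve("acme-billing", correct_build))        # served by the private registry
print(resolve("acme-billing", misconfigured_build))  # not found: a harmless error

# The attacker registers the internal name in the public registry.
PUBLIC_REGISTRY["acme-billing"] = "99.0.0"

# The misconfigured build now succeeds silently, using the attacker's package.
print(resolve("acme-billing", misconfigured_build))
```

Nothing in the lookup distinguishes the genuine package from the impostor: the name is the only identity the resolver checks.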

## Typosquatting

The workflow for adding a new dependency most often involves modifying a manifest file to reference the new package by its name and version. The package name is usually typed in manually, copied and pasted from the web, or added by the IDE. Most input modes that involve humans are prone to transcription errors and typos, ranging from missing hyphens and lookalike characters to transposed letters. Under normal circumstances the build fails if a non-existent dependency is requested.
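
The typo classes mentioned above are easy to enumerate mechanically, which is useful for defenders who want to know which lookalike names to watch for. The following is an illustrative sketch only: the `typo_variants` helper, the lookalike mapping, and the package name `example-lib` are hypothetical, and real typos are far more varied.

```python
def typo_variants(name: str) -> set[str]:
    """Return simple typo variants of a package name (illustrative, not exhaustive)."""
    variants: set[str] = set()

    # Missing or substituted hyphens: "example-lib" -> "examplelib", "example_lib"
    if "-" in name:
        variants.add(name.replace("-", ""))
        variants.add(name.replace("-", "_"))

    # Transposed adjacent characters: "example-lib" -> "exmaple-lib"
    for i in range(len(name) - 1):
        variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])

    # Lookalike characters; this mapping is a small hypothetical sample.
    lookalikes = {"l": "1", "o": "0", "i": "l"}
    for i, ch in enumerate(name):
        if ch in lookalikes:
            variants.add(name[:i] + lookalikes[ch] + name[i + 1:])

    # Swapping two identical neighbours reproduces the original name.
    variants.discard(name)
    return variants


variants = typo_variants("example-lib")
print(sorted(variants))
```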

### The Attack

The attacker pre-registers (or "squats") a large number of package names with commonly seen typos in them, usually using popular package names as the starting point, and waits for victims to install their packages. The attack is opportunistic since, unlike dependency confusion, it doesn't target a specific organization. Notably, however, it has the potential to result in a more severe and harder-to-detect compromise than dependency confusion. Because the packages that attackers target for squatting are publicly available, attackers can distribute pristine copies of the original packages and deliver a malicious payload only at a later date by releasing an update. As a result, applications can end up incorporating and using the attacker's package in production.

Attackers can draw on a number of sources for names to squat on to increase their chances of success, e.g. package names popular in other ecosystems, variants of the package name used by Linux distributions, or OSS package names "hallucinated" by LLMs.

## Mitigations

The inadequacy of just using package names as the way to identify software packages is hopefully obvious by now. Let's first explore common recommendations to mitigate _dependency confusion_ attacks and their limitations:

- **Namespacing:** Some package registries support namespacing, sometimes called scoping or organization support. This feature enables organizations to claim a namespace in the public registry for their internal dependencies. This prevents attackers from registering internal names, as they are not authorized to publish to the organization's namespace. Confidentiality concerns would likely prevent large organizations, which are the usual targets of dependency confusion attacks, from hosting internal dependencies in the public instance; claiming the namespace is, however, sufficient to prevent attackers from exploiting dependency confusion. Namespacing is not supported by all package managers, and the solution is not very robust, because trust is placed in another forgeable identifier outside of the organization's control.
- **Registering internal names in the public registry:** A common practical workaround for registries that lack namespacing support is for organizations to do what the attackers do and claim internal package names in the public registry instance before attackers can. This recommendation doesn't address the root cause of dependency confusion attacks and requires ongoing coordination and synchronization of the names between public and private registries, which is fragile.
- **Pinning or hash validation:** Some client-side tooling supports pinning or locking of dependencies by hash. Both internal and external dependencies are listed in the lockfile. To effectively prevent a dependency confusion attack with lockfiles, every update to the lockfile must be checked to ensure that the hashes for internal packages match the list of authorized hashes for those packages. Not all ecosystems universally support pinning with lockfiles, and even those that do may lack the functionality to manage and distinguish between internal and external dependencies.
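
The lockfile check described in the last item can be sketched as follows. This is a minimal illustration under stated assumptions: the lockfile is modelled as a plain mapping from package name to a pinned SHA-256 digest, the list of authorized hashes is maintained by the organization, and the `unauthorized_internal_pins` helper and package names are hypothetical. Real lockfile formats differ per ecosystem.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of an artifact's bytes."""
    return hashlib.sha256(data).hexdigest()


def unauthorized_internal_pins(
    lockfile: dict[str, str], authorized: dict[str, set[str]]
) -> list[str]:
    """Return internal package names whose pinned hash is not authorized.

    lockfile:   package name -> pinned SHA-256 digest (hypothetical format)
    authorized: internal package name -> set of authorized digests
    Names absent from `authorized` are treated as external and skipped.
    """
    return sorted(
        name
        for name, digest in lockfile.items()
        if name in authorized and digest not in authorized[name]
    )


genuine = sha256_hex(b"genuine internal build")
impostor = sha256_hex(b"attacker payload")
authorized = {"acme-billing": {genuine}}

good_lock = {"acme-billing": genuine, "some-oss-lib": sha256_hex(b"oss release")}
bad_lock = {"acme-billing": impostor, "some-oss-lib": sha256_hex(b"oss release")}

print(unauthorized_internal_pins(good_lock, authorized))
print(unauthorized_internal_pins(bad_lock, authorized))
```

A check like this would have to run on every lockfile update, for example as a review or CI gate, which is the coordination cost the item above refers to.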

In the case of _typosquatting_ attacks, mitigation is less straightforward and for the most part requires intervention at the point when developers add a new dependency. Effective mitigation could involve presenting the developer with metadata about the package being added (e.g. number of dependents, downloads) and prompting them to verify that the package being installed is the one they intended. This approach does not scale and is prone to human error.

### SLSA

#### Dependency Confusion

A much more robust way to address dependency confusion is to use SLSA. SLSA build provenance contains metadata about an artifact, which includes the URL of the source repository and identifies the build system that produced the artifact. This metadata enables secure binding of the package name and version to the canonical source repository and its build system, which is referred to as [expectations forming](https://slsa.dev/spec/v1.0/verifying-artifacts#forming-expectations) in SLSA.

Let's examine how SLSA build provenance prevents successful dependency confusion exploitation:

1. The organization's internal packages are built with a SLSA-compliant build system, which produces SLSA build provenance.
2. SLSA build provenance is [distributed along with the artifact](https://slsa.dev/spec/v1.0/distributing-provenance).
3. Client-side policy binds internal package names to their corresponding source repositories and build systems.
4. Upon installation of internal packages, their build provenance is verified to ensure they were built by the authorized build system and from the canonical source repository.

Attackers are unable to forge SLSA build provenance, so all dependency confusion attempts will be detected immediately because the provenance will name a different source repository or builder ID. Native support for SLSA build provenance and its verification in ecosystems like npm will enable this robust form of protection against dependency confusion attacks.
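
The policy check in steps 3 and 4 can be sketched as below. This is a simplified illustration, not the SLSA provenance schema: it assumes the provenance signature has already been verified, and it models the provenance as a flat dictionary with hypothetical `source_repo` and `builder_id` keys. The policy contents, URLs, and the `provenance_ok` helper are likewise made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Expectation:
    """What the organization expects for one internal package name."""
    source_repo: str
    builder_id: str


# Hypothetical client-side policy binding internal names to their origin.
POLICY = {
    "acme-billing": Expectation(
        source_repo="https://git.example.com/acme/billing",
        builder_id="https://builder.example.com/trusted",
    )
}


def provenance_ok(
    name: str, provenance: Optional[dict], policy: dict[str, Expectation]
) -> bool:
    """Check already signature-verified provenance against the policy."""
    expected = policy.get(name)
    if expected is None:
        return True  # no expectations formed for this name; out of scope here
    if provenance is None:
        return False  # internal name without provenance: fail closed
    return (
        provenance.get("source_repo") == expected.source_repo
        and provenance.get("builder_id") == expected.builder_id
    )


genuine = {
    "source_repo": "https://git.example.com/acme/billing",
    "builder_id": "https://builder.example.com/trusted",
}
impostor = {
    "source_repo": "https://git.example.org/attacker/billing",
    "builder_id": "https://builder.example.org/unknown",
}

print(provenance_ok("acme-billing", genuine, POLICY))   # matches the policy
print(provenance_ok("acme-billing", impostor, POLICY))  # wrong repo and builder
print(provenance_ok("acme-billing", None, POLICY))      # no provenance at all
```

The key design choice is failing closed: a package that claims an internal name but carries no provenance is rejected rather than installed.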

#### Typosquatting

Dealing with _typosquatting_ attacks is trickier because at the time of the attack the developer is adding a new dependency, potentially interacting with it for the first time. Trust on first use (TOFU) is a common approach to bootstrapping trust and [forming expectations](https://slsa.dev/spec/v1.0/verifying-artifacts#forming-expectations). However, since it's impossible to know the developer's intent, all tooling can do is present them with metadata about the package they are adding. Unfortunately, that metadata could be for the attacker's impostor package.

Effective mitigation of typosquatting attacks requires continuously integrating heuristics that proactively flag likely typosquatting attempts into all workflows that add new dependencies. Heuristics could range from evaluation of static data (e.g. package age) to ones requiring more time and resources (e.g. graph resolution or dynamic analysis). Managed ingestion, described below, is one very efficient and effective way to deploy such protections across larger enterprises.
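
As an illustration of the cheaper, static end of that range, the sketch below combines two signals: name similarity to a known popular package and package age. It uses Python's `difflib.SequenceMatcher` as a stand-in for a proper similarity measure. The popular-package list, the thresholds, and the `typosquat_signals` helper are hypothetical and would need tuning against real data.

```python
from datetime import datetime, timedelta, timezone
from difflib import SequenceMatcher

# Hypothetical list of popular package names to compare against.
POPULAR = {"example-lib", "fastparser"}


def typosquat_signals(
    name: str,
    first_published: datetime,
    popular: set[str],
    now: datetime,
    threshold: float = 0.85,             # assumed similarity cut-off
    min_age: timedelta = timedelta(days=90),  # assumed minimum package age
) -> list[str]:
    """Return human-readable reasons why a package deserves a closer look."""
    signals: list[str] = []
    if name in popular:
        return signals  # exact match with a known package: nothing to flag

    for known in sorted(popular):
        if SequenceMatcher(None, name, known).ratio() >= threshold:
            signals.append(f"name resembles popular package {known!r}")

    if now - first_published < min_age:
        signals.append("package is younger than the minimum age")
    return signals


NOW = datetime(2024, 9, 1, tzinfo=timezone.utc)
recent = datetime(2024, 8, 25, tzinfo=timezone.utc)
print(typosquat_signals("exmaple-lib", recent, POPULAR, NOW))
```

Signals like these are prompts for review rather than verdicts. A squatted package that has served pristine copies for a long time would pass the age check, which is why the more expensive analyses remain relevant.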

### Managed Ingestion

Effective OSS supply chain security risk management hinges on an organization's ability to control what OSS can be used in an organization's products. This concept is not new, as it mirrors the approach taken in food supply chain management, where control over the ingredients included in food products is paramount. The concept is also reflected in [OpenSSF's s2c2f framework](https://github.com/ossf/s2c2f), which highlights control over ingestion in organizations as a crucial first step towards securing their software supply chains.

For a typical build process this means control over resolution of the OSS dependency graph and retrieval of the resolved dependencies from the Internet. Historically, both processes lacked explicit control and transparency, leading to a significant level of trust being placed in package managers and their associated registries.

Managed ingestion describes a dedicated, deliberate process that happens separately from the build and involves an organization importing and assessing OSS packages before making them available to developers internally. While there is more than one way to implement managed ingestion, combining it with existing artifact management solutions creates a very potent capability that provides organizations with control over graph resolution and an opportunity for centralized supply chain risk management. Native support for different package ecosystems provided by most modern artifact management solutions ensures compatibility with most existing build and dependency management tools.

In this context a reasonable baseline for managed ingestion includes:

- **Implementation of an ingestion delay** for new versions of OSS packages. A simple but very effective mitigation against a number of supply chain attacks.
- **Mitigation of availability concerns**, ensuring organizations are able to build and deploy even if upstream infrastructure is down.
- **Mitigation of dependency confusion attacks** by flagging upstream packages whose names clash with internal package names or that fail SLSA build provenance verification.
- **Mitigation of typosquatting attacks** by flagging upstream packages based on heuristics.
- Opportunity to **deploy existing content scanning tools on OSS packages** to flag known indicators of maliciousness or unexpected changes (e.g. changes in capabilities reported by [CAPSLOCK](https://github.com/google/capslock)).
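
Two of the baseline items above, the ingestion delay and the name-clash check, can be sketched as a single gate. This is a minimal illustration under assumptions: the seven-day delay, the internal name list, and the `ingestion_decision` helper are hypothetical, and the name normalization follows the rule PEP 503 defines for Python packages, so other ecosystems would need their own.

```python
import re
from datetime import datetime, timedelta, timezone


def normalize(name: str) -> str:
    """Normalize a package name as PEP 503 does for Python packages."""
    return re.sub(r"[-_.]+", "-", name).lower()


def ingestion_decision(
    name: str,
    published_at: datetime,
    now: datetime,
    internal_names: set[str],
    delay: timedelta = timedelta(days=7),  # assumed ingestion delay
) -> str:
    """Return "reject", "hold", or "accept" for an upstream package version."""
    internal = {normalize(n) for n in internal_names}
    if normalize(name) in internal:
        return "reject"  # upstream name clashes with an internal package
    if now - published_at < delay:
        return "hold"  # version is still inside the ingestion delay
    return "accept"


INTERNAL_NAMES = {"acme-billing"}
NOW = datetime(2024, 9, 1, tzinfo=timezone.utc)
yesterday = NOW - timedelta(days=1)
last_month = NOW - timedelta(days=30)

print(ingestion_decision("Acme_Billing", last_month, NOW, INTERNAL_NAMES))
print(ingestion_decision("some-oss-lib", yesterday, NOW, INTERNAL_NAMES))
print(ingestion_decision("some-oss-lib", last_month, NOW, INTERNAL_NAMES))
```

Because the gate runs separately from the build, a "hold" or "reject" never breaks a developer's build directly; the version simply does not appear in the internal registry.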

The process of using OSS packages via registries presents a lot of risks beyond typosquatting and dependency confusion, and it requires the same level of attention and control as the build process itself. Managed ingestion defines this fundamental capability required to successfully manage OSS supply chain risk. As part of the ongoing [SLSA dependencies track](https://github.com/slsa-framework/slsa/issues/961) effort we will work to formalize these concepts, including [those from s2c2f](https://github.com/slsa-framework/slsa/issues/1105), within the SLSA specification.

<!-- Footnotes themselves at the bottom. -->
## Notes

[^1]: Multi-registry behaviour is ecosystem and configuration specific, e.g. pip configured with the discouraged `--extra-index-url` flag would pick the highest version if a package is present in both the private and public instances. Overall, the mechanics of the attack remain the same.