Skip to content

Apollo Query Planner and Apollo Gateway may infinitely loop on sufficiently complex queries

High severity GitHub Reviewed Published Aug 27, 2024 in apollographql/federation • Updated Sep 13, 2024

Package

npm @apollo/gateway (npm)

Affected versions

>= 2.0.0, < 2.8.5

Patched versions

2.8.5
npm @apollo/query-planner (npm)
>= 2.0.0, < 2.8.5
2.8.5
cargo apollo-router (Rust)
< 1.52.1
1.52.1

Description

Impact

Instances of @apollo/query-planner >=2.0.0 and <2.8.5 are impacted by a denial-of-service vulnerability. @apollo/gateway versions >=2.0.0 and < 2.8.5 and Apollo Router <1.52.1 are also impacted through their use of @apollo/query-planner.

If @apollo/query-planner is asked to plan a sufficiently complex query, it may loop infinitely and never complete. This results in unbounded memory consumption and either a crash or out-of-memory (OOM) termination.

This issue can be triggered if you have at least one non-@key field that can be resolved by multiple subgraphs. To identify these shared fields, the schema for each subgraph must be reviewed. The mechanism to identify shared fields varies based on the version of Federation your subgraphs are using.

You can check if your subgraphs are using Federation 1 or Federation 2 by reviewing their schemas. Federation 2 subgraph schemas will contain a @link directive referencing the version of Federation being used while Federation 1 subgraphs will not. For example, in a Federation 2 subgraph, you will find a line like @link(url: "https://specs.apollo.dev/federation/v2.0"). If a similar @link directive is not present in your subgraph schema, it is using Federation 1. Note that a supergraph can contain a mix of Federation 1 and Federation 2 subgraphs.

To review Federation 1 subgraphs for impact:

In Federation 1 subgraphs, fields are implicitly shareable across subgraphs. To review for impact, you will need to review for cases where multiple subgraphs can resolve the same field. For example:

# Subgraph 1
type Query {
  field: Int
}

# Subgraph 2
type Query {
  field: Int
}

To review Federation 2 subgraphs for impact:

In Federation 2 subgraphs, fields must be explicitly defined as shareable across subgraphs. This is done via the @shareable directive. For example:

# Subgraph 1
@link(url: "https://specs.apollo.dev/federation/v2.0")
type Query {
  field: Int @shareable
}

# Subgraph 2
@link(url: "https://specs.apollo.dev/federation/v2.0")
type Query {
  field: Int @shareable
}

Impact Detail

This issue results from the Apollo query planner attempting to use a Number exceeding Javascript’s Number.MAX_VALUE in some cases. In Javascript, Number.MAX_VALUE is (2^1024 - 2^971).

When the query planner receives an inbound graphql request, it breaks the query into pieces and for each piece, generates a list of potential execution steps to solve the piece. These candidates represent the steps that the query planner will take to satisfy the pieces of the larger query. As part of normal operations, the query planner requires and calculates the number of possible query plans for the total query. That is, it needs the product of the number of query plan candidates for each piece of the query. Under normal circumstances, after generating all query plan candidates and calculating the number of all permutations, the query planner moves on to stack rank candidates and prune less-than-optimal options.

In particularly complex queries, especially those where fields can be solved through multiple subgraphs, this can cause the number of all query plan permutations to balloon. In worst-case scenarios, this can end up being a number larger than Number.MAX_VALUE. In Javascript, if Number.MAX_VALUE is exceeded, Javascript represents the value as “infinity”. If the count of candidates is evaluated as infinity, the component of the query planner responsible for pruning less-than-optimal query plans does not actually prune candidates, causing the query planner to evaluate many orders of magnitude more query plan candidates than necessary.

A given graph’s exposure to this issue varies based on its complexity. Consider the following Federation 2 subgraphs:

# Subgraph 1
type Query {
  field: Int @shareable
}

# Subgraph 2
type Query {
  field: Int @shareable
}

The query planner can solve requests for Query.field in one of two ways - either by querying subgraph 1 or subgraph 2.

The following query with 1024 aliased fields would trigger this issue because 2^1024 > Number.MAX_VALUE:

query {
  field_1: field
  field_2: field
  # ...
  field_1023: field
  field_1024: field
}

However, in a graph that provided 5 options to solve a given field, the bug could be encountered in a query that aliased the field approximately 440 times.

Patches

@apollo/query-planner 2.8.5
@apollo/gateway 2.8.5
Apollo Router 1.52.1

Workarounds

This issue can be avoided by ensuring there are no fields resolvable from multiple subgraphs. If all subgraphs are using Federation 2, you can confirm that you are not impacted by ensuring that none of your subgraph schemas use the @shareable directive. If you are using Federation 1 subgraphs, you will need to validate that there are no fields resolvable by multiple subgraphs.

Note that a supergraph can contain a mix of Federation 1 and Federation 2 subgraphs.

If you do have fields resolvable by multiple subgraphs, changing this behavior in response to this issue may be risky to the operation of your supergraph. We recommend that you update to a patched version of either Apollo Router or Apollo Gateway.

Apollo customers with an enterprise entitlement using the Apollo Router can also mitigate much of the risk from this issue by implementing Apollo’s Persisted Queries (PQ) feature. With PQ enabled, the Apollo Router will only execute safelisted queries. While customers would need to ensure that queries that induce this issue are not added to the safelist, PQs would mitigate the risk of clients submitting ad hoc queries that exploit this issue.

References

Additional information on Query Plans

References

@peakematt peakematt published to apollographql/federation Aug 27, 2024
Published to the GitHub Advisory Database Aug 27, 2024
Reviewed Aug 27, 2024
Published by the National Vulnerability Database Aug 27, 2024
Last updated Sep 13, 2024

Severity

High

CVSS overall score

This score calculates overall vulnerability severity from 0 to 10 and is based on the Common Vulnerability Scoring System (CVSS).
/ 10

CVSS v4 base metrics

Exploitability Metrics
Attack Vector Network
Attack Complexity Low
Attack Requirements None
Privileges Required None
User interaction None
Vulnerable System Impact Metrics
Confidentiality None
Integrity None
Availability High
Subsequent System Impact Metrics
Confidentiality None
Integrity None
Availability None

CVSS v4 base metrics

Exploitability Metrics
Attack Vector: This metric reflects the context by which vulnerability exploitation is possible. This metric value (and consequently the resulting severity) will be larger the more remote (logically, and physically) an attacker can be in order to exploit the vulnerable system. The assumption is that the number of potential attackers for a vulnerability that could be exploited from across a network is larger than the number of potential attackers that could exploit a vulnerability requiring physical access to a device, and therefore warrants a greater severity.
Attack Complexity: This metric captures measurable actions that must be taken by the attacker to actively evade or circumvent existing built-in security-enhancing conditions in order to obtain a working exploit. These are conditions whose primary purpose is to increase security and/or increase exploit engineering complexity. A vulnerability exploitable without a target-specific variable has a lower complexity than a vulnerability that would require non-trivial customization. This metric is meant to capture security mechanisms utilized by the vulnerable system.
Attack Requirements: This metric captures the prerequisite deployment and execution conditions or variables of the vulnerable system that enable the attack. These differ from security-enhancing techniques/technologies (ref Attack Complexity) as the primary purpose of these conditions is not to explicitly mitigate attacks, but rather, emerge naturally as a consequence of the deployment and execution of the vulnerable system.
Privileges Required: This metric describes the level of privileges an attacker must possess prior to successfully exploiting the vulnerability. The method by which the attacker obtains privileged credentials prior to the attack (e.g., free trial accounts), is outside the scope of this metric. Generally, self-service provisioned accounts do not constitute a privilege requirement if the attacker can grant themselves privileges as part of the attack.
User interaction: This metric captures the requirement for a human user, other than the attacker, to participate in the successful compromise of the vulnerable system. This metric determines whether the vulnerability can be exploited solely at the will of the attacker, or whether a separate user (or user-initiated process) must participate in some manner.
Vulnerable System Impact Metrics
Confidentiality: This metric measures the impact to the confidentiality of the information managed by the VULNERABLE SYSTEM due to a successfully exploited vulnerability. Confidentiality refers to limiting information access and disclosure to only authorized users, as well as preventing access by, or disclosure to, unauthorized ones.
Integrity: This metric measures the impact to integrity of a successfully exploited vulnerability. Integrity refers to the trustworthiness and veracity of information. Integrity of the VULNERABLE SYSTEM is impacted when an attacker makes unauthorized modification of system data. Integrity is also impacted when a system user can repudiate critical actions taken in the context of the system (e.g. due to insufficient logging).
Availability: This metric measures the impact to the availability of the VULNERABLE SYSTEM resulting from a successfully exploited vulnerability. While the Confidentiality and Integrity impact metrics apply to the loss of confidentiality or integrity of data (e.g., information, files) used by the system, this metric refers to the loss of availability of the impacted system itself, such as a networked service (e.g., web, database, email). Since availability refers to the accessibility of information resources, attacks that consume network bandwidth, processor cycles, or disk space all impact the availability of a system.
Subsequent System Impact Metrics
Confidentiality: This metric measures the impact to the confidentiality of the information managed by the SUBSEQUENT SYSTEM due to a successfully exploited vulnerability. Confidentiality refers to limiting information access and disclosure to only authorized users, as well as preventing access by, or disclosure to, unauthorized ones.
Integrity: This metric measures the impact to integrity of a successfully exploited vulnerability. Integrity refers to the trustworthiness and veracity of information. Integrity of the SUBSEQUENT SYSTEM is impacted when an attacker makes unauthorized modification of system data. Integrity is also impacted when a system user can repudiate critical actions taken in the context of the system (e.g. due to insufficient logging).
Availability: This metric measures the impact to the availability of the SUBSEQUENT SYSTEM resulting from a successfully exploited vulnerability. While the Confidentiality and Integrity impact metrics apply to the loss of confidentiality or integrity of data (e.g., information, files) used by the system, this metric refers to the loss of availability of the impacted system itself, such as a networked service (e.g., web, database, email). Since availability refers to the accessibility of information resources, attacks that consume network bandwidth, processor cycles, or disk space all impact the availability of a system.
CVSS:4.0/AV:N/AC:L/AT:N/PR:N/UI:N/VC:N/VI:N/VA:H/SC:N/SI:N/SA:N

EPSS score

0.059%
(27th percentile)

CVE ID

CVE-2024-43414

GHSA ID

GHSA-fmj9-77q8-g6c4
Loading Checking history
See something to contribute? Suggest improvements for this vulnerability.