Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix __typename inside router response #6009

Open
wants to merge 13 commits into
base: dev
Choose a base branch
from
Open

Conversation

IvanGoncharov
Copy link
Member

@IvanGoncharov IvanGoncharov commented Sep 17, 2024

This PR fixes the following edge cases:

1. The initial response of the query containing @defer should be an empty object

For a query like this:

{
  ... @defer { me { name } }
}

The initial response should have data set to an empty object, as in this graphql-js test case:
https://github.com/graphql/graphql-js/blob/2b42a70191243d0ca0e0e4f1d580d6794718fbd5/src/execution/__tests__/defer-test.ts#L371-L384

2. data: null semantic should be maintained

According to GraphQL spec, null can be propagated to data: https://spec.graphql.org/draft/#sel-FANTNRCAACGBrwa
But the fix:

Some(Value::Null) => {
// Detect if root __typename is asked in the original query (the qp doesn't put root __typename in subselections)
// cf https://github.com/apollographql/router/issues/1677
let operation_kind_if_root_typename = original_operation.and_then(|op| {
op.selection_set
.iter()
.any(|f| f.is_typename_field())
.then(|| *op.kind())
});
response.data = match operation_kind_if_root_typename {
Some(operation_kind) => {
let mut output = Object::default();
output.insert(TYPENAME, operation_kind.default_type_name().into());
Some(output.into())
}
None => Some(Value::default()),
};

Added in #2253, resulted data: null being replaced with data: { __typename: "<some typename>" }, for queries like this:

{
  nonNullFieldThatErrors
  __typename
}

3. Handle cases where query with @defer also have __typename inside fragment

Fix added in #2253, here:

Some(Value::Null) => {
// Detect if root __typename is asked in the original query (the qp doesn't put root __typename in subselections)
// cf https://github.com/apollographql/router/issues/1677
let operation_kind_if_root_typename = original_operation.and_then(|op| {
op.selection_set
.iter()
.any(|f| f.is_typename_field())
.then(|| *op.kind())
});
response.data = match operation_kind_if_root_typename {
Some(operation_kind) => {
let mut output = Object::default();
output.insert(TYPENAME, operation_kind.default_type_name().into());
Some(output.into())
}
None => Some(Value::default()),
};

And here:

// Detect if root __typename is asked in the original query (the qp doesn't put root __typename in subselections)
// cf https://github.com/apollographql/router/issues/1677
let operation_kind_if_root_typename =
original_operation.and_then(|op| {
op.selection_set
.iter()
.any(|f| f.is_typename_field())
.then(|| *op.kind())
});
if let Some(operation_kind) = operation_kind_if_root_typename {
output.insert(TYPENAME, operation_kind.default_type_name().into());
}

Only worked for cases where __typename is specified directly in the topmost selection set, but didn't work for __typenames wrapped in fragments, e.g.:

{
  ... { __typename }
  ... defer {
    me { name }
  }
}

Because of #1, #2 and #3 fix added in #2253 was removed and but fix for #4 also fixes issues with @defer

4. Remaping subgraph's root type names to supergraph names didn't work for __typename wrapped in fragments

Subgraphs could have their root types named differently than supergraphs.
Moreover, different subgraphs could have different names for root types.

QP removes __typename from the topmost selection set, so in combination with:

} else if name.as_str() == TYPENAME {
if !output.contains_key(field_name_str) {
output.insert(field_name.clone(), Value::String(root_type_name.into()));
}

This issue is fixed for queries like this:

{
  __typename
  me { name }
}

But QP doesn't delete __typename wrapped in fragments like so (same true for named fragment):

{
  ... { __typename }
  me { name }
}

That means

} else if name.as_str() == TYPENAME {
if !output.contains_key(field_name_str) {
output.insert(field_name.clone(), Value::String(root_type_name.into()));
}

Can't be reached because subgraph will return __typename value for root, and it will trigger this "if" instead:

if let Some(input_value) = input.get_mut(field_name_str) {

That's why I changed the order of these if branches, so the subgraph's value of __typename for the root type is always ignored, and the supergraph's root type names are always used. Also, the exact same order of if branches are already used inside apply_selection_set.

However, this code is not reachable for __typenames wrapped in fragments because apply_root_selection_set incorrectly calls apply_selection_set for selection sets wrapped in fragments:

self.apply_selection_set(
selection_set,
parameters,
input,
output,
path,
// FIXME: use `ast::Name` everywhere so fallible conversion isn’t needed
#[allow(clippy::unwrap_used)]
&FieldType::new_named(type_condition.try_into().unwrap()).0,
)?;

self.apply_selection_set(
&fragment.selection_set,
parameters,
input,
output,
path,
// FIXME: use `ast::Name` everywhere so fallible conversion isn’t needed
#[allow(clippy::unwrap_used)]
&FieldType::new_named(root_type_name.try_into().unwrap()).0,
)?;

If root fields are wrapped in fragments, they are still applied on root type, e.g.:

{
   ... { 
     a: __typename # applied on query root
       ... {
         b: __typename # still query root
         
         me {
           __typename # not a root
         }
       }
     }
   }
}

So, I switch those calls to be apply_root_selection_set.
All of the above steps in combination fixed #4

5. Incorrect errors on inline fragments using root's interfaces

Root type can implement interfaces and it's legal to use those interfaces in queries, like so:

interface Foo {
   foo: String
}

type Query implements {
   foo: String
}

with query:

{
  ... on Foo {
    foo
  }
}

But before my change, this query would error here:

if type_condition.as_str() != root_type_name {
return Err(InvalidValue);
}

6. __typename with alias returned by subgraph is not validated

__typename is validated here:

if let Some(input_type) =
input_object.get(TYPENAME).and_then(|val| val.as_str())
{
// If there is a __typename, make sure the pointed type is a valid type of the schema. Otherwise, something is wrong, and in case we might
// be inadvertently leaking some data for an @inacessible type or something, nullify the whole object. However, do note that due to `@interfaceObject`,
// some subgraph can have returned a __typename that is the name of an interface in the supergraph, and this is fine (that is, we should not
// return such a __typename to the user, but as long as it's not returned, having it in the internal data is ok and sometimes expected).
let Some(ExtendedType::Object(_) | ExtendedType::Interface(_)) =
parameters.schema.types.get(input_type)
else {
parameters.nullified.push(Path::from_response_slice(path));
*output = Value::Null;
return Ok(());
};
}

And is used to assign current_type here:

let current_type = if parameters
.schema
.get_interface(field_type.inner_named_type())
.is_some()
|| parameters
.schema
.get_union(field_type.inner_named_type())
.is_some()
{
typename.as_ref().unwrap_or(field_type)
} else {
field_type
};

But aliased __typename handled here (same code for non-aliased, but they already validated by code above):

let current_type = if parameters
.schema
.get_interface(field_type.inner_named_type())
.is_some()
|| parameters
.schema
.get_union(field_type.inner_named_type())
.is_some()
{
typename.as_ref().unwrap_or(field_type)
} else {
field_type
};

This code guarantees that the aliased __typename contains the name of the object type from the supergraph schema.
But it doesn't guarantee that all __typenames (aliased and not) have the same value that is compatible with current_type.

As #4 proves, we can't trust the subgraph to return the correct names, and QP doesn't guarantee that either.
So, since we already use current_type to track the type of the current selection set, it makes sense to use it as a source of truth for all __typenames but fallback to subgraph's __typename if current_type is not an object (shouldn't happen unless QP has bugs).

Fixes #issue_number


Checklist

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

  • Changes are compatible1
  • Documentation2 completed
  • Performance impact assessed and acceptable
  • Tests added and passing3
    • Unit Tests
    • Integration Tests
    • Manual Tests

Exceptions

Note any exceptions here

Notes

Footnotes

  1. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.

  2. Configuration is an important part of many changes. Where applicable please try to document configuration examples.

  3. Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.

This comment has been minimized.

if include_skip.should_skip(parameters.variables) {
continue;
}

self.apply_selection_set(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are still inside root selection, even if we go inside an inline fragment.
So it should be apply_root_selection_set

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this sounds reasonable but looks like a dangerous change. Do you have a test that would show what happens before and after that change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it tested here:

async fn test_subgraph_returning_different_typename_on_query_root() -> Result<(), BoxError> {
let mut router = IntegrationTest::builder()
.config(CONFIG)
.responder(ResponseTemplate::new(200).set_body_json(json!({
"data": {
"topProducts": null,
"__typename": "SomeQueryRoot",
"aliased": "SomeQueryRoot",
"inside_fragment": "SomeQueryRoot",
"inside_inline_fragment": "SomeQueryRoot"
}
})))
.build()
.await;
router.start().await;
router.assert_started().await;
let query = r#"
{
topProducts { name }
__typename
aliased: __typename
...TypenameFragment
... {
inside_inline_fragment: __typename
}
}
fragment TypenameFragment on Query {
inside_fragment: __typename
}
"#;
let (_trace_id, response) = router.execute_query(&json!({ "query": query })).await;
assert_eq!(response.status(), 200);
assert_eq!(
response.json::<serde_json::Value>().await?,
json!({
"data": {
"topProducts": null,
"__typename": "Query",
"aliased": "Query",
"inside_fragment": "Query",
"inside_inline_fragment": "Query"
}
})
);
Ok(())
}

.unwrap_or_else(|| {
Value::String(ByteString::from(
current_type.inner_named_type().as_str().to_owned(),
))
Copy link
Member Author

@IvanGoncharov IvanGoncharov Sep 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here, we tried to use __typename values returned by the subgraph but didn't validate them. I assume (not tested) it can even be the name of the interface if the subgraph uses @interfaceObject

According to the GraphQL spec, __typename should always use an object name (interfaces and unions are forbidden), so I changed the code to first use current_type (which is always valid) and then fallback to using the subgraph's value only if current_type is not an object type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be checked with what the output rewriting code in the QP is doing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what to do here.
Realistically, I can't inspect QP in the scope of this PR.
I can revert this particular change, because it separated from other fixes.
@Geal Do you think it worth to revert it?

@IvanGoncharov IvanGoncharov marked this pull request as ready for review September 17, 2024 02:52
@IvanGoncharov IvanGoncharov requested review from a team as code owners September 17, 2024 02:52
@@ -311,6 +311,8 @@ impl PlanNode {
let _ = primary_sender.send((value.clone(), errors.clone()));
} else {
let _ = primary_sender.send((value.clone(), errors.clone()));
// primary response should be an empty object
value.deep_merge(Value::Object(Default::default()));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the data field should ve an enpty object (unless it contained a non nullable field that was nullified), then I think this should be fixed here:

&initial_value.unwrap_or_default(),

to set it to an object instead of Value::Null by default

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You right, it should be correct.
I hesitated to do it since I'm less familiar with how QP executes and also I don't have capacity to test all possible scenarios in context of this PR.
So I choose to do it just inside this specific branch that I'm 100% sure is correct.

@Geal Do you think it safe to make this change for entire QP execution?
Or I can create issue for that I do it in separate PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe in a separate PR, considering the timing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created a router issue for it.

@IvanGoncharov IvanGoncharov requested a review from a team as a code owner September 17, 2024 14:24
@@ -311,6 +311,8 @@ impl PlanNode {
let _ = primary_sender.send((value.clone(), errors.clone()));
} else {
let _ = primary_sender.send((value.clone(), errors.clone()));
// primary response should be an empty object
value.deep_merge(Value::Object(Default::default()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the data field should ve an enpty object (unless it contained a non nullable field that was nullified), then I think this should be fixed here:

&initial_value.unwrap_or_default(),

to set it to an object instead of Value::Null by default

apollo-router/tests/integration/typename.rs Show resolved Hide resolved
apollo-router/tests/integration/subgraph_response.rs Outdated Show resolved Hide resolved
assert_eq!(response.status(), 200);
assert_eq!(
serde_json::from_str::<serde_json::Value>(&response.text().await?)?,
json!({ "data": null })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why should { __typename topProducts { name } } return a null data if the subgraph returns null? topProducts is a nullable field, right?
Is it for the case where the subgraph would nullify because there was a non nullable field?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, it was mainly to test null propagation that goes up to data.
However, graphql servers can return data: null if execution is started by no resolvers executed:
https://spec.graphql.org/October2021/#sel-FAPHPHCAACEBr6C

For example, here is the test from graphql-js:
https://github.com/graphql/graphql-js/blob/993d7ce2ee6db2a13973b037817f3eac60607b8a/src/execution/__tests__/executor-test.ts#L971-L981

These cases are but could happen, so we should keep semantics.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the "right" way would be to detect from errors if we got a null because a non null field was nullified by the subgraph, but that may be too complex

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we switch default value to execute_recursively to be an object as discussed earlier that would mean we would get empty object in most of the cases.
But we should inspect QP execution to see all possible scenarios.

}

#[tokio::test(flavor = "multi_thread")]
async fn test_subgraph_returning_different_typename_on_query_root() -> Result<(), BoxError> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we sure in this test that the __typename selections got all the way to the subgraph? If that's the case, we are supposed to have some field rewriting driven by the query plan fetch nodes (input_rewrites and output_rewrites). Is that test exercising that rewriting, or what is happening in query formatting?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happening in query formatting, that's test that answer your comment here: #6009 (comment)

By switching to apply_root_selection_set we always execute this code (even if parts of root selection is wrapped in fragments):

if name.as_str() == TYPENAME {
if !output.contains_key(field_name_str) {
output.insert(field_name.clone(), Value::String(root_type_name.into()));
}

So now we 100% sure __typename on root type is valid independently from anything that QP will return.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's good to make sure of that but probably we need a test that exercise the entire pipeline, not only the formatting

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it's done in this test:

async fn test_subgraph_returning_different_typename_on_query_root() -> Result<(), BoxError> {
let mut router = IntegrationTest::builder()
.config(CONFIG)
.responder(ResponseTemplate::new(200).set_body_json(json!({
"data": {
"topProducts": null,
"__typename": "SomeQueryRoot",
"aliased": "SomeQueryRoot",
"inside_fragment": "SomeQueryRoot",
"inside_inline_fragment": "SomeQueryRoot"
}
})))
.build()
.await;
router.start().await;
router.assert_started().await;
let query = r#"
{
topProducts { name }
__typename
aliased: __typename
...TypenameFragment
... {
inside_inline_fragment: __typename
}
}
fragment TypenameFragment on Query {
inside_fragment: __typename
}
"#;
let (_trace_id, response) = router.execute_query(&json!({ "query": query })).await;
assert_eq!(response.status(), 200);
assert_eq!(
response.json::<serde_json::Value>().await?,
json!({
"data": {
"topProducts": null,
"__typename": "Query",
"aliased": "Query",
"inside_fragment": "Query",
"inside_inline_fragment": "Query"
}
})
);
Ok(())
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it testing all possible scenario for how __typename can be nested and tests that __typename returned by subgraph are ignored.

apollo-router/src/spec/query.rs Show resolved Hide resolved
.unwrap_or_else(|| {
Value::String(ByteString::from(
current_type.inner_named_type().as_str().to_owned(),
))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be checked with what the output rewriting code in the QP is doing

if include_skip.should_skip(parameters.variables) {
continue;
}

self.apply_selection_set(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this sounds reasonable but looks like a dangerous change. Do you have a test that would show what happens before and after that change?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants