-
-
Notifications
You must be signed in to change notification settings - Fork 136
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Add lazy execution prototype #155
base: main
Are you sure you want to change the base?
Conversation
This PR is a proof of concept of adding lazy evaluation support to graphql-core. It follows a similar API to [graphql-ruby](https://graphql-ruby.org/schema/lazy_execution.html) and it allows the developer to define "lazy" type to enable batching without using asyncio. The tests illustrate how this can be used to enable a dataloader pattern. N.B. The DeferredValue object is very similar in functionality to the Promise library from graphql-core v2. I decided to reimplement a subset rather than use it directly though because it's scope is bigger than what I needed. It's a purely internal implementation detail though and can be replaced in future.
@Cito any thoughts on this approach? Do you think it's a viable way of solving this problem? |
@jkimbo In principle, yes. Will provide some feedback soon (currently working on graphql-relay-py). |
Hi @jkimbo. Thanks again for getting the ball rolling on this one. Let me recap what the problem is: Our use case is described in #71 and #148. The main problem is supporting synchronous batch data loaders, the additional problem is how to do this multi-threaded environments, where maybe each data loader would run in a different thread. Let’s start with the first problem. Your solution in this PR is to make the ExecutionContext support LazyValues as results from resolvers, and internally using DeferredValues to resolve these. The DataLoader would then convert the load calls to LazyValues, so that these are not immediately resolved. In principle, this works. But I feel not very good about this approach because of the following:
Can we do something else instead? I thought maybe in addition to Awaitables, we could also support Promises. We could make this universal by passing a Promise class to the ExecutionContext. This would allow to check if something is a promise. We would only require that a Promise has a So currently I think - why do not use what we have - support for asyncio? Instead of the DeferedValues, we could simply use asyncio.Futures. This way we would not need to introduce additional classes for lazy and deferred values and the ExecutionContext could stay as it is. To demonstrate this, I have rewritten your DataLoader in What do you think about this? |
Thanks for the detailed feedback @Cito and I agree with your summary of the problem, though specifically I'm trying to solve the case where you need batching in a purely synchronous context (and one that can't be async i.e. Django). Regarding your concerns:
That is true, though it's only the case if you implement the
Agreed but I don't see a way around it
Deferred values is meant to only be an internal concept to graphql-core. It shouldn't affect consumers of the library. It's there only to represent an object that can be resolved in the future.
I don't think that article is relevant here because it's arguing that using deferreds in JS is an antipattern because it has promises (which I agree with). The
Correct though you could implement a wrapper that would allow it. I think that is best done in userland.
That is possible though you then need a mechanism to "register" the dataloader batch functions with graphql-core. I'm not sure exactly what an API for that would look like though it might warrant further investigation. Note: the "lazy" value implementation can use threads when resolving batches so that it runs concurrently.
Yes that is a limitation, though I am trying to solve the problem of batching in a purely synchronous context so there wouldn't be any lazy values anyway. What you've implemented with Futures is basically how dataloaders work in asyncio (https://github.com/strawberry-graphql/strawberry/blob/main/strawberry/dataloader.py) and it works great when you can work in an asyncio context. However Django (and maybe other frameworks) currently don't support asyncio at all when interacting with the ORM and will raise an exception if you try and use it while there is a running asyncio loop (https://github.com/django/django/blob/37d9ea5d5c010d54a416417399344c39f4e9f93e/django/utils/asyncio.py#L17-L23). Unfortunately what you've suggested is not possible for any GraphQL server running on Django which is why I'm trying to come up with a solution that works only in a synchronous context. |
Seems I didn't understand the depth of the problem since I'm not using Django. Maybe we should then implement something like your solution for those who have this problem, but try to mitigate the issues I mentioned above as far as possble. Particularly, I think we shouldn't try to amend the existing ExecutionContext by adding the logic for lazy/deferred values, which currently can't coexist anyway, but instead create a subclass that is synchronously only, without any of the checks for awaitables and special handling of them. Instead, it should only check for deferred/lazy values. This context could then be used in the case where we can't have an async loop. It should also be configurable to allow different kinds of lazy values. Currently, the checks for lazy values are configurable via the Will try to experiment a bit to find the most clean and simple solution, but please have some patience, since I already know the next two weeks will be busy for me. But feel free to work on this in parallel as well. We will also need more tests, particularly for the error handling and the DeferredValue class. |
Sounds good @Cito , having a specific ExecutionContext class sounds like a good idea to me, I just wasn't sure how to implement it without refactoring the implementation which I know follows the js one closely. |
Hi @jkimbo. I have now implemented a specific DeferredExecutionContext as a subclass of the normal ExecutionContext. Instead of using DeferredValue and LazyValue classes, I have used a simple self-made synchronous Future class with a similar API as the existing Future classes in Python. I think this is a bit simpler to understand despite the callback hell. I have also added a test for the error handling, but we need more tests before releasing this. Let me know if this looks good to you and we should continue to follow this approach. Not sure whether we should release this as part of graphql-core or as a separate package. |
Sorry for not getting back to you sooner @Cito . Your code looks great and I think it's a much better approach. Regarding if we should split it out: I would prefer keeping it in graphql-core because it's going to be quite tied to the execution context class and so easily broken with new releases. If it's part of the library then at least it's obvious if the implement breaks. |
I think graphql-core should add support for Wait... what?I've been trying to have Basically, if you have a query like the following: query NewsFeed {
newsFeed {
stories {
text
comments @defer {
text
}
}
}
} The HTTP request (one request that resolves sequentially - it flushes per result) will resolve in: {
"data": {
"newsFeed": {
"stories": [{
"text": "blabla",
"comments": null
}]
}
}
}
// wait a few seconds
{
"path": ["newsFeed", "stories", 0, "comments"],
"data": [{
"text": "The first comment in the first story"
},
{
"text": "Second comment in the first story"
}]
}
// wait a few seconds
{
"path": ["newsFeed", "stories", 1, "comments"],
"data": [{
"text": "The first comment in the second story"
}]
}
All within the same HTTP request. Implementation Detailsasync def resolve_comments(self, info):
asyncio.sleep(2)
return ... Basically, the GraphQL query should return a ConclusionI think it's better to start from this async-first approach and then "perhaps" add a way to support in sync context. |
Hi @syrusakbary, thanks for this input, great ideas. As far as I understand, these features are currently on the way of getting into the spec. GraphQL.js already has an experimental "defer-stream" branch as well, and this may get merged in the upcoming v17. As far as I see they simply return an iterator that yields the deferred values if there are any, instead of returning a single result. I will incorporate this solution into GraphQL-core, too. Maybe we can also have an experimental branch or subpackage supporting this if this does not get merged soon. However, as @jkimbo mentioned, the problem with Django it does not tolerate a running event loop. So this case still needs some special support like the one in this PR. The good thing is that the solution I proposed above is just a subclass of the existing execution context, and nothing needs to be changed there - it will not affect the normal async usage at all. |
values.append(value) | ||
ret.resolve(values) | ||
|
||
for p in l: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for p in l: | |
for p in deferred_values.values(): |
Because the List l
will have plain_values too ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The work presented in this PR is really good 👍🏼
We were able to increase our GQL performance by at least 5X using dataloaders.
I've published a package for people to try this out: https://github.com/jkimbo/graphql-sync-dataloaders @Cito I still think this should live inside graphql-core for maintainability reasons but at least this way people can try it out in their projects. |
Thanks @jkimbo. I will definitely look into this again. But also like to get feedback whether this is useful or could be improved. |
Just to make sure I understand where this is going, we are going to have our own Promise/Future-like class that is thenable and require all all resolvers/data-loaders to return an instance of that class, and support for handling these instances would be done in a new ExecutionContext? |
This PR is a proof of concept of adding lazy evaluation support to
graphql-core.
It follows a similar API to graphql-ruby
and it allows the developer to define a "lazy" type to enable batching
without using asyncio. The tests illustrate how this can be used to enable
a dataloader pattern.
N.B. The DeferredValue object is very similar in functionality to the
Promise library from graphql-core v2. I decided to reimplement a
subset rather than use it directly though because it's scope is bigger
than what I needed. It's a purely internal implementation detail though
and can be replaced in future.
Related issue: #148