
StrawberryIntegration slows down query performance dramatically #3670

Open

pcraciunoiu opened this issue Oct 18, 2024 · 19 comments

Comments

@pcraciunoiu

How do you use Sentry?

Sentry Saas (sentry.io)

Version

2.17.0

Steps to Reproduce

  1. Set up a Django project with Strawberry async.
  2. Set up a query for an endpoint that returns many objects. I tested with 2,000 objects and only 2 fields for each - an ID and a string, e.g.
@strawberry_django.type(models.Item, fields=["id", "value"], pagination=True)
class Item:
    pass


@strawberry.type
class Query:
    items: list[types.Item] = strawberry_django.field()

  3. Run a query such as:

query Items {
  items(pagination: { limit: 2000 }) {
    id
    value
  }
}
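
For context, the wiring around the snippets above would look roughly like this (a sketch based on the strawberry-django docs; the module names and the urls.py layout are my assumptions, not part of the original report):

# schema.py (assumed module name; Query is the type defined above)
import strawberry

schema = strawberry.Schema(query=Query)

# urls.py -- serve the schema through the async view so the async code path is exercised
from django.urls import path
from strawberry.django.views import AsyncGraphQLView

from .schema import schema

urlpatterns = [
    path("graphql/", AsyncGraphQLView.as_view(schema=schema)),
]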

Expected Result

Not much performance difference with or without StrawberryIntegration

Actual Result

With StrawberryIntegration, this took 3-4x longer. In my case, about 800ms. Without it, I got the results back in 200-250ms.

@sentrivana
Contributor

Hey @pcraciunoiu, thanks for reporting.

I'm thinking this might be due to the SDK in general having a lot of room for improvement in the async department. Might be the transport blocking. (There are plans for a fully async transport/SDK, but this will take some time.)

Couple things to check:

  • If you don't configure the DSN (i.e., make the SDK not actually send anything out), do you still observe this? If not, my guess is that it's the transport.
  • Do you by any chance have profiling on and can see where we spend the extra time?

@pcraciunoiu
Author

* If you don't configure the DSN (i.e., make the SDK not actually send anything out), do you still observe this? If not, my guess is that it's the transport.

It doesn't make a difference if the dsn value is passed in or not. Only the StrawberryIntegration makes a difference, and I have to set auto_enabling_integrations=False to ensure it doesn't get added automatically.
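
Roughly, that init looks like this (a sketch of the configuration described above, not the project's actual settings):

import sentry_sdk

sentry_sdk.init(
    # no dsn configured, so nothing is actually sent anywhere
    auto_enabling_integrations=False,  # stops StrawberryIntegration (and others) from being added automatically
)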

* Do you by any chance have profiling on and can see where we spend the extra time?

I did try with pyinstrument, but nothing stood out. It took a few hours of trial and error to figure out it was Sentry, and then which integration.

I'll attach it in case it helps. If you have recommendations for other profiling tools that might do better, I'd like to learn about them.

1.15s - middleware.py_54 - pyinstrument.zip

@szokeasaurusrex
Member

@pcraciunoiu Interesting, I also did not see anything in the profile you attached, although I am not really sure how pyinstrument works (sampling or deterministic).

I think @sentrivana was suggesting that you could try using Sentry's built-in profiler. You can enable the profiler by setting profiles_sample_rate to a value above zero when initializing the Sentry SDK. You also need to set traces_sample_rate to a value above zero. The traces_sample_rate controls the fraction of Sentry transactions which are sent to Sentry, and the profiles_sample_rate controls the fraction of transactions which are profiled. Here is an example of how to set these.
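
Something along these lines (a sketch; pick sample rates that make sense for your traffic):

import sentry_sdk

sentry_sdk.init(
    dsn="https://<key>@<org-id>.ingest.sentry.io/<project-id>",  # your project's DSN
    traces_sample_rate=1.0,    # fraction of transactions sent to Sentry
    profiles_sample_rate=1.0,  # fraction of sampled transactions that are profiled
)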

Then, you can view the generated profiles directly in Sentry linked to the corresponding transaction.

@pcraciunoiu
Author

@szokeasaurusrex for the profiling I sent, I do not have Sentry set up yet, as it's a new project.

I do have another project that was running the Strawberry integration until late last week, when I found this issue and disabled it. I confirmed on that project, too, that it was adding 150ms or more to GraphQL requests when I tested locally. I have some profiling from last week, before I disabled this integration, but it's a private repository and I'm not comfortable including links or details here. If you want, I can share a Sentry link privately via email or otherwise, and/or you can let me know what you need to look into this further.

@sentrivana
Contributor

@pcraciunoiu Happy to look at the Sentry link -- you can send it to ivana.kellyer (at) sentry.io.

I will try to repro this today. Also want to compare the performance with a sync Django/Strawberry app. (My money is still on this being a general problem with the SDK blocking.) Will report back.

@sentrivana
Contributor

The profile in Sentry looks thoroughly uneventful, but the pyinstrument profile you attached is interesting. Seems like there's a lot of time spent waiting on threading locks/conditions. Don't know if that's what's causing the slowness but we can verify that.

Would you be able to get us a pyinstrument profile like the one above for the query:

  1. without Sentry
  2. with Sentry and Strawberry integration enabled sync
  3. with Sentry and Strawberry integration enabled async
  4. optionally also with Sentry but the Strawberry integration disabled (via disabled_integrations=[StrawberryIntegration()] (see here), so that you still get all the other integrations that would be enabled in options 2 and 3); a sketch follows below
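
For option 4, the init would look roughly like this (a sketch; disabled_integrations needs a reasonably recent 2.x SDK):

import sentry_sdk
from sentry_sdk.integrations.strawberry import StrawberryIntegration

sentry_sdk.init(
    dsn="...",  # your DSN
    disabled_integrations=[StrawberryIntegration()],  # every other auto-enabled integration stays on
)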

We could then see what stands out.

Feel free to attach here or send them to me via email.

@pcraciunoiu
Author

@sentrivana I finally got around to this and emailed you those files. It is definitely slowest with the strawberry integration async.

@sentrivana
Contributor

Thanks @pcraciunoiu, got the files and will take a look in the next couple of days.

@sentrivana
Contributor

@pcraciunoiu So I dug a bit deeper into this. Long story short, we're using a custom Strawberry extension in our integration. The only difference between the sync and async version of the extension is the resolve method. Thing is, the performance difference is still there even when you strip all Sentry code from both versions of the method and only leave the most basic behavior.

To verify, I made a sample Django app with strawberry-django without Sentry and with this in my schema.py:

from inspect import isawaitable

import strawberry
import strawberry_django
from strawberry.extensions import SchemaExtension
from strawberry_django.optimizer import DjangoOptimizerExtension

from .types import Fruit


class MyExtension(SchemaExtension):
    # # Uncomment this and comment out the async version to switch
    # def resolve(self, _next, root, info, *args, **kwargs):
    #     return _next(root, info, *args, **kwargs)

    async def resolve(self, _next, root, info, *args, **kwargs):
        result = _next(root, info, *args, **kwargs)
        if isawaitable(result):
            result = await result
        return result


@strawberry.type
class Query:
    fruits: list[Fruit] = strawberry_django.field()

schema = strawberry.Schema(
    query=Query,
    extensions=[
        DjangoOptimizerExtension,  # not required, but highly recommended
        MyExtension,
    ],
)

I get the same performance difference using the sync vs. async version of resolve. So it seems this just comes with how extensions work.

Now the question for me would be whether this is expected or whether we're doing something wrong. Maybe @patrick91 could weigh in on this. TL;DR: The async version of the extension in the example above considerably slows down the endpoint; is this expected? The sync version adds almost no overhead in comparison. The app is just the example app from https://github.com/strawberry-graphql/strawberry-django?tab=readme-ov-file#basic-usage, using AsyncGraphQLView.

@patrick91
Contributor

If I recall correctly, isawaitable is a slow check; also, we are calling the resolver for all fields, even the ones that might not need it.

@sentrivana maybe you can add this check for the basic fields? https://github.com/strawberry-graphql/strawberry/blob/main/strawberry/extensions/tracing/utils.py#L12-L20

@sentrivana
Contributor

@patrick91 Thanks for getting back to me. So do I understand this right, adding should_skip_tracing should allow us to skip resolving stuff that doesn't need resolving? If should_skip_tracing is True, can we just skip either the _next (here) or the isawaitable check (here) completely?

Or did you mean something different by adding it for the basic fields? I assume you mean in the app?

@patrick91
Contributor

@sentrivana you're correct that you can skip the tracing entirely:

https://github.com/strawberry-graphql/strawberry/blob/6553c9e3c67249f538cd3af934e78322a24963a7/strawberry/extensions/tracing/opentelemetry.py#L157-L171
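
For illustration, adopting that pattern in the Sentry extension might look roughly like this (a sketch only, assuming should_skip_tracing keeps its (resolver, info) signature; not the actual integration code):

from inspect import isawaitable

from strawberry.extensions import SchemaExtension
from strawberry.extensions.tracing.utils import should_skip_tracing


class SketchedSentryExtension(SchemaExtension):
    async def resolve(self, _next, root, info, *args, **kwargs):
        # Basic fields: skip all the Sentry bookkeeping and just call through.
        if should_skip_tracing(_next, info):
            result = _next(root, info, *args, **kwargs)
            if isawaitable(result):
                result = await result
            return result

        # Non-trivial resolvers: this is where the span creation would go.
        result = _next(root, info, *args, **kwargs)
        if isawaitable(result):
            result = await result
        return result

Note that this only skips the span work for basic fields; the isawaitable/await step itself is still there.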

I think on our side we could find ways to skip the isawaitable, maybe by adding a property on the resolver, but I don't know yet if that's easy to do 😊

@sentrivana
Contributor

I'll keep this open for us to investigate whether we can get rid of the isawaitable in the extension.

@patrick91
Contributor

patrick91 commented Dec 12, 2024

Just as an update, I'm trying to find a way to make the check faster, but my current attempt is surprisingly slower, see: strawberry-graphql/strawberry#3724

I've also done some benchmarks on the isawaitable function, and it doesn't seem to have this big of a performance hit, so it might be a combination of isawaitable + await result, but I'm not sure yet.

EDIT: the PR may have had an error which I only just noticed.
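
For anyone who wants to reproduce that measurement, a micro-benchmark along these lines is enough (my own sketch, not the benchmark referenced above):

import timeit
from inspect import isawaitable


async def coro():
    return 1


plain_value = 1
coroutine_obj = coro()

print("plain value:", timeit.timeit(lambda: isawaitable(plain_value), number=1_000_000))
print("coroutine:  ", timeit.timeit(lambda: isawaitable(coroutine_obj), number=1_000_000))

coroutine_obj.close()  # avoid the "coroutine was never awaited" warning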
