Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolving MissingGreenlet error when using async drivers. #419

Open
rcb4by opened this issue Dec 29, 2024 · 0 comments
Open

Resolving MissingGreenlet error when using async drivers. #419

rcb4by opened this issue Dec 29, 2024 · 0 comments

Comments

@rcb4by
Copy link

rcb4by commented Dec 29, 2024

I'm not sure if other people have run into this problem - I searched for days and couldn't find a clear answer. But eventually I figured out a solution, so wanted to document it here in case other people do run into it.

Background

Our setup uses a Postgres database which provides data for a GraphQL API, for which we initially used Flask, following the example. However, our schema uses a lot of 1:N relationships, so our performance was degraded due to the N+1 Round Trip Problem. The fix for this is, supposedly, very simple: just set batching=True in the meta class. However, doing this triggers the use of dataloaders, which operate asynchronously. So we switched from Flask to using the aiohttp web framework and asyncpg database driver, which are better suited for asynchronous tasks. This is when we first stumbled on the MissingGreenlet error: sqlalchemy.exc.MissingGreenlet: greenlet_spawn has not been called; can't call await_only() here. Was IO attempted in an unexpected place?

Issue

The problem here comes from the RelationshipLoader.batch_load_fn() (link). As the function comment explains, that function has to "piggyback on some internal APIs" of sqlalchemy. As a result (I believe), the necessary greenlet_spawn function never gets called, and the above error is thrown. (Greenlets, AFAICT, are small thread-like objects that allow synchronous requests to be performed in an asynchronous environment. In this situation, the dataloader is what actually materializes the expected attributes, so it is necessary to make these calls synchronously)

Solution

So on to the fix: we just need to call greenlet_spawn ourselves. This can be done by simply redefining the batch_load_fn attribute on the RelationshipLoader class, so that all invocations call the corrected function rather than the incorrect code. My stripped down code is below (I removed some extraneous comments/checks/conditions from batch_load_fn for simplicitly, make sure that you compare to the original code and update the correct path for your situation):

from graphene_sqlalchemy.batching import RelationshipLoader
from sqlalchemy.orm import Session
from sqlalchemy.util import immutabledict
from sqlalchemy.util._concurrency_py3k import greenlet_spawn


def patch_relationship_loader() -> None:
    """
CALL THIS FUNCTION ONCE AS PART OF SETUP CODE
    """
    RelationshipLoader.batch_load_fn = batch_load_fn


async def batch_load_fn(self: RelationshipLoader, parents: Any) -> list[Any]:
    """
FOLLOWS THE `SQL_VERSION_HIGHER_EQUAL_THAN_2` PATH
    """
    child_mapper = self.relationship_prop.mapper
    parent_mapper = self.relationship_prop.parent
    session = Session.object_session(parents[0])

    for parent in parents:
        assert session is Session.object_session(parent)
        assert session and parent not in session.dirty

    states = [(sqlalchemy.inspect(parent), True) for parent in parents]

    query_context = None
    if session:
        parent_mapper_query = session.query(parent_mapper.entity)
        query_context = parent_mapper_query._compile_context()
# CALL greenlet_spawn HERE RATHER THAN CALLING _load_for_path DIRECTLY
        await greenlet_spawn(
            self.selectin_loader._load_for_path,
            query_context,
            parent_mapper._path_registry,
            states,
            None,
            child_mapper,
            None,
            None,  # recursion depth can be none
            immutabledict(),  # default value for selectinload->lazyload
        )
    result = [
        getattr(parent, self.relationship_prop.key) for parent in parents
    ]
    return result

Conclusion

I don't have enough knowledge of graphene-sqlalchemy or other database drivers to know what a universal solution would look like (or even if this is a universal problem - the dearth of information about this issue suggests not), so I don't want to submit a PR for this change, but at least in our situation this was the best fix. Hope it helps someone else out there!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant