
Paginate without requiring length #16

Open
sebastiandev opened this issue Nov 4, 2018 · 9 comments

Comments

sebastiandev commented Nov 4, 2018

The current implementation requires you to pass the total length of the iterable, which usually means making an extra query just to find out the count. There's no need for this: we can accomplish the same result by trying to fetch n+1 results. If we get n+1 back, we know there are more to be consumed, without ever needing the real total count.
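The n+1 trick can be sketched in isolation (a hypothetical helper; the names are illustrative, not part of graphql-relay):

```python
def fetch_page(seq, offset, first):
    """Fetch one page plus one extra element so we can tell whether
    more results exist, without ever computing the total count.

    seq can be anything sliceable: list, tuple, Django QuerySet, ...
    """
    window = seq[offset:offset + first + 1]  # fetch n+1
    items = window[:first]
    has_next_page = len(window) > first  # got the extra one => more pages
    return items, has_next_page

fetch_page(list(range(10)), 0, 3)  # ([0, 1, 2], True)
fetch_page(list(range(10)), 8, 3)  # ([8, 9], False)
```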

This is the custom implementation we use to spare the count query, which can be costly and adds unnecessary overhead to every request. We only need the real total count when that value is explicitly requested (via a custom CountableConnection):

class CountableConnection(graphene.relay.Connection):
    class Meta:
        abstract = True

    total_count = graphene.Int()

    @staticmethod
    def resolve_total_count(root, info, *args, **kwargs):
        try:
            # Query-like objects (e.g. Django QuerySets) expose a zero-arg
            # count(); plain sequences fall back to len()
            t_count = root.iterable.count()
        except (TypeError, AttributeError):
            t_count = len(root.iterable)

        return t_count
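The try/except exists because `.count()` means different things on different iterables: on a Django QuerySet it is a zero-argument COUNT query, while `list.count()` requires an argument and so raises TypeError. A stripped-down illustration (FakeQuerySet is made up for the demo):

```python
def total_count(iterable):
    try:
        # Query-like objects (e.g. Django QuerySets) expose a zero-arg count()
        return iterable.count()
    except (TypeError, AttributeError):
        # list.count() needs an argument (TypeError); other iterables may
        # have no count() at all (AttributeError); fall back to len()
        return len(iterable)

class FakeQuerySet:
    """Stand-in for an ORM query object."""
    def count(self):
        return 42

total_count([1, 2, 3])       # 3
total_count(FakeQuerySet())  # 42
```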

def connection_from_list_slice(list_slice, args=None, connection_type=None,
                               edge_type=None, pageinfo_type=None):
    '''
    Given a sliceable iterable, consume only the amount needed for the
    pagination params, and figure out whether there are more results to be
    consumed by fetching one element more than requested: if we get n+1
    back, there are more to consume.

    This spares the caller from passing the total count of results, which
    usually means making an extra query just to find out that number.
    '''
    from graphql_relay.connection.arrayconnection import (
        get_offset_with_default, offset_to_cursor)
    from graphql_relay.connection.connectiontypes import Connection, PageInfo, Edge

    connection_type = connection_type or Connection
    edge_type = edge_type or Edge
    pageinfo_type = pageinfo_type or PageInfo

    args = args or {}

    before = args.get('before')
    after = args.get('after')
    first = args.get('first')
    last = args.get('last')

    if first:
        after = get_offset_with_default(after, -1) + 1
        _slice = list_slice[after:  max(after, 0) + first + 1]  # fetch n+1

        items = _slice[:-1]
        if len(items) < first:
            items = _slice[:]  # If there are not enough, get them all

        edges = [
            edge_type(
                node=node,
                cursor=offset_to_cursor(after + i)
            )
            for i, node in enumerate(items)
        ]

    elif last:
        if before:
            before = get_offset_with_default(before)
            _slice = list_slice[max(before-last-1, 0):before]  # fetch n+1

        else:
            # beware that not specifying before results in the need
            # to calculate the total amount
            _slice = list_slice[(last*-1)-1:]

            try:
                before = list_slice.count()
            except (TypeError, AttributeError):
                before = len(list_slice)

        items = _slice[1:]
        if len(items) < last:
            items = _slice[:]  # If there are not enough, get them all

        edges = [
            edge_type(
                node=node,
                cursor=offset_to_cursor(before - last -1 + i)
            )
            for i, node in enumerate(items)
        ]

    else:  # we are not allowing after/before without first/last
        items = _slice = list_slice[:]
        edges = [
            edge_type(
                node=node,
                cursor=offset_to_cursor(i)
            )
            for i, node in enumerate(items)
        ]

    first_edge_cursor = edges[0].cursor if edges else None
    last_edge_cursor = edges[-1].cursor if edges else None
    
    has_previous_page = False
    if (isinstance(last, int) and len(_slice) > last) or \
            (isinstance(after, int) and after > 0):
        has_previous_page = True

    return connection_type(
        edges=edges,
        page_info=pageinfo_type(
            start_cursor=first_edge_cursor,
            end_cursor=last_edge_cursor,
            has_previous_page=has_previous_page,
            has_next_page=len(_slice) > first if isinstance(first, int) else False
        )
    )
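Both versions lean on graphql_relay's cursor helpers. For readers without the library at hand, here is a self-contained sketch of the same base64 offset-cursor scheme; the "arrayconnection:" prefix matches graphql-relay-py's implementation, but treat the exact signatures as approximations:

```python
from base64 import b64decode, b64encode

PREFIX = "arrayconnection:"

def offset_to_cursor(offset: int) -> str:
    """Encode an integer offset as an opaque base64 cursor."""
    return b64encode(f"{PREFIX}{offset}".encode()).decode()

def cursor_to_offset(cursor: str):
    """Decode a cursor back to its offset, or None if it is malformed."""
    try:
        return int(b64decode(cursor).decode()[len(PREFIX):])
    except (ValueError, TypeError):
        return None

def get_offset_with_default(cursor, default: int = 0) -> int:
    """Return the cursor's offset, falling back to default for None/garbage."""
    if not isinstance(cursor, str):
        return default
    offset = cursor_to_offset(cursor)
    return offset if offset is not None else default
```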

@syrusakbary Not sure if we want this as the default, or at least as an optional strategy.
If this sounds reasonable I can make a PR for it.

sebastiandev (Author) commented

@syrusakbary @stefanoulis This is our updated version that fixes some bugs

def connection_from_list_slice(list_slice, args=None, connection_type=None,
                               edge_type=None, pageinfo_type=None):
    '''
    Given a sliceable iterable, consume only the amount needed for the
    pagination params, and figure out whether there are more results to be
    consumed by fetching one element more than requested: if we get n+1
    back, there are more to consume.

    This spares the caller from passing the total count of results, which
    usually means making an extra query just to find out that number.
    '''
    from graphql_relay.connection.arrayconnection import (
        get_offset_with_default, offset_to_cursor)
    from graphql_relay.connection.connectiontypes import Connection, PageInfo, Edge

    connection_type = connection_type or Connection
    edge_type = edge_type or Edge
    pageinfo_type = pageinfo_type or PageInfo

    args = args or {}

    before = args.get('before')
    after = args.get('after')
    first = args.get('first')
    last = args.get('last')

    first_is_negative = isinstance(first, int) and first < 0
    last_is_negative = isinstance(last, int) and last < 0

    # Validations (ValidationException is assumed to be a project-specific
    # error type; substitute GraphQLError or ValueError as appropriate)
    if first_is_negative or last_is_negative:
        raise ValidationException("first and last can't be negative values")

    if first and last:
        raise ValidationException(
            "Including a value for both first and last is strongly discouraged,"
            " as it is likely to lead to confusing queries and results"
        )

    if before and last is None:
        raise ValidationException(
            "before without last is not supported"
        )

    if after and first is None:
        raise ValidationException(
            "after without first is not supported"
        )

    if before and after:
        raise ValidationException(
            "Mixing before and after is not supported"
        )

    if isinstance(first, int):
        after_offset = get_offset_with_default(after, -1)
        start_offset = after_offset + 1
        end_offset = start_offset + first

    if isinstance(last, int):
        if before:
            before_offset = get_offset_with_default(before)
        else:
            try:
                before_offset = int(list_slice.count())
            except (TypeError, AttributeError):
                before_offset = len(list_slice)
        end_offset = before_offset
        start_offset = before_offset - last if before_offset - last > 0 else 0

    # NOTE: start_offset/end_offset are only set when first or last is given;
    # calling this without either pagination argument raises a NameError here
    _slice = list_slice[start_offset:end_offset]

    edges = [
        edge_type(
            node=node,
            cursor=offset_to_cursor(start_offset + index)
        )
        for index, node in enumerate(_slice)
    ]

    # To know if there are more pages we just check if there is n+1 element
    # In this case it would just be the end_offset element because array indexes
    # start from zero
    try:
        list_slice[end_offset]
        has_next_page = True
    except IndexError:
        has_next_page = False

    has_previous_page = start_offset > 0
    first_edge_cursor = edges[0].cursor if edges else None
    last_edge_cursor = edges[-1].cursor if edges else None

    return connection_type(
        edges=edges,
        page_info=pageinfo_type(
            start_cursor=first_edge_cursor,
            end_cursor=last_edge_cursor,
            has_previous_page=has_previous_page,
            has_next_page=has_next_page
        )
    )
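The `list_slice[end_offset]` probe touches exactly one extra element, which is the whole point. A toy wrapper that counts element fetches (purely illustrative, standing in for a lazy data source) makes that visible:

```python
class CountingSeq:
    """Wraps a list and counts how many elements are fetched,
    standing in for a lazy data source such as a QuerySet."""

    def __init__(self, data):
        self.data = data
        self.fetched = 0

    def __getitem__(self, key):
        result = self.data[key]  # IndexError propagates for bad indexes
        self.fetched += len(result) if isinstance(key, slice) else 1
        return result

seq = CountingSeq(list(range(100)))
page = seq[0:10]  # the page itself: 10 elements
try:
    seq[10]       # n+1 probe: one extra element, not the whole result set
    has_next_page = True
except IndexError:
    has_next_page = False
# page == [0, ..., 9]; has_next_page is True; seq.fetched == 11
```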

sliverc commented Jul 8, 2019

@sebastiandev
Thanks for the snippets. I just quickly tested them, and there seems to be an error when no pagination arguments are passed; otherwise it seems to work well.

All in all, it would be great if you could follow up with a PR; even if it is not merged quickly, it would make it much easier for others to integrate a custom version into their code and comment on improvements.

Cito (Member) commented Jul 8, 2019

@sebastiandev @sliverc Keep in mind that the idea of this project is to copy the functionality from graphql-relay-js over to Python. I think it is wise to stick to that idea and just replicate what is there.

First, because it is easier to switch from one language to the other if you need. Second, because it is easier to follow the updates in the JS land. If we deviate too much, it is hard to get the latest changes back into the Python version.

However, I understand sometimes it may be desirable to improve the existing functionality or add stuff that may make more sense in Python. My first attempt would be to try to get improvements into the JS library. If that's not possible we should include them in a way that is easy to maintain as an add-on. E.g. by not changing the function we inherited from JS, but adding another function, with separate tests.

I'm currently preparing v2 and v3 of this library. The goal of v2 is to achieve compatibility with GraphQL-Core v2 and not do much more; it should stay compatible and not break things. The goal of v3 is to achieve compatibility with GraphQL-Core v3 (= GraphQL-Core-next), and also to bring it up to date with graphql-relay-js. We can introduce breaking changes in that version.

Currently, I have v3 more or less finished, but the best Python version of this function is also causing me headaches. Maybe we can discuss how we should proceed here.

Note that v3 also introduces type hints, and I'm wondering what the best Python equivalent for a JS array would be. I think list may be too restrictive. Maybe a typing.Collection or a typing.Sequence? Also, should we rename the term "list" in the functions and parameters? It looks too restrictive to me: you could just as well pass a tuple, a SQLAlchemy query, a numpy array, or anything. Maybe just rename it back to "array" like in the JS version?
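One option for typing "anything sliceable", looser than list but still checkable, is a structural typing.Protocol (a sketch for discussion, not something the library has adopted):

```python
from typing import Any, Protocol, runtime_checkable

@runtime_checkable
class Sliceable(Protocol):
    """The two things the pagination code actually needs."""
    def __getitem__(self, index: Any) -> Any: ...
    def __len__(self) -> int: ...

# Lists, tuples and ranges all qualify structurally, as would any
# duck-typed query object implementing the same two methods.
assert isinstance([1, 2, 3], Sliceable)
assert isinstance((1, 2), Sliceable)
assert isinstance(range(5), Sliceable)
```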

sebastiandev (Author) commented

@sliverc created the PR here

ironyee commented Jul 6, 2020

> @sliverc created the PR here

@sliverc Can you check his PR and try to merge it? Your code turns a QuerySet into a list, which will cause problems.

sebastiandev (Author) commented

For the record @ironyee @sliverc: added a fix to avoid an extra query when checking for the n+1 element.
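The fix mentioned here can be sketched by folding the n+1 probe into the page slice itself, so the probe does not cost a second query on a lazy source (hypothetical helper, not the exact patch):

```python
def page_with_probe(list_slice, start_offset, end_offset):
    """Fetch the page and the n+1 probe in a single slice, instead of
    slicing for the page and then indexing list_slice[end_offset]
    (which would be a second query on a lazy source)."""
    window = list_slice[start_offset:end_offset + 1]  # page + probe, one query
    page_size = end_offset - start_offset
    return window[:page_size], len(window) > page_size

page_with_probe(list(range(10)), 1, 3)   # ([1, 2], True)
page_with_probe(list(range(10)), 8, 10)  # ([8, 9], False)
```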

ayyoubelamrani4 commented Dec 30, 2022

@sebastiandev

Is this still an open issue? Are there new ways to avoid querying all the data when paginating with relay.Connection?

ayyoubelamrani4 commented

@Cito do you know if it's something that will be fixed in the future?

Cito (Member) commented Jan 8, 2023

@ayyoubelamrani4 Again (see comment above), the scope of this project is to be a replica of graphql-relay-js in Python. So as long as we faithfully replicate that functionality, nothing is "broken" on this side that needs to be "fixed".

If you are missing features that would be really useful for everyone, then the preferred way to get them into graphql-relay-py is to request them on the issue tracker of graphql-relay-js. This has many advantages:

  • There is a larger developer community that could give feedback for the best implementation and API
  • If it is implemented there in the reference implementation, both communities can benefit from the feature, not only the Python developers, maybe it even gets ported to other languages
  • We make sure that relay-py stays compatible with relay-js by not implementing something that relay-js may later solve in a different way

That said, I will release a new version of relay-py after GraphQL-Core 3.3 has been released, and will consider this and some other open issues then. Still, if you want to push this or other new features, the most effective way forward is to lobby for them on the relay-js issue tracker.


5 participants