client.get_recipes() only returns 10 results instead of all results #180

Open
joeld1 opened this issue May 31, 2024 · 5 comments

Comments

@joeld1

joeld1 commented May 31, 2024

I have more than 10 recipes on MyFitnessPal, but calling client.get_recipes() returns only the first 10. Is there a way to paginate and get all of my recipes?

Thank you!

@joeld1
Author

joeld1 commented Jun 1, 2024

Pagination no longer works for the recipe_parser path (i.e. https://www.myfitnesspal.com/recipe_parser?page=1&sort_order=recent) because the page no longer includes a page indicator.

I've edited get_recipes in client.py so that, when it can't tell whether pagination exists, it checks whether the next page contains any recipes before deciding whether to continue.

Here is the refactored method:

    def get_recipes_from_page(self, page_count, recipes_dict):
        RECIPES_PATH = f"recipe_parser?page={page_count}&sort_order=recent"
        recipes_url = parse.urljoin(self.BASE_URL_SECURE, RECIPES_PATH)
        document = self._get_document_for_url(recipes_url)
        recipes = document.xpath(
            "//*[@id='main']/ul[1]/li"
        )  # get all items in the recipe list
        for recipe_info in recipes:
            recipe_path = recipe_info.xpath("./div[2]/h2/span[1]/a")[0].attrib[
                "href"
            ]
            recipe_id = recipe_path.split("/")[-1]
            recipe_title = recipe_info.xpath("./div[2]/h2/span[1]/a")[0].attrib[
                "title"
            ]
            recipes_dict[recipe_id] = recipe_title
        return document


    def get_recipes(self) -> Dict[int, str]:
        """Returns a dictionary with all saved recipes.

        Recipe ID will be used as dictionary key, recipe title as dictionary value.
        """
        recipes_dict = {}

        page_count = 1
        has_next_page = True
        while has_next_page:
            document = self.get_recipes_from_page(page_count, recipes_dict)

            # Check for pagination links
            pagination_links = document.xpath('//*[@id="main"]/ul[2]/a')
            if pagination_links:
                if page_count == 1:
                    # If pagination links exist on page 1, there must be a second page;
                    # page 1 has only a "next" link (and no "previous" link)
                    page_count += 1
                elif len(pagination_links) > 1:
                    # Two links: one to the previous page and one to the next
                    page_count += 1
                else:
                    # Only one link means this is the last page
                    has_next_page = False
            else:
                tmp_dict = {}
                # No pagination links found: probe the next page to see whether it has recipes
                document = self.get_recipes_from_page(page_count + 1, tmp_dict)
                if tmp_dict:
                    # The next page has recipes, so advance to it
                    page_count += 1
                else:
                    # No further pages; if recipes_dict is still empty here, there are no recipes
                    has_next_page = False
        return recipes_dict
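
For comparison, the "probe the next page" idea can be reduced to a single loop that requests each page once and stops at the first page that adds nothing new. This is only a sketch under stated assumptions, not the code from the pull request: `collect_all_recipes`, the `fetch_page` callback, and the `max_pages` cap are hypothetical names, and `fetch_page` stands in for whatever fetches and parses one recipe_parser page.

```python
from typing import Callable, Dict


def collect_all_recipes(
    fetch_page: Callable[[int], Dict[str, str]],
    max_pages: int = 1000,  # assumed safety cap, not part of the library
) -> Dict[str, str]:
    """Collect recipes page by page until a page adds nothing new.

    fetch_page is a hypothetical callback returning {recipe_id: title}
    for one 1-based page number.
    """
    recipes: Dict[str, str] = {}
    for page in range(1, max_pages + 1):
        page_recipes = fetch_page(page)
        # Keep only IDs that did not appear on an earlier page.
        new_items = {k: v for k, v in page_recipes.items() if k not in recipes}
        if not new_items:
            # Empty page, or the site repeated an earlier page: stop.
            break
        recipes.update(new_items)
    return recipes


# Fake data source for illustration: 25 recipes served 10 per page.
ALL_RECIPES = {str(i): f"Recipe {i}" for i in range(1, 26)}


def fake_fetch(page: int) -> Dict[str, str]:
    items = list(ALL_RECIPES.items())[(page - 1) * 10 : page * 10]
    return dict(items)


print(len(collect_all_recipes(fake_fetch)))  # prints 25
```

Stopping on "no new IDs" rather than only on "empty page" also guards against a site that keeps returning the last page for out-of-range page numbers; whether MyFitnessPal does that is not established in this thread.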

@hannahburkhardt
Collaborator

@joeld1 amazing! Would you mind putting in a PR with this change?

@joeld1
Author

joeld1 commented Aug 22, 2024

No problem!

@joeld1
Author

joeld1 commented Sep 6, 2024

Hello @hannahburkhardt, I just opened a pull request containing the refactored method:

#185

@joeld1
Author

joeld1 commented Sep 8, 2024

Hello @hannahburkhardt,

I found another edge case and updated the code to handle it.

The edge case is as follows:

  • pagination_links exists and 28 recipes exist, but only 2 pagination links (i.e. page 1, page 2) are present
    -- only 20 recipes are returned because only 2 pagination links (i.e. page 1, page 2) were discovered

My latest push should now handle this case, where not all pagination links are shown.
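
To make the arithmetic of this edge case concrete, here is a small simulation. It is a sketch under assumptions, not the code in the pull request: the page size of 10 is inferred from the behaviour reported in this issue, and `fetch_page`, `count_via_links`, and `count_via_probing` are hypothetical helpers that stand in for real requests.

```python
from typing import List

PAGE_SIZE = 10      # assumed from the 10-per-page behaviour reported above
TOTAL_RECIPES = 28  # the number of recipes in the edge case


def fetch_page(page: int) -> List[int]:
    """Hypothetical stand-in for one recipe_parser request (1-based page)."""
    start = (page - 1) * PAGE_SIZE
    return list(range(start, min(start + PAGE_SIZE, TOTAL_RECIPES)))


def count_via_links(visible_links: int) -> int:
    """Visit only as many pages as there are visible pagination links."""
    return sum(len(fetch_page(p)) for p in range(1, visible_links + 1))


def count_via_probing() -> int:
    """Keep requesting pages until one comes back empty."""
    total, page = 0, 1
    while True:
        items = fetch_page(page)
        if not items:
            return total
        total += len(items)
        page += 1


print(count_via_links(visible_links=2))  # prints 20
print(count_via_probing())               # prints 28
```

Trusting the two visible links misses the 8 recipes on the third page, while probing does not depend on how many links the page happens to show.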
