client.get_recipes() only returns 10 results instead of all results #180

joeld1 · 2024-05-31T00:33:14Z

I have more than 10 recipes on MyFitnessPal, but calling client.get_recipes() only returns the first 10. Is there a way to paginate and get all recipes instead of just the first 10?

Thank you!

joeld1 · 2024-06-01T10:38:44Z

Pagination no longer works for the recipe_parser path (i.e. https://www.myfitnesspal.com/recipe_parser?page=1&sort_order=recent) because the page no longer includes a page indicator.

I've edited get_recipes in client.py so that, when no pagination links are found, it checks whether the next page contains any recipes before deciding whether to keep paginating.

Here is the refactored method:

    def get_recipes_from_page(self, page_count, recipes_dict):
        """Fetch one page of recipes, add them to recipes_dict, and return the parsed page."""
        RECIPES_PATH = f"recipe_parser?page={page_count}&sort_order=recent"
        recipes_url = parse.urljoin(self.BASE_URL_SECURE, RECIPES_PATH)
        document = self._get_document_for_url(recipes_url)
        recipes = document.xpath(
            "//*[@id='main']/ul[1]/li"
        )  # get all items in the recipe list
        for recipe_info in recipes:
            # The recipe link carries both the ID (end of href) and the title
            recipe_link = recipe_info.xpath("./div[2]/h2/span[1]/a")[0]
            recipe_id = recipe_link.attrib["href"].split("/")[-1]
            recipes_dict[recipe_id] = recipe_link.attrib["title"]
        return document


    def get_recipes(self) -> Dict[int, str]:
        """Returns a dictionary with all saved recipes.

        Recipe ID will be used as dictionary key, recipe title as dictionary value.
        """
        recipes_dict = {}

        page_count = 1
        has_next_page = True
        while has_next_page:
            document = self.get_recipes_from_page(page_count, recipes_dict)

            # Check for pagination links
            pagination_links = document.xpath('//*[@id="main"]/ul[2]/a')
            if pagination_links:
                if page_count == 1:
                    # If pagination exists on page 1, there has to be a second page;
                    # page 1 only links to the next page (there is no previous page)
                    page_count += 1
                elif len(pagination_links) > 1:
                    # Two links: one to the previous page and one to the next
                    page_count += 1
                else:
                    # Only one link means this is the last page
                    has_next_page = False
            else:
                tmp_dict = {}
                # No pagination links found, so probe the next page directly
                # to see whether it contains any recipes
                self.get_recipes_from_page(page_count + 1, tmp_dict)
                if tmp_dict:
                    # The next page has recipes; the loop will fetch it again
                    page_count += 1
                else:
                    # No more recipes; if recipes_dict is empty too, there are none at all
                    has_next_page = False
        return recipes_dict

hannahburkhardt · 2024-08-21T02:08:53Z

@joeld1 amazing! Would you mind putting in a PR with this change?

joeld1 · 2024-08-22T10:41:08Z

No problem!

joeld1 · 2024-09-06T22:36:07Z

Hello @hannahburkhardt, I just opened my pull request containing the refactored method:

#185

joeld1 · 2024-09-08T23:01:44Z

Hello @hannahburkhardt,

I found another edge case, so I updated the code to handle it.

The edge case is as follows:

pagination_links exists and there are 28 recipes, but only 2 pagination links (i.e. page 1, page 2) are present.
In that case only 20 recipes are returned, because we only discovered 2 pagination links (i.e. page 1, page 2).

My latest push should now handle this edge case, where not all pagination links are shown.
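
One way to sidestep missing pagination links entirely is to keep requesting pages until one comes back empty. The following is only a minimal sketch of that idea, not the code in the pull request: `collect_all_recipes`, `fetch_page`, and `make_fake_fetcher` are hypothetical names, and the fake fetcher stands in for the real page request so the example runs on its own. It assumes 10 recipes per page, as observed in this issue.

```python
from typing import Callable, Dict


def collect_all_recipes(
    fetch_page: Callable[[int], Dict[str, str]], max_pages: int = 1000
) -> Dict[str, str]:
    """Collect recipes page by page until a page adds nothing new.

    fetch_page is a hypothetical callable returning {recipe_id: title}
    for one page; it is an assumption, not part of the library.
    """
    recipes: Dict[str, str] = {}
    for page in range(1, max_pages + 1):
        page_recipes = fetch_page(page)
        if not page_recipes:
            break  # an empty page means we ran past the last one
        if page_recipes.keys() <= recipes.keys():
            break  # the site repeated a page we already have
        recipes.update(page_recipes)
    return recipes


def make_fake_fetcher(total: int, page_size: int = 10):
    """Build a fake fetch_page serving `total` recipes, `page_size` per page."""

    def fetch_page(page: int) -> Dict[str, str]:
        start = (page - 1) * page_size
        stop = min(start + page_size, total)
        return {str(i): f"Recipe {i}" for i in range(start, stop)}

    return fetch_page


# The edge case from this comment: 28 recipes spread over 3 pages
all_recipes = collect_all_recipes(make_fake_fetcher(total=28))
print(len(all_recipes))  # 28
```

The trade-off is one extra request past the last page. The `max_pages` cap and the repeated-page check guard against looping forever if the site were to return the last page for out-of-range page numbers.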

joeld1 mentioned this issue Sep 6, 2024

Updated get_recipes so that we can return more than 10 recipes when the page indicator is not present #185

Open
