Extract nutritional data from scraped websites #375

Closed
mblennegard opened this issue Jun 24, 2024 · 9 comments

mblennegard (Contributor) commented Jun 24, 2024

Is your feature request related to a problem? Please describe.
Many websites today already include nutritional data as part of the recipe.
Instead of trying to calculate it from generic ingredients within Recipya, it would be better to extract this information directly from the recipe as-is.

Describe the solution you'd like
If nutritional information is part of the recipe, try to extract it.
If it is not available, fall back to the current way of calculating the nutritional data.

For websites requiring custom scrapers this will of course have to be handled per website, but since nutritional information is part of the LD+JSON schema, it should be possible to solve this automatically for a large number of websites by adding nutrition extraction to the LD+JSON part of the scraper.
Additionally, this would solve issues where the automatic nutritional calculation fails because the recipe is in a language other than English.

mblennegard added the enhancement (New feature or request) label Jun 24, 2024
reaper47 (Owner) commented Jun 24, 2024

The nutritional information is currently extracted from the LD+JSON when available: https://github.com/reaper47/recipya/blob/main/internal/models/schema-recipe.go#L36. If it is not available, then this function will execute in the background: https://github.com/reaper47/recipya/blob/main/internal/services/sqlite_service.go#L628.
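
In other words, the flow is roughly the following; the type and function names below are hypothetical stand-ins for illustration, not Recipya's actual API:

package scraper

// Hypothetical stand-in types, not Recipya's actual models.
type Nutrition struct{ Calories, Fat, Carbohydrates, Protein string }

type ScrapedRecipe struct {
	Ingredients     []string
	SchemaNutrition *Nutrition // non-nil when the LD+JSON block carries a NutritionInformation object
}

// nutritionFor prefers the nutrition embedded in the recipe's LD+JSON and only
// falls back to calculating it from the ingredient list, as described above.
func nutritionFor(r ScrapedRecipe) Nutrition {
	if r.SchemaNutrition != nil {
		return *r.SchemaNutrition
	}
	return calculateFromIngredients(r.Ingredients)
}

// calculateFromIngredients stands in for the existing background calculation.
func calculateFromIngredients(ingredients []string) Nutrition {
	return Nutrition{} // placeholder
}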

Which website did you fetch that calculated the nutrition instead of extracting it?

mblennegard (Contributor, Author) commented

@reaper47 I was a bit too quick. I noticed myself, while browsing the scraper.go file, that nutrition was already part of the scraper... but you were even quicker to respond here. 😋

I tried with the following recipe:
https://tyngre.se/recept/kladdkakor/kladdkaka-med-hasselnoetter-och-brynt-smoer

It seems to follow the LD+JSON schema for the nutrition as well, at least as far as I can tell.

{
    "@context": "http://schema.org",
    "@type": "Recipe",
    "name": "Kladdkaka med hasselnötter och brynt smör",
    "image": "https://cdn.sanity.io/images/fbgp6g6y/production/69bbb647d50e5d715f9e85dc38ed1f94d07b3401-3024x4032.jpg",
    "author": {
        "@type": "Person",
        "name": "Skippa Sockret"
    },
    "description": "Kladdkaka med hasselnötter och brynt smör",
    "totalTime": "25 min ",
    "keywords": "bakmix kladdkaka, fika, dessert",
    "recipeCategory": "Kladdkakor",
    "recipeIngredient": [
        "4.25 dl bakmix kladdkaka ",
        "2 dl valfri mjölk",
        "2 msk olja",
        "50 g smör ",
        "1 dl rostade hasselnötter (eller efter smak)"
    ],
    "recipeInstructions": [
        {
            "@type": "HowToStep",
            "text": "Sätt ugnen på 150 grader. "
        },
        {
            "@type": "HowToStep",
            "text": "Mät upp kladdkakemixen och blanda ihop med mjölk och olja med hjälp av en slickepott. "
        },
        {
            "@type": "HowToStep",
            "text": "Bryn smöret i en kastrull tills du får en nötig karaktär. "
        },
        {
            "@type": "HowToStep",
            "text": "Grovhacka hasselnötterna. "
        },
        {
            "@type": "HowToStep",
            "text": "Tillsätt nu de brynta smöret och hasselnötterna i smeten, blanda runt. "
        },
        {
            "@type": "HowToStep",
            "text": "Smöra eller olja en rund springform och täck med lite kokos eller ströbröd alt. använd ett bakplåtspapper. Häll i smeten och grädda i ugnen cirka 15 minuter. "
        },
        {
            "@type": "HowToStep",
            "text": "Ta ut och låt svalna, låt gärna kladdkakan stå i kylen ett par timmar för godast resultat. Servera sedan med en riktigt god vaniljglass eller en klick grädde. "
        }
    ],
    "nutrition": {
        "@type": "NutritionInformation",
        "servingSize": 8,
        "calories": 1109,
        "fatContent": 90,
        "carbohydrateContent": 77,
        "proteinContent": 42
    }
}

reaper47 (Owner) commented

Something is off because the nutrition is indeed there. I'll check it out.

reaper47 added the bug (Something isn't working) and go (Pull requests that update Go code) labels and removed the enhancement (New feature or request) label Jun 24, 2024
reaper47 added this to Recipya Jun 24, 2024
reaper47 added this to the v1.2.0 milestone Jun 24, 2024
reaper47 moved this to Backlog in Recipya Jun 24, 2024
mblennegard (Contributor, Author) commented Jun 25, 2024

@reaper47 I debugged this issue and the root cause is that this particular website stores only numeric values for the nutritional information, whereas UnmarshalJSON for the NutritionSchema expects string-only values, e.g.:

if val, ok := x["carbohydrateContent"].(string); ok {

As a result, the mapping of the nutrition fields is essentially skipped.

I tested this (rather crudely) for one of the properties with the below change, assuming that the nutrition function inside Recipya expects string values. This change then populated the property correctly in the final imported recipe.

if val, ok := x["carbohydrateContent"].(float64); ok {
	n.Carbohydrates = strconv.FormatFloat(val, 'f', -1, 64)
}

Perhaps the UnmarshalJSON function for the NutritionSchema could check whether the source data is a string, a float or an integer and convert the values accordingly, to accommodate different implementations of the LD+JSON schema?
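
A minimal sketch of that idea, reusing the field mapping from the snippets above (toString is a hypothetical helper and the package name is assumed, not existing Recipya code):

package models

import "strconv"

// toString normalises a raw LD+JSON nutrition value to the string the schema
// fields expect: strings pass through unchanged, and JSON numbers (which
// encoding/json decodes as float64) are formatted as strings.
func toString(v any) string {
	switch val := v.(type) {
	case string:
		return val
	case float64:
		return strconv.FormatFloat(val, 'f', -1, 64)
	default:
		return ""
	}
}

Inside UnmarshalJSON the per-field mapping would then become, for example, n.Carbohydrates = toString(x["carbohydrateContent"]).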

reaper47 (Owner) commented

Excellent, thank you for looking into it! That is exactly it. We shall add a test in https://github.com/reaper47/recipya/blob/main/internal%2Fmodels%2Fschema-recipe_test.go#L303 and modify the UnmarshalJSON function you linked to cover nutrition fields that use numerical values.
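
A test along these lines could cover the numeric case (the Calories field name and the JSON fixture are assumptions, and it relies on the test file's existing encoding/json and testing imports):

func TestNutritionSchema_UnmarshalNumbers(t *testing.T) {
	// Numeric nutrition values, as served by tyngre.se, instead of the usual strings.
	data := []byte(`{"@type":"NutritionInformation","calories":1109,"carbohydrateContent":77}`)

	var n NutritionSchema
	if err := json.Unmarshal(data, &n); err != nil {
		t.Fatal(err)
	}

	if n.Calories != "1109" || n.Carbohydrates != "77" {
		t.Errorf("numeric nutrition values were not converted to strings: %+v", n)
	}
}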

mblennegard (Contributor, Author) commented Jun 26, 2024

@reaper47 I have it handling both string and number values on my end now, but this raises an interesting point: when we only have numbers, we are also missing the unit, e.g. grams, milligrams etc.

As far as I know, nutritional information is always in metric, even on American recipe sites. Have you seen anything else during your investigations?

If it is indeed always metric, then we can add static units for each property, e.g. calories in kcal; fat, sugar and protein in grams; sodium in milligrams; etc., and use those whenever the nutritional information has numeric values (see the sketch below).

Edit: At least the recipe schema specifies metric units, so I think I can assume metric when setting static units for each property. Do you agree?
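
As a sketch of those static defaults (the variable name and the exact property list are assumptions; the units would only be applied when the schema gives a bare number):

// Hypothetical metric defaults used when an LD+JSON nutrition field is a bare number.
var defaultNutritionUnits = map[string]string{
	"calories":            "kcal",
	"carbohydrateContent": "g",
	"fatContent":          "g",
	"sugarContent":        "g",
	"proteinContent":      "g",
	"sodiumContent":       "mg",
}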

reaper47 (Owner) commented

Yes, nutrition is always in the metric system. I have yet to see a product in a grocery store in the US whose nutrition facts are not metric. We can safely assume the units you mentioned when they are not specified.

mblennegard (Contributor, Author) commented

Implemented in pull request #382

reaper47 (Owner) commented Jul 4, 2024

Pull request #382 has been merged! Closing this issue.

reaper47 closed this as completed Jul 4, 2024
github-project-automation bot moved this from Backlog to Done in Recipya Jul 4, 2024