Extract nutritional data from scraped websites #375
The nutritional information is currently extracted from the LD+JSON when available: https://github.com/reaper47/recipya/blob/main/internal/models/schema-recipe.go#L36. If it is not available, then this function will execute in the background: https://github.com/reaper47/recipya/blob/main/internal/services/sqlite_service.go#L628. Which website did you fetch that calculated the nutrition instead of extracting it?
@reaper47 I was a bit too quick. I noticed myself while browsing the scraper.go file that the nutrition was already part of the scraper... but you were even quicker to respond here. 😋 I tried with the following recipe, which seems to follow the LD+JSON schema for the nutrition as well, at least as far as I can tell:

```json
{
  "@context": "http://schema.org",
  "@type": "Recipe",
  "name": "Kladdkaka med hasselnötter och brynt smör",
  "image": "https://cdn.sanity.io/images/fbgp6g6y/production/69bbb647d50e5d715f9e85dc38ed1f94d07b3401-3024x4032.jpg",
  "author": {
    "@type": "Person",
    "name": "Skippa Sockret"
  },
  "description": "Kladdkaka med hasselnötter och brynt smör",
  "totalTime": "25 min ",
  "keywords": "bakmix kladdkaka, fika, dessert",
  "recipeCategory": "Kladdkakor",
  "recipeIngredient": [
    "4.25 dl bakmix kladdkaka ",
    "2 dl valfri mjölk",
    "2 msk olja",
    "50 g smör ",
    "1 dl rostade hasselnötter (eller efter smak)"
  ],
  "recipeInstructions": [
    {
      "@type": "HowToStep",
      "text": "Sätt ugnen på 150 grader. "
    },
    {
      "@type": "HowToStep",
      "text": "Mät upp kladdkakemixen och blanda ihop med mjölk och olja med hjälp av en slickepott. "
    },
    {
      "@type": "HowToStep",
      "text": "Bryn smöret i en kastrull tills du får en nötig karaktär. "
    },
    {
      "@type": "HowToStep",
      "text": "Grovhacka hasselnötterna. "
    },
    {
      "@type": "HowToStep",
      "text": "Tillsätt nu de brynta smöret och hasselnötterna i smeten, blanda runt. "
    },
    {
      "@type": "HowToStep",
      "text": "Smöra eller olja en rund springform och täck med lite kokos eller ströbröd alt. använd ett bakplåtspapper. Häll i smeten och grädda i ugnen cirka 15 minuter. "
    },
    {
      "@type": "HowToStep",
      "text": "Ta ut och låt svalna, låt gärna kladdkakan stå i kylen ett par timmar för godast resultat. Servera sedan med en riktigt god vaniljglass eller en klick grädde. "
    }
  ],
  "nutrition": {
    "@type": "NutritionInformation",
    "servingSize": 8,
    "calories": 1109,
    "fatContent": 90,
    "carbohydrateContent": 77,
    "proteinContent": 42
  }
}
```
Something is off because the nutrition is indeed there. I'll check it out.
@reaper47 I debugged this issue and the root cause is that this particular website stores only numeric values for the nutritional information, whereas the scraper expects string-only values in the UnmarshalJSON for the NutritionSchema (recipya/internal/models/schema-recipe.go, line 876 at c0e825a). The mapping of the nutrition fields is essentially skipped. I tested this (rather crudely) for one of the properties with the change below, assuming that the nutrition function inside Recipya expects string values. This change then populated the property correctly in the final imported recipe.

```go
if val, ok := x["carbohydrateContent"].(float64); ok {
	n.Carbohydrates = strconv.FormatFloat(val, 'f', -1, 64)
}
```

Perhaps the UnmarshalJSON function for the NutritionSchema could check whether the source data is a string, float, or integer and convert the values accordingly, to accommodate different implementations of the LD+JSON schema?
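For illustration, the type-tolerant conversion could be factored into a small helper like the one below. This is only a sketch; the helper name `nutritionString` is hypothetical and not part of Recipya's actual code. It relies on the fact that `encoding/json` decodes every bare JSON number into a `float64` when the target is `any`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// nutritionString normalises a raw JSON value into the string form the
// rest of the pipeline expects. It accepts the string case (most sites),
// the float64 case (sites that emit bare numbers), and json.Number for
// decoders configured with UseNumber. Unknown types yield "".
func nutritionString(v any) string {
	switch val := v.(type) {
	case string:
		return val
	case float64: // encoding/json decodes all JSON numbers to float64
		return strconv.FormatFloat(val, 'f', -1, 64)
	case json.Number:
		return val.String()
	default:
		return ""
	}
}

func main() {
	var x map[string]any
	data := []byte(`{"calories": 1109, "fatContent": "90 g"}`)
	if err := json.Unmarshal(data, &x); err != nil {
		panic(err)
	}
	fmt.Println(nutritionString(x["calories"]))   // numeric input
	fmt.Println(nutritionString(x["fatContent"])) // string input
}
```

Each field assignment in UnmarshalJSON could then go through one such call instead of a bare string type assertion.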
Excellent, thank you for looking into it! That is exactly it. We shall add a test in https://github.com/reaper47/recipya/blob/main/internal%2Fmodels%2Fschema-recipe_test.go#L303 and modify the UnmarshalJSON function you linked to cover nutrition fields that use numerical values.
@reaper47 I have it handling both string and number values on my end now, but there is an interesting wrinkle: when we only get numbers, we are also missing the unit, e.g. grams, milligrams, etc. As far as I know, nutritional information is always in metric, even on American recipe sites. Have you seen anything else during your investigations? If it is indeed always metric, then we can add static units for each property, e.g. calories in kcal; fat, sugar, and protein in grams; sodium in milligrams; and use those whenever the nutritional information has numeric values. Edit: At least the recipe schema specifies metric units, so I think I can assume metric when setting static units for each property. Do you agree?
Yes, nutrition is always in the metric system. I have yet to see a product in a grocery store in the US whose nutrition facts are not metric. We can safely assume the units you mentioned when not specified.
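The static-unit fallback discussed above could look roughly like this. The map contents and helper name are illustrative assumptions (the units follow the metric defaults agreed in this thread), not Recipya's actual implementation.

```go
package main

import "fmt"

// defaultUnits holds assumed metric units per schema.org nutrition
// property, applied only when a site supplies a bare number with no
// unit. servingSize deliberately has no unit.
var defaultUnits = map[string]string{
	"calories":            "kcal",
	"fatContent":          "g",
	"carbohydrateContent": "g",
	"sugarContent":        "g",
	"fiberContent":        "g",
	"proteinContent":      "g",
	"sodiumContent":       "mg",
	"servingSize":         "",
}

// withDefaultUnit appends the assumed unit to a bare numeric value,
// leaving the value untouched when no unit is registered.
func withDefaultUnit(field, value string) string {
	if unit, ok := defaultUnits[field]; ok && unit != "" {
		return value + " " + unit
	}
	return value
}

func main() {
	fmt.Println(withDefaultUnit("calories", "1109"))
	fmt.Println(withDefaultUnit("servingSize", "8"))
}
```

String values scraped from a site would bypass this helper entirely, since they usually already carry their own unit (e.g. "90 g").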
Implemented in pull request #382 |
Pull request 382 has been merged! Closing this issue. |
Is your feature request related to a problem? Please describe.
Many websites today already have nutritional data present as part of the recipe.
Instead of trying to calculate this using generic ingredients within Recipya, it would be better to extract this information directly from the recipe as-is.
Describe the solution you'd like
If nutritional information is part of the recipe, try to extract it.
If not available, then use the current way of calculating the nutritional data.
For websites requiring custom scrapers this will of course be handled on a per-website basis, but since nutritional information is part of the LD+JSON schema, it should be possible to solve this for a large number of websites automatically by adding nutritional extraction to the LD+JSON part of the scraper.
Additionally, this would solve cases where the automatic nutritional calculation fails because the recipe is in a language other than English.
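As a sketch of the LD+JSON extraction path proposed here, the snippet below decodes the schema.org `nutrition` object using `json.RawMessage`, which keeps each value verbatim whether the site emitted a number or a string and defers normalisation to a later step. The type and function names are illustrative, not Recipya's actual scraper code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nutrition mirrors a subset of schema.org NutritionInformation.
// json.RawMessage tolerates both numeric and string site data.
type nutrition struct {
	Calories   json.RawMessage `json:"calories"`
	FatContent json.RawMessage `json:"fatContent"`
}

// recipe mirrors a subset of a schema.org Recipe LD+JSON blob.
type recipe struct {
	Name      string    `json:"name"`
	Nutrition nutrition `json:"nutrition"`
}

// parseRecipe decodes one LD+JSON recipe object.
func parseRecipe(blob []byte) (recipe, error) {
	var r recipe
	err := json.Unmarshal(blob, &r)
	return r, err
}

func main() {
	// Numeric calories, string fat content: both round-trip untouched.
	blob := []byte(`{"name":"Kladdkaka","nutrition":{"calories":1109,"fatContent":"90 g"}}`)
	r, err := parseRecipe(blob)
	if err != nil {
		panic(err)
	}
	fmt.Println(r.Name, string(r.Nutrition.Calories), string(r.Nutrition.FatContent))
}
```

A later normalisation pass would then strip the quotes from string values and attach default units to bare numbers, per the discussion above.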