fix: some fixes and experiments #15

Merged: 8 commits, Nov 7, 2024

@stephanegigandet stephanegigandet commented Dec 14, 2023

Main changes:

  • compute percent_estimate for parent ingredients (by summing the quantities of their child ingredients), as the metrics are computed on parent ingredients when the product specifies a percent for them
  • use [nutrient]_100g fields, as [nutrient] can be per serving or per 100g
  • refactored the relative constraints on ingredients, as they were not applied to children of the last ingredient
  • added a constraint that the sum of children is equal to the parent
  • added the minimization of the maximum distance between subsequent ingredients
  • experimenting with weights of nutrients
  • completely remove the nutrient contribution of ingredients without ciqual codes, as having min = 0 and max = 100 for them seems to leave the whole nutrient distance unused: the percentages of the ingredients seem to be distributed evenly...
    e.g. for Nutella:
| ingredient | ciqual_food_code | ciqual_proxy_food_code | total_percent_estimate | total_difference | number_of_products | number_of_products_where_specified |
|---|---|---|---|---|---|---|
| en:fat-reduced-cocoa | | | 3.52 | 3.88 | 1 | 1 |
| en:hazelnut-oil | 17210 | | 14 | 0.977 | 1 | 1 |
| en:palm-oil | 16129 | | 14 | 0 | 1 | 0 |
| en:skimmed-milk-powder | 19054 | | 3.52 | 3.08 | 1 | 1 |
| en:soya-lecithin | 42200 | | 3.52 | 0 | 1 | 0 |
| en:sugar | | 31016 | 54.4 | 0 | 1 | 0 |
| en:vanillin | | | 3.52 | 0 | 1 | 0 |
| en:whey-powder | | | 3.52 | 0 | 1 | 0 |
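
The first bullet of the description (a parent's percent_estimate as the sum of its children) can be illustrated with a minimal sketch. It assumes ingredients are nested dictionaries with an `ingredients` list and that every leaf already has a `percent_estimate`; the function name and the sample data are hypothetical, not taken from this branch.

```python
def fill_parent_estimates(ingredient):
    """Set percent_estimate on each parent to the sum of its children (recursive).

    Assumption: every leaf ingredient already carries a percent_estimate.
    """
    children = ingredient.get("ingredients", [])
    if children:
        ingredient["percent_estimate"] = sum(
            fill_parent_estimates(child) for child in children
        )
    return ingredient["percent_estimate"]


# Hypothetical parent ingredient with two children
chocolate = {
    "id": "en:chocolate",
    "ingredients": [
        {"id": "en:sugar", "percent_estimate": 12.0},
        {"id": "en:cocoa-butter", "percent_estimate": 8.0},
    ],
}
fill_parent_estimates(chocolate)
print(chocolate["percent_estimate"])
```

With the parent filled in this way, a percent specified on the product for the parent can be compared against the estimate directly.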

@stephanegigandet stephanegigandet changed the title fix: some fixes and experiments [work in progress] fix: some fixes and experiments Jan 29, 2024
@@ -41,6 +41,31 @@ def add_ingredients_to_solver(ingredients, solver, total_ingredients):

return ingredient_numvars

# Add constraints to ensure that the quantity of each ingredient is greater than or equal to the quantity of the next ingredient
# and that the sum of children ingredients is equal to the parent ingredient
def add_relative_constraints_on_ingredients(solver, parent_ingredient_numvar, ingredient_numvars):
Collaborator

I'm not sure that this is a good way to do it. It might be better to keep ingredients without a CIQUAL code out of the optimisation entirely and add them back in at the end, halfway between the ingredients listed before and after them.

Contributor Author

Well we do need the solver to know about ingredients without CIQUAL code even if we consider that they don't have any nutrient contribution, as they add constraints on the other ingredients.
e.g. if we have "unknown, sugar, oil", then sugar must be 50% or less.
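
The 50% bound in this example follows from the ordering constraint alone. As a sketch, assuming ingredients are listed in decreasing order of quantity and their percentages sum to at most 100, the ingredient at position k (counting from 1) cannot exceed 100/k percent, because the k ingredients up to and including it are each at least as large. The function name is hypothetical.

```python
def max_percent_by_position(ingredient_ids):
    """Upper bound on each ingredient's percent implied by list order alone.

    Assumption: quantities are non-increasing along the list and sum to
    at most 100, so the k-th ingredient is at most 100 / k.
    """
    return {
        ingredient_id: 100 / position
        for position, ingredient_id in enumerate(ingredient_ids, start=1)
    }


bounds = max_percent_by_position(["unknown", "en:sugar", "en:oil"])
print(bounds["en:sugar"])
```

Dropping "unknown" from the list would move sugar to position 1 and loosen its bound to 100%, which is why the solver still needs to see ingredients that contribute no nutrients.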

def get_quantity_estimate(ingredient_numvars):
total_quantity = 0
quantity_estimate = 0
Collaborator

I'm not sure what these changes are trying to do, but they have broken the unit tests around quantity_estimates and evaporation. The idea of the quantity_estimate is that it is the amount of original ingredient that is needed to make 100g of the product, i.e. the quantity before evaporation.

Contributor Author

It's still the same definition of quantity_estimate, except that it adds quantity estimates for parent ingredients as well.
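
To make the definition concrete, here is a small sketch of the relationship between quantity_estimate and evaporation. The `lost_water` field, the function name and the numbers are hypothetical and only illustrate that quantities before evaporation can add up to more than 100g while the finished product still weighs 100g.

```python
def product_mass_after_evaporation(ingredients):
    """Mass of finished product, given pre-evaporation quantities in grams.

    Assumption: each ingredient has a quantity_estimate (before evaporation)
    and an optional, hypothetical lost_water amount.
    """
    return sum(
        ingredient["quantity_estimate"] - ingredient.get("lost_water", 0.0)
        for ingredient in ingredients
    )


# Hypothetical recipe for 100g of a concentrated tomato product
recipe = [
    {"id": "en:tomato", "quantity_estimate": 150.0, "lost_water": 60.0},
    {"id": "en:salt", "quantity_estimate": 10.0},
]
total_before = sum(ingredient["quantity_estimate"] for ingredient in recipe)
print(total_before, product_mass_after_evaporation(recipe))
```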

continue


computed_nutrient['weighting'] = 1
Collaborator

Is this a permanent change or just an experiment? It breaks a unit test (I have commented out the test for now).

Contributor Author

I tried many values and looked at the resulting accuracy: https://docs.google.com/document/d/1NFg2lAaDzSkRldKU6uDGcjPVxbfTsc9p59r7L6OEqiQ/edit#heading=h.fl6o8g6m5uv5

So far it's the best I found.
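
For readers unfamiliar with how a per-nutrient weighting can enter the objective, here is one plausible form: a weighted sum of absolute differences between computed and stated values. This is an assumption for illustration only; the thread does not show the actual distance used, and the field names `computed_100g` and `stated_100g` are hypothetical.

```python
def weighted_nutrient_distance(nutrients):
    """Weighted sum of absolute differences between computed and stated values.

    Assumption: each nutrient has a weighting, a value computed from the
    estimated recipe, and the value stated on the product (both per 100g).
    """
    return sum(
        nutrient["weighting"]
        * abs(nutrient["computed_100g"] - nutrient["stated_100g"])
        for nutrient in nutrients.values()
    )


# Hypothetical values, with every weighting set to 1 as in the diff line above
nutrients = {
    "sugars": {"weighting": 1, "computed_100g": 56.0, "stated_100g": 54.0},
    "fat": {"weighting": 1, "computed_100g": 30.5, "stated_100g": 31.0},
}
print(weighted_nutrient_distance(nutrients))
```

With all weightings equal to 1, each nutrient's error in grams per 100g counts equally, so nutrients present in larger amounts tend to dominate the total.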

@stephanegigandet
Contributor Author

Metrics are much better with this branch, so I will merge it.

reran model on recipe estimator (main branch)

$ ./scripts/run_model_on_input_test_sets.py recipe_estimator_localhost recipe_estimator_main_20241107 fr-1000-some-specified-popular

Saving metrics in result: test-sets/results/recipe_estimator_main_20241107/fr-1000-some-specified-popular/90162800.json
Saving results summary in test set directory: products_stats.csv
Results summary for test set fr-1000-some-specified-popular:
Total difference: 34272.873
Number of products: 1000
Average difference: 34.27
All ciqual test set total difference: 2435.7105
All ciqual test set number of products: 128
All ciqual test set average difference: 19.03
Percent estimate with ciqual_food_code: 50.4
Percent estimate with ciqual_proxy_food_code: 33.2
Percent estimate with ciqual or ciqual_proxy_food_code: 83.6


reran model on recipe estimator (stephane-experiments branch)

Saving metrics in result: test-sets/results/recipe_estimator_main_20241107/fr-1000-some-specified-popular/90162800.json
Saving results summary in test set directory: results_summary.json
Results summary for test set fr-1000-some-specified-popular:
Total difference: 23698.528
Number of products: 1000
Average difference: 23.7
All ciqual test set total difference: 1577.6545
All ciqual test set number of products: 128
All ciqual test set average difference: 12.33
Percent estimate with ciqual_food_code: 50.2
Percent estimate with ciqual_proxy_food_code: 35.6
Percent estimate with ciqual or ciqual_proxy_food_code: 85.9
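
The averages in the two summaries can be rechecked from the totals, and the improvement expressed as a relative reduction. The snippet below is a sketch that only reuses the numbers printed above; the dictionary layout and function names are hypothetical.

```python
# Totals and product counts copied from the two result summaries above
runs = {
    "main": {
        "all": (34272.873, 1000),
        "all_ciqual": (2435.7105, 128),
    },
    "stephane-experiments": {
        "all": (23698.528, 1000),
        "all_ciqual": (1577.6545, 128),
    },
}


def average_difference(total_difference, number_of_products):
    """Average difference per product."""
    return total_difference / number_of_products


def relative_reduction(before, after):
    """Fraction by which the average difference dropped."""
    return (before - after) / before


for subset in ("all", "all_ciqual"):
    before = average_difference(*runs["main"][subset])
    after = average_difference(*runs["stephane-experiments"][subset])
    print(subset, round(before, 2), round(after, 2),
          f"{relative_reduction(before, after):.1%}")
```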

@stephanegigandet stephanegigandet merged commit fb7a756 into main Nov 7, 2024
@stephanegigandet stephanegigandet deleted the stephane-experiments branch November 7, 2024 14:49