feat(_find_ml_task): Refine ML task detection logic #1305

Merged
auguste-probabl merged 11 commits into main from refine-ml-task-logic
Feb 11, 2025

Conversation

Contributor

@auguste-probabl auguste-probabl commented Feb 10, 2025

ML task detection now also works for multioutput targets (e.g. where `y` is a 2D array).

Added 3 new `MLTask` variants: `multioutput-binary-classification`, `multioutput-multiclass-classification` and `multioutput-regression`. `regression` still means "single-output regression".

Detection uses the mechanism described in be29a26 to tell an integer array that holds class labels apart from one that holds regression targets.
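To make the task names above concrete, here is a minimal, dependency-free sketch of target-based detection. It is not the code from this PR: the function names `infer_ml_task` and `_column_kind` are hypothetical, and the integer heuristic (labels `0..n-1` are treated as class indices, any other integer set as a regression target) is an assumption made for illustration, not necessarily the mechanism from be29a26. Real code would more likely work on NumPy arrays.

```python
from numbers import Integral, Real


def _column_kind(values):
    """Label one output column as 'binary', 'multiclass' or 'continuous'."""
    uniques = set(values)
    # Non-numeric labels (e.g. strings) can only be class labels.
    if not all(isinstance(v, Real) for v in uniques):
        return "binary" if len(uniques) <= 2 else "multiclass"
    # Any fractional value means the column is continuous.
    if not all(isinstance(v, Integral) or float(v).is_integer() for v in uniques):
        return "continuous"
    if len(uniques) <= 2:
        return "binary"
    # Assumed heuristic: integer labels 0..n-1 look like class indices;
    # any other set of integers is treated as a regression target.
    if uniques == set(range(len(uniques))):
        return "multiclass"
    return "continuous"


def infer_ml_task(y):
    """Infer a task name from a 1D list or a 2D list of rows (hypothetical helper)."""
    rows = list(y)
    if not rows:
        return "unknown"
    if isinstance(rows[0], (list, tuple)):
        columns = list(zip(*rows))  # one tuple per output column
    else:
        columns = [tuple(rows)]
    if not columns:
        return "unknown"
    kinds = [_column_kind(col) for col in columns]
    prefix = "multioutput-" if len(columns) > 1 else ""
    if "continuous" in kinds:
        return prefix + "regression"
    if all(kind == "binary" for kind in kinds):
        return prefix + "binary-classification"
    return prefix + "multiclass-classification"
```

Under these assumptions, a 2D target with one continuous column is reported as `multioutput-regression`, while a 1D target keeps the unprefixed names, so `regression` still means a single output.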

Closes #1005

Contributor

github-actions bot commented Feb 10, 2025

Coverage

Coverage Report for backend
| File | Stmts | Miss | Cover | Missing |
| --- | --- | --- | --- | --- |
| venv/lib/python3.12/site-packages/skore | | | | |
| __init__.py | 14 | 0 | 100% | |
| __main__.py | 8 | 8 | 0% | 3–19 |
| exceptions.py | 4 | 4 | 0% | 4–23 |
| venv/lib/python3.12/site-packages/skore/cli | | | | |
| __init__.py | 5 | 5 | 0% | 3–8 |
| cli.py | 22 | 22 | 0% | 3–70 |
| color_format.py | 49 | 49 | 0% | 3–116 |
| venv/lib/python3.12/site-packages/skore/persistence | | | | |
| __init__.py | 0 | 0 | 100% | |
| venv/lib/python3.12/site-packages/skore/persistence/item | | | | |
| __init__.py | 56 | 3 | 93% | 96–99 |
| altair_chart_item.py | 19 | 1 | 91% | 14 |
| item.py | 22 | 1 | 95% | 86 |
| matplotlib_figure_item.py | 36 | 1 | 95% | 19 |
| media_item.py | 22 | 0 | 100% | |
| numpy_array_item.py | 27 | 1 | 94% | 16 |
| pandas_dataframe_item.py | 29 | 1 | 94% | 14 |
| pandas_series_item.py | 29 | 1 | 94% | 14 |
| pickle_item.py | 22 | 0 | 100% | |
| pillow_image_item.py | 25 | 1 | 93% | 15 |
| plotly_figure_item.py | 20 | 1 | 92% | 14 |
| polars_dataframe_item.py | 27 | 1 | 94% | 14 |
| polars_series_item.py | 22 | 1 | 92% | 14 |
| primitive_item.py | 23 | 2 | 91% | 13–15 |
| sklearn_base_estimator_item.py | 29 | 1 | 94% | 15 |
| skrub_table_report_item.py | 10 | 1 | 86% | 11 |
| venv/lib/python3.12/site-packages/skore/persistence/repository | | | | |
| __init__.py | 2 | 0 | 100% | |
| item_repository.py | 59 | 5 | 91% | 15–16, 202–203, 226 |
| venv/lib/python3.12/site-packages/skore/persistence/storage | | | | |
| __init__.py | 4 | 0 | 100% | |
| abstract_storage.py | 22 | 0 | 100% | |
| disk_cache_storage.py | 33 | 1 | 95% | 44 |
| in_memory_storage.py | 20 | 0 | 100% | |
| venv/lib/python3.12/site-packages/skore/persistence/view | | | | |
| __init__.py | 2 | 2 | 0% | 3–5 |
| view.py | 5 | 5 | 0% | 3–20 |
| venv/lib/python3.12/site-packages/skore/project | | | | |
| __init__.py | 3 | 0 | 100% | |
| _launch.py | 150 | 1 | 99% | 278 |
| _open.py | 9 | 0 | 100% | |
| project.py | 62 | 1 | 99% | 236 |
| venv/lib/python3.12/site-packages/skore/sklearn | | | | |
| __init__.py | 4 | 0 | 100% | |
| _base.py | 140 | 14 | 90% | 36, 48, 58, 91, 94, 147–156, 168–>173, 183–184 |
| find_ml_task.py | 49 | 1 | 97% | 103–>111, 135 |
| types.py | 2 | 0 | 100% | |
| venv/lib/python3.12/site-packages/skore/sklearn/_cross_validation | | | | |
| __init__.py | 5 | 0 | 100% | |
| metrics_accessor.py | 140 | 0 | 99% | 126–>137 |
| report.py | 95 | 0 | 100% | |
| venv/lib/python3.12/site-packages/skore/sklearn/_estimator | | | | |
| __init__.py | 5 | 0 | 100% | |
| metrics_accessor.py | 255 | 9 | 95% | 144–153, 178–>187, 186, 226–>228, 252, 279–283, 298, 318 |
| report.py | 120 | 0 | 99% | 213–>219, 221–>223 |
| utils.py | 11 | 11 | 0% | 1–19 |
| venv/lib/python3.12/site-packages/skore/sklearn/_plot | | | | |
| __init__.py | 4 | 0 | 100% | |
| precision_recall_curve.py | 119 | 1 | 98% | 229–>246, 317 |
| prediction_error.py | 95 | 1 | 98% | 159, 173–>176 |
| roc_curve.py | 126 | 0 | 100% | |
| utils.py | 89 | 5 | 93% | 23, 47–49, 53 |
| venv/lib/python3.12/site-packages/skore/sklearn/train_test_split | | | | |
| __init__.py | 0 | 0 | 100% | |
| train_test_split.py | 36 | 2 | 94% | 16–17 |
| venv/lib/python3.12/site-packages/skore/sklearn/train_test_split/warning | | | | |
| __init__.py | 8 | 0 | 100% | |
| high_class_imbalance_too_few_examples_warning.py | 17 | 3 | 78% | 16–18, 80 |
| high_class_imbalance_warning.py | 18 | 2 | 88% | 16–18 |
| random_state_unset_warning.py | 11 | 1 | 87% | 15 |
| shuffle_true_warning.py | 9 | 0 | 91% | 44–>exit |
| stratify_is_set_warning.py | 11 | 1 | 87% | 15 |
| time_based_column_warning.py | 22 | 1 | 89% | 17, 69–>exit |
| train_test_split_warning.py | 5 | 1 | 80% | 21 |
| venv/lib/python3.12/site-packages/skore/ui | | | | |
| __init__.py | 0 | 0 | 100% | |
| app.py | 32 | 32 | 0% | 3–83 |
| dependencies.py | 4 | 4 | 0% | 3–10 |
| project_routes.py | 41 | 41 | 0% | 3–92 |
| serializers.py | 77 | 77 | 0% | 3–194 |
| server.py | 17 | 17 | 0% | 3–40 |
| venv/lib/python3.12/site-packages/skore/utils | | | | |
| __init__.py | 6 | 0 | 100% | |
| _accessor.py | 7 | 0 | 100% | |
| _environment.py | 26 | 0 | 97% | 29–>34 |
| _logger.py | 21 | 4 | 84% | 14–18 |
| _patch.py | 11 | 5 | 46% | 19–35 |
| _progress_bar.py | 30 | 0 | 100% | |
| _show_versions.py | 31 | 0 | 100% | |
| TOTAL | 2590 | 353 | 86% | |

| Tests | Skipped | Failures | Errors | Time |
| --- | --- | --- | --- | --- |
| 541 | 3 💤 | 0 ❌ | 0 🔥 | 53.563s ⏱️ |

Contributor

github-actions bot commented Feb 10, 2025

Documentation preview @ ba2168c

Member

@glemaitre glemaitre left a comment

I also noticed another "bug": at l.99-100, we return "unknown".

In short, we rely on the estimator, and if it does not provide the required information, we return "unknown". I think it would be better to fall back to target inference and try to get the ML task from the target.

I realized that if someone provides an incompatible estimator, we do not infer the task properly.
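The fallback suggested above could look roughly like the sketch below. It is an illustration only, not the project's code: `task_from_estimator`, `task_from_target` and `find_task` are hypothetical names, the estimator check assumes scikit-learn's conventional `_estimator_type` and `classes_` attributes, and the target check is a deliberately crude stand-in for real target inference.

```python
def task_from_estimator(estimator):
    """Return a task name from the estimator alone, or None if inconclusive."""
    kind = getattr(estimator, "_estimator_type", None)
    if kind == "regressor":
        return "regression"
    if kind == "clusterer":
        return "clustering"
    if kind == "classifier":
        classes = getattr(estimator, "classes_", None)
        if classes is not None:
            if len(classes) == 2:
                return "binary-classification"
            return "multiclass-classification"
    # Unknown estimator type, or an unfitted classifier: nothing to conclude.
    return None


def task_from_target(y):
    """Crude stand-in for target inference on a 1D list (assumption)."""
    values = set(y)
    if not values:
        return None
    if any(isinstance(v, float) and not v.is_integer() for v in values):
        return "regression"
    if len(values) <= 2:
        return "binary-classification"
    return "multiclass-classification"


def find_task(y=None, estimator=None):
    """Try the estimator first, then fall back to the target."""
    task = task_from_estimator(estimator) if estimator is not None else None
    if task is None and y is not None:
        task = task_from_target(y)
    return task or "unknown"
```

With this ordering, an estimator that exposes no usable information no longer forces "unknown" as long as a target is available; "unknown" is returned only when both sources are inconclusive.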

Member

@glemaitre glemaitre left a comment

For the rest of the PR, everything looks good.

@auguste-probabl auguste-probabl merged commit 9db5286 into main Feb 11, 2025
19 checks passed
@auguste-probabl auguste-probabl deleted the refine-ml-task-logic branch February 11, 2025 14:33
Development

Successfully merging this pull request may close these issues.

Have distinct problem for single-output regression and multi-output regression
3 participants