feat(_find_ml_task): Refine ML task detection logic #1305

auguste-probabl · 2025-02-10T16:57:15Z

ML task detection now also works for multioutput targets (e.g. where y is a 2D array).

Added three new MLTask variants: multioutput-binary-classification, multioutput-multiclass-classification, and multioutput-regression. The existing regression variant still means "single-output regression".

To distinguish an integer array that holds class labels from one that holds regression targets, the detection reuses the mechanism described in be29a26.
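For readers who want a concrete picture of target-based detection, here is a minimal sketch. It is not the code from this PR: `infer_task_from_target` and `looks_like_class_labels` are hypothetical names, and the "labels are exactly 0..n-1" check is only an assumed stand-in for the mechanism in be29a26, which is not reproduced in this thread. The only library call relied on is scikit-learn's `type_of_target`.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target


def looks_like_class_labels(y):
    """Assumed heuristic: integer values are class labels if they are exactly 0..n-1."""
    values = np.unique(y)  # flattens 2D targets and returns sorted unique values
    if not np.issubdtype(values.dtype, np.number):
        return True  # non-numeric labels (strings, booleans) are treated as classes
    return np.array_equal(values, np.arange(len(values)))


def infer_task_from_target(y):
    """Map a target array to one of the task names listed in this PR (sketch only)."""
    y = np.asarray(y)
    target_type = type_of_target(y)
    if target_type == "binary":
        return "binary-classification"
    if target_type == "multiclass":
        if looks_like_class_labels(y):
            return "multiclass-classification"
        return "regression"  # integers that do not look like labels
    if target_type == "continuous":
        return "regression"
    if target_type == "continuous-multioutput":
        return "multioutput-regression"
    if target_type == "multilabel-indicator":
        return "multioutput-binary-classification"
    if target_type == "multiclass-multioutput":
        if looks_like_class_labels(y):
            return "multioutput-multiclass-classification"
        return "multioutput-regression"
    return "unknown"


print(infer_task_from_target([0, 1, 2, 1]))        # multiclass-classification
print(infer_task_from_target([1, 5, 9, 5]))        # regression
print(infer_task_from_target([[0, 1], [2, 1]]))    # multioutput-multiclass-classification
```

Under this assumed heuristic, `[1, 5, 9, 5]` is reported as regression because its values do not form a contiguous range starting at 0, even though `type_of_target` labels it "multiclass".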

Closes #1005

github-actions · 2025-02-10T17:01:14Z

Coverage Report for backend

File	Stmts	Miss	Cover	Missing
venv/lib/python3.12/site-packages/skore
__init__.py	14	0	100%
__main__.py	8	8	0%	3–19
exceptions.py	4	4	0%	4–23
venv/lib/python3.12/site-packages/skore/cli
__init__.py	5	5	0%	3–8
cli.py	22	22	0%	3–70
color_format.py	49	49	0%	3–116
venv/lib/python3.12/site-packages/skore/persistence
__init__.py	0	0	100%
venv/lib/python3.12/site-packages/skore/persistence/item
__init__.py	56	3	93%	96–99
altair_chart_item.py	19	1	91%	14
item.py	22	1	95%	86
matplotlib_figure_item.py	36	1	95%	19
media_item.py	22	0	100%
numpy_array_item.py	27	1	94%	16
pandas_dataframe_item.py	29	1	94%	14
pandas_series_item.py	29	1	94%	14
pickle_item.py	22	0	100%
pillow_image_item.py	25	1	93%	15
plotly_figure_item.py	20	1	92%	14
polars_dataframe_item.py	27	1	94%	14
polars_series_item.py	22	1	92%	14
primitive_item.py	23	2	91%	13–15
sklearn_base_estimator_item.py	29	1	94%	15
skrub_table_report_item.py	10	1	86%	11
venv/lib/python3.12/site-packages/skore/persistence/repository
__init__.py	2	0	100%
item_repository.py	59	5	91%	15–16, 202–203, 226
venv/lib/python3.12/site-packages/skore/persistence/storage
__init__.py	4	0	100%
abstract_storage.py	22	0	100%
disk_cache_storage.py	33	1	95%	44
in_memory_storage.py	20	0	100%
venv/lib/python3.12/site-packages/skore/persistence/view
__init__.py	2	2	0%	3–5
view.py	5	5	0%	3–20
venv/lib/python3.12/site-packages/skore/project
__init__.py	3	0	100%
_launch.py	150	1	99%	278
_open.py	9	0	100%
project.py	62	1	99%	236
venv/lib/python3.12/site-packages/skore/sklearn
__init__.py	4	0	100%
_base.py	140	14	90%	36, 48, 58, 91, 94, 147–156, 168–>173, 183–184
find_ml_task.py	49	1	97%	103–>111, 135
types.py	2	0	100%
venv/lib/python3.12/site-packages/skore/sklearn/_cross_validation
__init__.py	5	0	100%
metrics_accessor.py	140	0	99%	126–>137
report.py	95	0	100%
venv/lib/python3.12/site-packages/skore/sklearn/_estimator
__init__.py	5	0	100%
metrics_accessor.py	255	9	95%	144–153, 178–>187, 186, 226–>228, 252, 279–283, 298, 318
report.py	120	0	99%	213–>219, 221–>223
utils.py	11	11	0%	1–19
venv/lib/python3.12/site-packages/skore/sklearn/_plot
__init__.py	4	0	100%
precision_recall_curve.py	119	1	98%	229–>246, 317
prediction_error.py	95	1	98%	159, 173–>176
roc_curve.py	126	0	100%
utils.py	89	5	93%	23, 47–49, 53
venv/lib/python3.12/site-packages/skore/sklearn/train_test_split
__init__.py	0	0	100%
train_test_split.py	36	2	94%	16–17
venv/lib/python3.12/site-packages/skore/sklearn/train_test_split/warning
__init__.py	8	0	100%
high_class_imbalance_too_few_examples_warning.py	17	3	78%	16–18, 80
high_class_imbalance_warning.py	18	2	88%	16–18
random_state_unset_warning.py	11	1	87%	15
shuffle_true_warning.py	9	0	91%	44–>exit
stratify_is_set_warning.py	11	1	87%	15
time_based_column_warning.py	22	1	89%	17, 69–>exit
train_test_split_warning.py	5	1	80%	21
venv/lib/python3.12/site-packages/skore/ui
__init__.py	0	0	100%
app.py	32	32	0%	3–83
dependencies.py	4	4	0%	3–10
project_routes.py	41	41	0%	3–92
serializers.py	77	77	0%	3–194
server.py	17	17	0%	3–40
venv/lib/python3.12/site-packages/skore/utils
__init__.py	6	0	100%
_accessor.py	7	0	100%
_environment.py	26	0	97%	29–>34
_logger.py	21	4	84%	14–18
_patch.py	11	5	46%	19–35
_progress_bar.py	30	0	100%
_show_versions.py	31	0	100%
TOTAL	2590	353	86%

Tests	Skipped	Failures	Errors	Time
541	3 💤	0 ❌	0 🔥	53.563s ⏱️

github-actions · 2025-02-10T17:07:46Z

Documentation preview @ ba2168c

glemaitre

There is also another "bug" that I noticed: in l.99-100, we return "unknown".

In short, we use the estimator, and if the estimator does not provide the required information, we return "unknown". I think it would be better to fall back to target inference and try to get the ML task from the target instead.

I realized that if someone provides a non-compatible estimator, we would not infer the task properly.
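A rough sketch of the suggested fallback, to make the control flow concrete. This is an illustration under assumptions, not the code at l.99-100: `find_task` and `task_from_target` are hypothetical names, the target mapping is deliberately simplified to single-output cases, and reading `classes_` from a fitted classifier is an assumed way of getting information from the estimator.

```python
from sklearn.base import is_classifier, is_regressor
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.utils.multiclass import type_of_target


def task_from_target(y):
    """Simplified, hypothetical target-based inference (single-output only)."""
    if y is None:
        return "unknown"
    mapping = {
        "binary": "binary-classification",
        "multiclass": "multiclass-classification",
        "continuous": "regression",
    }
    return mapping.get(type_of_target(y), "unknown")


def find_task(y=None, estimator=None):
    """Ask the estimator first; fall back to the target instead of returning 'unknown'."""
    if estimator is not None:
        if is_regressor(estimator):
            return "regression"
        if is_classifier(estimator):
            classes = getattr(estimator, "classes_", None)  # only set after fit
            if classes is not None:
                if len(classes) == 2:
                    return "binary-classification"
                return "multiclass-classification"
    # The estimator gave no usable information: infer the task from y.
    return task_from_target(y)


# An unfitted classifier has no classes_, so the target decides.
print(find_task(y=[0, 1, 1, 0], estimator=LogisticRegression()))
# A transformer is neither classifier nor regressor, so the target decides.
print(find_task(y=[0.1, 0.7, 1.5], estimator=StandardScaler()))
```

With this ordering, "unknown" is returned only when neither the estimator nor the target is informative, for example when `y` is `None` and the classifier is unfitted.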

glemaitre

For the rest of the PR, everything looks good.

github-actions bot assigned auguste-probabl Feb 10, 2025

auguste-probabl force-pushed the refine-ml-task-logic branch from da17821 to 76dc0fd Compare February 10, 2025 16:58

glemaitre reviewed Feb 11, 2025

auguste-probabl requested a review from glemaitre February 11, 2025 09:51

thomass-dev approved these changes Feb 11, 2025

auguste-probabl added 11 commits February 11, 2025 14:26

add failing tests

304e0cf

lint

4b367dc

deal with continuous-multioutput

8ffda05

Deal with multiclass-multioutput

e642c89

refine analysis

2b18daa

deal with multioutput-binary-classification

0083806

deal with multioutput-binary-classification that is actually a multi-…

8f63507

…regression

refactor classification detection logic

9237964

sort

84eafd8

add doctests

65feca4

fallback on target in more cases

ba2168c

auguste-probabl force-pushed the refine-ml-task-logic branch from e7a7a96 to ba2168c Compare February 11, 2025 13:26

auguste-probabl merged commit 9db5286 into main Feb 11, 2025
19 checks passed

auguste-probabl deleted the refine-ml-task-logic branch February 11, 2025 14:33
