
Add probabilistic classification to hiclass #minor #119

Merged · 69 commits · Nov 25, 2024

Conversation

LukasDrews97 (Collaborator):
Add probabilistic classification via calibration to hiclass using the following methods:

  • Platt scaling
  • Isotonic regression
  • Beta calibration
  • (Inductive/Cross) Venn-ABERS calibration
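For readers unfamiliar with the first of these: Platt scaling fits a one-feature logistic regression that maps raw classifier scores to probabilities. A minimal, self-contained sketch on synthetic data (not hiclass's implementation):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Synthetic uncalibrated scores: the positive class tends to score higher.
scores = np.concatenate([rng.normal(-1.0, 1.0, 500), rng.normal(1.0, 1.0, 500)])
labels = np.concatenate([np.zeros(500), np.ones(500)])

# Platt scaling: fit a logistic regression on the single score feature,
# then read off calibrated probabilities for the positive class.
platt = LogisticRegression().fit(scores.reshape(-1, 1), labels)
calibrated = platt.predict_proba(scores.reshape(-1, 1))[:, 1]
```

Isotonic regression replaces the sigmoid with a monotone step function fitted to the same (score, label) pairs, and beta calibration generalizes the sigmoid family.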


    else:
        calibrators = Parallel(n_jobs=self.n_jobs)(
            delayed(logging_wrapper)(
Collaborator:

Have you tested the parallel logging in the cluster? It used to be the case that messages were repeated multiple times.

LukasDrews97 (Collaborator, author):

That was not the case in my experiments.

)
proba = calibrator.predict_proba(X)

y[:, 0] = calibrator.classes_[np.argmax(proba, axis=1)]
Collaborator:

If you need to use the predictions, wouldn't it be better to use the already implemented predict method? I imagine it could simplify the code here and avoid redundancy.

LukasDrews97 (Collaborator, author):

This is not the same as calling the implemented predict method: here the prediction is derived from calibrated probabilities, whereas predict uses the uncalibrated ones.
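A contrived illustration of why the two can disagree (made-up probabilities, not hiclass output): calibration can reorder the per-class probabilities, so the argmax over calibrated scores may select a different class than the argmax over raw scores.

```python
import numpy as np

classes = np.array(["animal", "plant"])
raw_proba = np.array([[0.45, 0.55]])         # uncalibrated: argmax picks "plant"
calibrated_proba = np.array([[0.60, 0.40]])  # calibrated: argmax picks "animal"

raw_pred = classes[np.argmax(raw_proba, axis=1)]
cal_pred = classes[np.argmax(calibrated_proba, axis=1)]
```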

hiclass/__init__.py (resolved)
Comment on lines +265 to +266
y_true = make_leveled(y_true)
y_true = classifier._disambiguate(y_true)
Collaborator:

I am not sure I follow why make_leveled and _disambiguate need to be called here.

LukasDrews97 (Collaborator, author):

Training and calibration samples are transformed using these methods in the HierarchicalClassifier class. I apply the same transformations to the test samples before calculating the metrics, to get the classes for each level separately (instead of a list of labels for each data point).
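For context, "leveling" pads hierarchical label paths of different depths to the same length, so that column k holds the level-k label for every sample. A rough sketch of the idea (hypothetical helper, not the actual make_leveled):

```python
import numpy as np

def make_leveled_sketch(y, placeholder=""):
    # Pad every label path to the maximum depth so that column k of the
    # result contains the level-k label for each sample.
    max_depth = max(len(row) for row in y)
    return np.array(
        [list(row) + [placeholder] * (max_depth - len(row)) for row in y]
    )

y = [["animal", "mammal", "dog"], ["plant"]]
leveled = make_leveled_sketch(y)
```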

@@ -189,10 +350,82 @@ def test_predict_sparse(fitted_logistic_regression):
assert_array_equal(ground_truth, prediction)


def test_predict_proba(fitted_logistic_regression):
Collaborator:

Maybe you can use pytest.mark.parametrize to reduce redundancy when the same tests appear in different files.
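A sketch of what the reviewer suggests (the calibration-method names are illustrative and the assertion body is a stand-in): one parametrized test replaces near-identical copies.

```python
import numpy as np
import pytest

# One test body shared across calibration methods instead of a copy per file.
@pytest.mark.parametrize("method", ["sigmoid", "isotonic", "beta", "ivap", "cvap"])
def test_predict_proba_rows_sum_to_one(method):
    proba = np.array([[0.2, 0.8], [0.5, 0.5]])  # stand-in for classifier output
    assert np.allclose(proba.sum(axis=1), 1.0)
```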

Collaborator:

Why is this wrapper necessary?

LukasDrews97 (Collaborator, author):

It is not necessary, but I think it improves the readability of the code and makes future customization easier. Extending _BinaryCalibrator also ensures that all calibrators expose the same methods.

Comment on lines +19 to +25
positive_label = 1
unique_labels = np.unique(y)
assert len(unique_labels) <= 2

y = np.where(y == positive_label, 1, 0)
y = y.reshape(-1) # make sure it's a 1D array

Collaborator:

Maybe these lines can be replaced with the binary_only estimator tag https://scikit-learn.org/stable/developers/develop.html#estimator-tags

Suggested change
positive_label = 1
unique_labels = np.unique(y)
assert len(unique_labels) <= 2
y = np.where(y == positive_label, 1, 0)
y = y.reshape(-1) # make sure it's a 1D array

LukasDrews97 (Collaborator, author):

I can't use estimator tags on this class because it does not extend BaseEstimator.

mirand863 (Collaborator) commented May 3, 2024:

Hi @LukasDrews97,

Just a quick request from someone in France who reached out to me via e-mail. Would it be possible to add a threshold that removes labels with low probability?
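One possible shape for such a threshold (a hypothetical helper, not part of this PR): keep the argmax label only when its calibrated probability clears the cutoff, and blank it out otherwise.

```python
import numpy as np

def threshold_labels(classes, proba, threshold=0.5, placeholder=""):
    # Pick the most probable class per row, then drop predictions whose
    # probability falls below the threshold.
    best = np.argmax(proba, axis=1)
    labels = classes[best].astype(object)
    confident = proba[np.arange(len(proba)), best] >= threshold
    labels[~confident] = placeholder
    return labels

classes = np.array(["animal", "plant"])
proba = np.array([[0.9, 0.1], [0.55, 0.45]])
filtered = threshold_labels(classes, proba, threshold=0.7)
```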

codecov-commenter commented Sep 8, 2024:

Codecov Report

Attention: Patch coverage is 90.49505% with 96 lines in your changes missing coverage. Please review.

Project coverage is 94.03%. Comparing base (4595264) to head (38c011a).
Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
hiclass/_calibration/VennAbersCalibrator.py 79.62% 43 Missing ⚠️
hiclass/HierarchicalClassifier.py 85.84% 16 Missing ⚠️
hiclass/metrics.py 95.45% 8 Missing ⚠️
hiclass/LocalClassifierPerLevel.py 92.00% 6 Missing ⚠️
hiclass/_calibration/Calibrator.py 91.89% 6 Missing ⚠️
hiclass/Pipeline.py 44.44% 5 Missing ⚠️
hiclass/LocalClassifierPerNode.py 95.78% 4 Missing ⚠️
hiclass/_calibration/BetaCalibrator.py 87.87% 4 Missing ⚠️
...iclass/probability_combiner/ProbabilityCombiner.py 88.88% 3 Missing ⚠️
hiclass/LocalClassifierPerParentNode.py 98.76% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #119      +/-   ##
==========================================
- Coverage   96.68%   94.03%   -2.66%     
==========================================
  Files          13       28      +15     
  Lines        1268     2297    +1029     
==========================================
+ Hits         1226     2160     +934     
- Misses         42      137      +95     


mirand863 changed the title from "Add probabilistic classification to hiclass" to "Add probabilistic classification to hiclass #minor" on Nov 25, 2024.
mirand863 merged commit ee8cb75 into scikit-learn-contrib:main on Nov 25, 2024.
14 checks passed
3 participants