Add probabilistic classification to hiclass #minor #119
Conversation
else:
    calibrators = Parallel(n_jobs=self.n_jobs)(
        delayed(logging_wrapper)(
Have you tested the parallel logging in the cluster? It used to be the case that messages were repeated multiple times.
That was not the case in my experiments.
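For illustration, a minimal, self-contained sketch of how a logging wrapper can be combined with joblib's Parallel/delayed as in the snippet below; the function names and the progress-message format are assumptions, not the PR's exact code:

```python
import logging

import numpy as np
from joblib import Parallel, delayed
from sklearn.linear_model import LogisticRegression

logger = logging.getLogger(__name__)


def logging_wrapper(func, index, total, *args, **kwargs):
    # Log progress, then delegate to the actual work function; with
    # n_jobs > 1 these messages are emitted from the worker processes.
    logger.info("fitting %d/%d", index + 1, total)
    return func(*args, **kwargs)


def fit_one(estimator, X, y):
    return estimator.fit(X, y)


X = np.random.rand(20, 3)
y = np.array([0, 1] * 10)
estimators = [LogisticRegression() for _ in range(3)]

fitted = Parallel(n_jobs=2)(
    delayed(logging_wrapper)(fit_one, i, len(estimators), est, X, y)
    for i, est in enumerate(estimators)
)
```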
)
proba = calibrator.predict_proba(X)

y[:, 0] = calibrator.classes_[np.argmax(proba, axis=1)]
If you need to use the predictions, wouldn't it be better to use the already implemented predict method? I imagine it could simplify the code here and avoid redundancy.
This is not the same as calling the implemented predict method, because it is using calibrated probabilities for the prediction while the predict method uses uncalibrated probabilities.
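A self-contained sketch of that distinction, using scikit-learn's CalibratedClassifierCV as a stand-in for the calibrators in this PR (hiclass's own calibrators and the per-level loop are not shown):

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(
    n_samples=300, n_features=6, n_informative=4, n_classes=3, random_state=0
)

base = LogisticRegression(max_iter=1000).fit(X, y)
calibrated = CalibratedClassifierCV(base, method="sigmoid", cv="prefit").fit(X, y)

# Uncalibrated prediction: argmax over the base classifier's own scores.
y_uncalibrated = base.predict(X)

# Calibrated prediction: argmax over the calibrated probabilities.
proba = calibrated.predict_proba(X)
y_calibrated = calibrated.classes_[np.argmax(proba, axis=1)]

# The two can disagree whenever calibration reorders the per-class scores.
print((y_uncalibrated != y_calibrated).sum(), "predictions differ")
```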
y_true = make_leveled(y_true)
y_true = classifier._disambiguate(y_true)
I am not sure I follow why make_leveled and _disambiguate need to be called here.
Training and calibration samples are transformed using these methods in the HierarchicalClassifier class. I apply the same transformations to the test samples before calculating the metrics, to get the classes for each level separately (instead of a list of labels for each data point).
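A short sketch of the idea, assuming make_leveled pads ragged label paths to the same depth so that each column can be scored as a separate level (the import path and example labels are assumptions):

```python
import numpy as np

# Import path assumed; make_leveled pads ragged label paths so that every
# sample has the same number of levels.
from hiclass.HierarchicalClassifier import make_leveled

y_true = [
    ["Animal", "Mammal", "Cat"],
    ["Animal", "Bird"],  # shorter path, gets padded to the deepest level
]

y_true = np.array(make_leveled(y_true))

# After levelling (plus the classifier's _disambiguate step in the PR),
# each column is one level of the hierarchy, so metrics can be computed
# per level instead of over whole label paths.
for level in range(y_true.shape[1]):
    print("level", level, "classes:", np.unique(y_true[:, level]))
```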
@@ -189,10 +350,82 @@ def test_predict_sparse(fitted_logistic_regression):
    assert_array_equal(ground_truth, prediction)


def test_predict_proba(fitted_logistic_regression):
Maybe you can use parametrize to reduce redundancy when tests are the same in different files.
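A sketch of what that could look like with pytest's parametrize; the second fixture name, the input shape, and the sum-to-one assertion are only illustrative assumptions:

```python
import numpy as np
import pytest


# One shared test body instead of near-identical copies in different files;
# the fixtures are looked up by name via the request fixture.
@pytest.mark.parametrize(
    "fixture_name",
    ["fitted_logistic_regression", "fitted_random_forest"],
)
def test_predict_proba(fixture_name, request):
    model = request.getfixturevalue(fixture_name)
    proba = model.predict_proba([[1.0, 2.0]])
    # Probabilities for each sample are expected to sum to one.
    assert np.allclose(proba.sum(axis=1), 1.0)
```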
Why is this wrapper necessary?
It is not necessary, although I think it improves the readability of the code and makes future customizations easier. Extending _BinaryCalibrator also ensures that all calibrators have the same methods.
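A minimal sketch of what such a shared interface might look like; the class names, method signatures, and the example subclass are illustrative, not necessarily the PR's exact definitions:

```python
import abc

from sklearn.linear_model import LogisticRegression


class _BinaryCalibrator(abc.ABC):
    """Shared interface so every binary calibrator exposes the same methods."""

    @abc.abstractmethod
    def fit(self, y, scores):
        """Learn a mapping from raw scores to calibrated probabilities."""

    @abc.abstractmethod
    def predict_proba(self, scores):
        """Return calibrated probabilities for the positive class."""


class _PlattScaling(_BinaryCalibrator):
    """Example subclass: sigmoid (logistic) calibration of raw scores."""

    def fit(self, y, scores):
        self._lr = LogisticRegression()
        self._lr.fit(scores.reshape(-1, 1), y)
        return self

    def predict_proba(self, scores):
        return self._lr.predict_proba(scores.reshape(-1, 1))[:, 1]
```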
positive_label = 1
unique_labels = np.unique(y)
assert len(unique_labels) <= 2

y = np.where(y == positive_label, 1, 0)
y = y.reshape(-1)  # make sure it's a 1D array
Maybe these lines can be replaced with the binary_only estimator tag: https://scikit-learn.org/stable/developers/develop.html#estimator-tags
I can't use estimator tags on this class because it does not extend BaseEstimator.
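For context, a sketch of how the binary_only tag from the linked developer guide is declared; as noted above it only applies when the class inherits from BaseEstimator, and the estimator shown here is hypothetical, not part of the PR:

```python
import numpy as np
from sklearn.base import BaseEstimator


class HypotheticalCalibrator(BaseEstimator):
    """Toy estimator illustrating the binary_only tag."""

    def _more_tags(self):
        # Tells scikit-learn's common checks that only binary targets are
        # supported, replacing a manual unique-label assertion in fit.
        return {"binary_only": True}

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        return self
```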
Hi @LukasDrews97, just a quick request from someone from France who reached out to me via e-mail. Would it be possible to add a threshold to remove labels that have low probability?
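A rough sketch of how such a threshold could be applied on top of the returned probabilities; the function name, placeholder value, and masking behaviour are assumptions, not part of this PR:

```python
import numpy as np


def apply_threshold(labels, proba, threshold=0.5, placeholder=""):
    """Replace predictions whose calibrated probability is below the
    threshold with a placeholder label."""
    best = proba.max(axis=1)
    labels = np.asarray(labels, dtype=object).copy()
    labels[best < threshold] = placeholder
    return labels


# Example: three samples, two classes, two low-confidence predictions.
proba = np.array([[0.9, 0.1], [0.55, 0.45], [0.51, 0.49]])
labels = np.array(["cat", "cat", "dog"], dtype=object)
print(apply_threshold(labels, proba, threshold=0.6))
# -> ['cat' '' '']
```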
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #119      +/-   ##
==========================================
- Coverage   96.68%   94.03%   -2.66%
==========================================
  Files          13       28      +15
  Lines        1268     2297    +1029
==========================================
+ Hits         1226     2160     +934
- Misses         42      137      +95

☔ View full report in Codecov by Sentry.
…igmoid' as calibration method
Add probabilistic classification via calibration to hiclass using the following methods: