
415 documentation for winkler interval score #444

Merged
merged 42 commits into from
May 27, 2024
Commits (42)
09721e1
FIX: image size
LacombeLouis May 15, 2024
06e5630
FIX: missing ref in metrics.py
LacombeLouis May 15, 2024
ee0e17d
chore: Add METRICS section to table of contents
LacombeLouis May 15, 2024
64a0299
ADD: theoretical description for metrics
LacombeLouis May 15, 2024
e65a081
Update notebook links in regression, classification and multilabel_cl…
LacombeLouis May 15, 2024
9f8b451
chore: Add verbose mode to LGBMRegressor in plot_cqr_tutorial.py
LacombeLouis May 15, 2024
c7fd1bc
Update theoretical description titles to reflect the specific type
LacombeLouis May 15, 2024
e0c19c8
Update doc/theoretical_description_metrics.rst
LacombeLouis May 16, 2024
e194190
Update doc/theoretical_description_metrics.rst
LacombeLouis May 16, 2024
b4a2c38
Update doc/theoretical_description_metrics.rst
LacombeLouis May 16, 2024
d5b2d2f
Update doc/theoretical_description_metrics.rst
LacombeLouis May 16, 2024
a49582f
Update doc/theoretical_description_metrics.rst
LacombeLouis May 16, 2024
696fee8
Update doc/theoretical_description_metrics.rst
LacombeLouis May 16, 2024
b115895
Update doc/theoretical_description_metrics.rst
LacombeLouis May 16, 2024
dfa2ca6
Update doc/theoretical_description_metrics.rst
LacombeLouis May 16, 2024
e9810ec
Update doc/theoretical_description_metrics.rst
LacombeLouis May 16, 2024
7ad8509
Update doc/theoretical_description_metrics.rst
LacombeLouis May 16, 2024
04531d1
Update doc/theoretical_description_metrics.rst
LacombeLouis May 16, 2024
8f0c081
Update doc/theoretical_description_metrics.rst
LacombeLouis May 16, 2024
eca3e52
Update doc/theoretical_description_metrics.rst
LacombeLouis May 16, 2024
71a0e46
Update doc/theoretical_description_metrics.rst
LacombeLouis May 16, 2024
9d98b0d
Update doc/theoretical_description_metrics.rst
LacombeLouis May 16, 2024
009ad15
Update Michelin image size in README.rst
LacombeLouis May 16, 2024
e2dcf3e
Update theoretical_description_metrics.rst with ECE and Top-Label ECE…
LacombeLouis May 16, 2024
eaaff00
FIX: fix small issues with documentation
LacombeLouis May 16, 2024
ee62fda
Add documentation for metrics.
LacombeLouis May 16, 2024
d86006f
Apply suggestions from TCO from code review
LacombeLouis May 21, 2024
5cc1e6f
Update maxdepth for metrics documentation index.rst
LacombeLouis May 21, 2024
e319da2
FIX: issues of documentation with bullet points
LacombeLouis May 21, 2024
6edf468
Update maxdepth to 0 in index.rst
LacombeLouis May 21, 2024
422de43
FIX: reset correct maxdepth
LacombeLouis May 22, 2024
488a7b4
FIX: headers showing in sidebar
LacombeLouis May 22, 2024
10b54ec
Merge branch 'master' into 415-documentation-for-winkler-interval-score
LacombeLouis May 22, 2024
668b555
FIX: add all metrics of calibration in the same spot
LacombeLouis May 22, 2024
9b458ba
FIX: header and labels correction
LacombeLouis May 22, 2024
75716ce
FIX: indentation of headers
LacombeLouis May 23, 2024
70bb4b1
FIX: standardization
LacombeLouis May 23, 2024
2090163
FIX: no references in tutorials
LacombeLouis May 23, 2024
ded3f1e
Fix formatting and indentation in regression tutorial
LacombeLouis May 23, 2024
9dcca60
FIX: add some line breaks in doc
thibaultcordier May 27, 2024
9f21fda
Update examples/regression/4-tutorials/plot_main-tutorial-regression.py
LacombeLouis May 27, 2024
4d823ab
Merge branch 'master' into 415-documentation-for-winkler-interval-score
LacombeLouis May 27, 2024
1 change: 1 addition & 0 deletions HISTORY.rst
@@ -9,6 +9,7 @@ History
* Reduce precision for test in `MapieCalibrator`.
* Fix invalid certificate when downloading data.
* Add citations utility to the documentation.
* Add documentation for metrics.
* Add explanation and example for symmetry argument in CQR.

0.8.3 (2024-03-01)
15 changes: 10 additions & 5 deletions README.rst
@@ -172,23 +172,28 @@ and with the financial support from Région Ile de France and Confiance.ai.
|Quantmetry| |Michelin| |ENS| |Confiance.ai| |IledeFrance|

.. |Quantmetry| image:: https://www.quantmetry.com/wp-content/uploads/2020/08/08-Logo-quant-Texte-noir.svg
:height: 35
:height: 35px
:width: 140px
:target: https://www.quantmetry.com/

.. |Michelin| image:: https://agngnconpm.cloudimg.io/v7/https://dgaddcosprod.blob.core.windows.net/corporate-production/attachments/cls05tqdd9e0o0tkdghwi9m7n-clooe1x0c3k3x0tlu4cxi6dpn-bibendum-salut.full.png
:height: 35
:height: 50px
:width: 45px
:target: https://www.michelin.com/en/

.. |ENS| image:: https://file.diplomeo-static.com/file/00/00/01/34/13434.svg
:height: 35
:height: 35px
:width: 140px
:target: https://ens-paris-saclay.fr/en

.. |Confiance.ai| image:: https://pbs.twimg.com/profile_images/1443838558549258264/EvWlv1Vq_400x400.jpg
:height: 35
:height: 45px
:width: 45px
:target: https://www.confiance.ai/

.. |IledeFrance| image:: https://www.iledefrance.fr/sites/default/files/logo/2024-02/logoGagnerok.svg
:height: 35
:height: 35px
:width: 140px
:target: https://www.iledefrance.fr/


7 changes: 7 additions & 0 deletions doc/index.rst
@@ -58,6 +58,13 @@
examples_calibration/index
notebooks_calibration

.. toctree::
:maxdepth: 2
:hidden:
:caption: METRICS

theoretical_description_metrics

.. toctree::
:maxdepth: 2
:hidden:
8 changes: 4 additions & 4 deletions doc/notebooks_classification.rst
@@ -6,8 +6,8 @@ problems for computer vision settings that are too heavy to be included in the e
galleries.


1. Estimating prediction sets on the Cifar10 dataset : `notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/Cifar10.ipynb>`_
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
1. Estimating prediction sets on the Cifar10 dataset : `cifar_notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/Cifar10.ipynb>`_
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------

2. Top-label calibration for outputs of ML models : `notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/top_label_calibration.ipynb>`_
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2. Top-label calibration for outputs of ML models : `top_label_notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/top_label_calibration.ipynb>`_
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
8 changes: 4 additions & 4 deletions doc/notebooks_multilabel_classification.rst
@@ -5,8 +5,8 @@ The following examples present advanced analyses
on multi-label classification problems with different
methods proposed in MAPIE.

1. Overview of Recall Control for Multi-Label Classification : `notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/tutorial_multilabel_classification_recall.ipynb>`_
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1. Overview of Recall Control for Multi-Label Classification : `recall_notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/tutorial_multilabel_classification_recall.ipynb>`_
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

2. Overview of Precision Control for Multi-Label Classification : `notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/tutorial_multilabel_classification_precision.ipynb>`_
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
2. Overview of Precision Control for Multi-Label Classification : `precision_notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/classification/tutorial_multilabel_classification_precision.ipynb>`_
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
8 changes: 4 additions & 4 deletions doc/notebooks_regression.rst
@@ -8,11 +8,11 @@ This section lists a series of Jupyter notebooks hosted on the MAPIE Github repo
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


2. Estimating the uncertainties in the exoplanet masses : `notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/regression/exoplanets.ipynb>`_
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
2. Estimating the uncertainties in the exoplanet masses : `exoplanet_notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/regression/exoplanets.ipynb>`_
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


3. Estimating prediction intervals for time series forecast with EnbPI and ACI : `notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/regression/ts-changepoint.ipynb>`_
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
3. Estimating prediction intervals for time series forecast with EnbPI and ACI : `ts_notebook <https://github.com/scikit-learn-contrib/MAPIE/tree/master/notebooks/regression/ts-changepoint.ipynb>`_
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


10 changes: 4 additions & 6 deletions doc/quick_start.rst
@@ -7,11 +7,9 @@ In regression settings, **MAPIE** provides prediction intervals on single-output
In classification settings, **MAPIE** provides prediction sets on multi-class data.
In any case, **MAPIE** is compatible with any scikit-learn-compatible estimator.

Estimate your prediction intervals
==================================

1. Download and install the module
----------------------------------
==================================

Install via ``pip``:

@@ -33,7 +31,7 @@ To install directly from the github repository :


2. Run MapieRegressor
---------------------
=====================

Let us start with a basic regression problem.
Here, we generate one-dimensional noisy data that we fit with a linear model.
@@ -114,8 +112,8 @@ It is given by the alpha parameter defined in ``MapieRegressor``, here equal to
thus giving target coverages of ``0.95`` and ``0.68``.
The effective coverage is the actual fraction of true labels lying in the prediction intervals.
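
A minimal, self-contained sketch of this workflow (illustrative only; the toy data and variable names are ours, and the exact API may differ slightly between MAPIE versions):

.. code-block:: python

    import numpy as np
    from sklearn.linear_model import LinearRegression
    from mapie.regression import MapieRegressor
    from mapie.metrics import regression_coverage_score

    # Toy one-dimensional noisy data fitted with a linear model.
    rng = np.random.default_rng(42)
    X = rng.uniform(-1, 1, size=(500, 1))
    y = 2 * X.ravel() + rng.normal(scale=0.3, size=500)

    mapie = MapieRegressor(estimator=LinearRegression())
    mapie.fit(X, y)
    y_pred, y_pis = mapie.predict(X, alpha=[0.05, 0.32])

    # y_pis has shape (n_samples, 2, n_alpha): lower and upper bounds per alpha.
    # Effective coverage for alpha=0.05, to compare with the 0.95 target:
    print(regression_coverage_score(y, y_pis[:, 0, 0], y_pis[:, 1, 0]))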

2. Run MapieClassifier
----------------------
3. Run MapieClassifier
=======================

Similarly, it is possible to obtain prediction sets for a basic classification problem.
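
A minimal sketch of the classification counterpart (illustrative only; the toy dataset, the prefit setup and the "lac" method are our assumptions):

.. code-block:: python

    from sklearn.datasets import make_blobs
    from sklearn.linear_model import LogisticRegression
    from mapie.classification import MapieClassifier

    X, y = make_blobs(n_samples=500, centers=3, random_state=42)
    clf = LogisticRegression().fit(X, y)

    # Reuse the fitted classifier and calibrate on (X, y).
    mapie_clf = MapieClassifier(estimator=clf, cv="prefit", method="lac")
    mapie_clf.fit(X, y)
    y_pred, y_ps = mapie_clf.predict(X, alpha=0.05)
    # y_ps is a boolean array of shape (n_samples, n_classes, n_alpha).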

10 changes: 5 additions & 5 deletions doc/theoretical_description_binary_classification.rst
@@ -1,10 +1,10 @@
.. title:: Theoretical Description : contents
.. title:: Theoretical Description Binary Classification : contents

.. _theoretical_description_binay_classification:

=======================
#######################
Theoretical Description
=======================
#######################

There are mainly three different ways to handle uncertainty quantification in binary classification:
calibration (see :doc:`theoretical_description_calibration`), confidence interval (CI) for the probability
@@ -83,8 +83,8 @@ for the labels of test objects which are guaranteed to be well-calibrated under
that the observations are generated independently from the same distribution [2].


4. References
-------------
References
----------

[1] Gupta, Chirag, Aleksandr Podkopaev, and Aaditya Ramdas.
"Distribution-free binary classification: prediction sets, confidence intervals, and calibration."
117 changes: 7 additions & 110 deletions doc/theoretical_description_calibration.rst
@@ -2,10 +2,9 @@

.. _theoretical_description_calibration:

=======================
#######################
Theoretical Description
=======================

#######################

One method for multi-class calibration has been implemented in MAPIE so far :
Top-Label Calibration [1].
@@ -34,8 +33,8 @@ To apply calibration directly to a multi-class context, Gupta et al. propose a f
a multi-class calibration to multiple binary calibrations (M2B).


1. Top-Label
------------
Top-Label
---------

Top-Label calibration is a technique introduced by Gupta et al. to calibrate the model according to its highest score and
the corresponding class (see [1], Section 2). This framework makes it possible to apply binary calibration techniques to the multi-class setting.
@@ -50,109 +49,8 @@ according to Top-Label calibration if:
Pr(Y = c(X) \mid h(X), c(X)) = h(X)


2. Metrics for calibration
--------------------------

**Expected calibration error**

The main metric to check whether the calibration is correct is the Expected Calibration Error (ECE). It is based on two
components: the accuracy and the confidence per bin. The number of bins is a hyperparameter :math:`M`, and we refer to a specific bin by
:math:`B_m`.

.. math::
\text{acc}(B_m) &= \frac{1}{\left| B_m \right|} \sum_{i \in B_m} {y}_i \\
\text{conf}(B_m) &= \frac{1}{\left| B_m \right|} \sum_{i \in B_m} \hat{f}(x)_i


The ECE combines these two quantities over all bins.

.. math::
\text{ECE} = \sum_{m=1}^M \frac{\left| B_m \right|}{n} \left| acc(B_m) - conf(B_m) \right|

In simple terms, once the bins have been created from the confidence scores, we compute the mean accuracy and the mean confidence of each bin.
The weighted average of the absolute differences between the two is the ECE. Hence, the lower the ECE, the better the calibration.

**Top-Label ECE**

In top-label calibration, we only calculate the ECE for the top-label class. Hence, for each top-label class, we condition the calculation
of the accuracy and the confidence on that top label, and we then average the resulting per-class ECE values.
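
To make the binning explicit, here is a minimal NumPy sketch of the ECE for binary labels (illustrative only; MAPIE ships its own implementations of these calibration metrics, which should be preferred in practice):

.. code-block:: python

    import numpy as np

    def ece(y_true, y_scores, n_bins=10):
        # Assign each score to an equal-width bin over [0, 1].
        bin_ids = np.minimum((y_scores * n_bins).astype(int), n_bins - 1)
        n = len(y_true)
        total = 0.0
        for b in range(n_bins):
            mask = bin_ids == b
            if not mask.any():
                continue
            acc = y_true[mask].mean()      # accuracy of the bin
            conf = y_scores[mask].mean()   # mean confidence of the bin
            total += mask.sum() / n * np.abs(acc - conf)
        return total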

3. Statistical tests for calibration
------------------------------------

**Kolmogorov-Smirnov test**

The Kolmogorov-Smirnov test was derived in [2, 3, 4]. The idea is to consider the cumulative differences between the sorted scores :math:`s_i`
and their corresponding labels :math:`y_i`, and to compare their properties to those of a standard Brownian motion. Let us consider the
cumulative differences on sorted scores:

.. math::
C_k = \frac{1}{N}\sum_{i=1}^k (s_i - y_i)

We also introduce a typical normalization scale :math:`\sigma`:

.. math::
\sigma = \frac{1}{N}\sqrt{\sum_{i=1}^N s_i(1 - s_i)}

The Kolmogorov-Smirnov statistic is then defined as:

.. math::
G = \max_k |C_k| / \sigma

It can be shown [2] that, under the null hypothesis of well-calibrated scores, this quantity asymptotically (i.e. when N goes to infinity)
converges to the maximum absolute value of a standard Brownian motion over the unit interval :math:`[0, 1]`. [3, 4] also provide closed-form
formulas for the cumulative distribution function (CDF) of the maximum absolute value of such a standard Brownian motion.
So we state the p-value associated with the statistical test of calibration as:

.. math::
p = 1 - CDF(G)
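
The statistic :math:`G` follows directly from these definitions. A minimal NumPy sketch (illustrative only; the helper name is ours, and the p-value step additionally requires the CDF mentioned above):

.. code-block:: python

    import numpy as np

    def ks_statistic(y_true, y_scores):
        order = np.argsort(y_scores)
        s, y = y_scores[order], y_true[order]
        n = len(s)
        c = np.cumsum(s - y) / n                  # cumulative differences C_k
        sigma = np.sqrt(np.sum(s * (1 - s))) / n  # normalization scale
        return np.max(np.abs(c)) / sigma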

**Kuiper test**

The Kuiper test was derived in [2, 3, 4] and is very similar to the Kolmogorov-Smirnov test. This time, the statistic is defined as:

.. math::
H = (\max_k|C_k| - \min_k|C_k|)/\sigma

It can be shown [2] that, under the null hypothesis of well-calibrated scores, this quantity asymptotically (i.e. when N goes to infinity)
converges to the range of a standard Brownian motion over the unit interval :math:`[0, 1]`. [3, 4] also provide closed-form
formulas for the cumulative distribution function (CDF) of the range of such a standard Brownian motion.
So we state the p-value associated with the statistical test of calibration as:

.. math::
p = 1 - CDF(H)
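
The same quantities give the Kuiper statistic, written exactly as in the formula above (a sketch under the same assumptions as the Kolmogorov-Smirnov one):

.. code-block:: python

    import numpy as np

    def kuiper_statistic(y_true, y_scores):
        order = np.argsort(y_scores)
        s, y = y_scores[order], y_true[order]
        n = len(s)
        c = np.cumsum(s - y) / n
        sigma = np.sqrt(np.sum(s * (1 - s))) / n
        return (np.max(np.abs(c)) - np.min(np.abs(c))) / sigma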

**Spiegelhalter test**

The Spiegelhalter test was derived in [6]. It is based on a decomposition of the Brier score:

.. math::
B = \frac{1}{N}\sum_{i=1}^N(y_i - s_i)^2

where the scores are denoted :math:`s_i` and their corresponding labels :math:`y_i`. This can be decomposed into two terms:

.. math::
B = \frac{1}{N}\sum_{i=1}^N(y_i - s_i)(1 - 2s_i) + \frac{1}{N}\sum_{i=1}^N s_i(1 - s_i)

It can be shown that the first term has an expected value of zero under the null hypothesis of calibration. So we interpret
the second term as the expected value of the Brier score :math:`E(B)` under the null hypothesis. As for the variance of the Brier score, it can be
computed as:

.. math::
Var(B) = \frac{1}{N^2}\sum_{i=1}^N(1 - 2s_i)^2 s_i(1 - s_i)

So we can build a Z-score as follows:

.. math::
Z = \frac{B - E(B)}{\sqrt{Var(B)}} = \frac{\sum_{i=1}^N(y_i - s_i)(1 - 2s_i)}{\sqrt{\sum_{i=1}^N(1 - 2s_i)^2 s_i(1 - s_i)}}

This statistic follows a normal distribution with cumulative distribution function CDF, so we state the associated p-value as:

.. math::
p = 1 - CDF(Z)
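
Since the Z-score and its p-value follow directly from the sums above, a minimal sketch (illustrative only; SciPy's normal CDF is used for the last step):

.. code-block:: python

    import numpy as np
    from scipy.stats import norm

    def spiegelhalter_p_value(y_true, y_scores):
        num = np.sum((y_true - y_scores) * (1 - 2 * y_scores))
        den = np.sqrt(np.sum((1 - 2 * y_scores) ** 2 * y_scores * (1 - y_scores)))
        return 1 - norm.cdf(num / den)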

3. References
-------------
References
----------

[1] Gupta, Chirag, and Aaditya K. Ramdas.
"Top-label calibration and multiclass-to-binary reductions."
@@ -171,8 +69,7 @@ arXiv preprint arXiv:2202.00100.

[4] D. A. Darling. A. J. F. Siegert.
The First Passage Problem for a Continuous Markov Process.
Ann. Math. Statist. 24 (4) 624 - 639, December,
1953.
Ann. Math. Statist. 24 (4) 624 - 639, December, 1953.

[5] William Feller.
The Asymptotic Distribution of the Range of Sums of
13 changes: 7 additions & 6 deletions doc/theoretical_description_classification.rst
@@ -1,11 +1,10 @@
.. title:: Theoretical Description : contents
.. title:: Theoretical Description Classification : contents

.. _theoretical_description_classification:

=======================
#######################
Theoretical Description
=======================

#######################

Three methods for multi-class uncertainty quantification have been implemented in MAPIE so far :
LAC (that stands for Least Ambiguous set-valued Classifier) [1], Adaptive Prediction Sets [2, 3] and Top-K [3].
@@ -141,8 +140,10 @@ Despite the RAPS method having a relatively small set size, its coverage tends t
of the last label in the prediction set. This randomization is done as follows:

- First : define the :math:`V` parameter:

.. math::
V_i = (s_i(X_i, Y_i) - \hat{q}_{1-\alpha}) / \left(\hat{\mu}(X_i)_{\pi_k} + \lambda \mathbb{1} (k > k_{reg})\right)

- Compare each :math:`V_i` to :math:`U \sim` Unif(0, 1)
- If :math:`V_i \leq U`, the last included label is removed, else we keep the prediction set as it is.
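
A schematic sketch of this randomization step (illustrative only; ``s``, ``q_hat``, ``mu_last``, ``k_last``, ``lam`` and ``k_reg`` are assumed to be precomputed, and their names are ours):

.. code-block:: python

    import numpy as np

    def randomize_last_label(s, q_hat, mu_last, k_last, lam, k_reg, rng):
        # V as defined above, evaluated for the last label included in the set.
        v = (s - q_hat) / (mu_last + lam * (k_last > k_reg))
        u = rng.uniform(0, 1, size=np.shape(v))
        # If V_i <= U, the last included label is removed.
        return v > u  # True means the last label is kept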

@@ -227,8 +228,8 @@ where :

.. TO BE CONTINUED

5. References
-------------
References
----------

[1] Mauricio Sadinle, Jing Lei, & Larry Wasserman.
"Least Ambiguous Set-Valued Classifiers With Bounded Error Levels."
16 changes: 8 additions & 8 deletions doc/theoretical_description_conformity_scores.rst
@@ -1,10 +1,10 @@
.. title:: Theoretical Description : contents
.. title:: Theoretical Description Conformity Scores : contents

.. _theoretical_description_conformity_scores:

=============================================
#############################################
Theoretical Description for Conformity Scores
=============================================
#############################################

The :class:`mapie.conformity_scores.ConformityScore` class implements various
methods to compute conformity scores for regression.
@@ -25,7 +25,7 @@ quantiles will be computed : one on the right side of the distribution
and the other on the left side.

1. The absolute residual score
==============================
------------------------------

The absolute residual score (:class:`mapie.conformity_scores.AbsoluteConformityScore`)
is the simplest and most commonly used conformity score; it translates the error
@@ -44,7 +44,7 @@ With this score, the intervals of predictions will be constant over the whole da
This score is by default symmetric (*see above for definition*).

2. The gamma score
==================
------------------

The gamma score [2] (:class:`mapie.conformity_scores.GammaConformityScore`) adds a
notion of adaptivity with the normalization of the residuals by the predictions.
@@ -69,7 +69,7 @@ the order of magnitude of the predictions, implying that this score should be us
in use cases where we want greater uncertainty when the prediction is high.

3. The residual normalized score
=======================================
--------------------------------

The residual normalized score [1] (:class:`mapie.conformity_scores.ResidualNormalisedScore`)
is slightly more complex than the previous scores.
@@ -97,7 +97,7 @@ it is not proportional to the uncertainty.


Key takeaways
=============
-------------

- The absolute residual score is the basic conformity score and gives constant intervals. It is the one used by default by :class:`mapie.regression.MapieRegressor`.
- The gamma conformity score adds a notion of adaptivity by giving intervals of different sizes
Expand All @@ -107,7 +107,7 @@ Key takeaways
without specific assumptions on the data.
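
A minimal sketch of how a non-default conformity score is plugged into :class:`mapie.regression.MapieRegressor` (illustrative only; ``X_train``, ``y_train`` and ``X_test`` are placeholders, and the parameter name may differ slightly between MAPIE versions):

.. code-block:: python

    from sklearn.linear_model import LinearRegression
    from mapie.regression import MapieRegressor
    from mapie.conformity_scores import GammaConformityScore

    # The default is the absolute residual score; the gamma score makes
    # interval widths grow with the magnitude of the (positive) predictions.
    mapie_gamma = MapieRegressor(
        estimator=LinearRegression(),
        conformity_score=GammaConformityScore(),
    )
    # mapie_gamma.fit(X_train, y_train)
    # y_pred, y_pis = mapie_gamma.predict(X_test, alpha=0.1)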

References
==========
----------

[1] Lei, J., G'Sell, M., Rinaldo, A., Tibshirani, R. J., & Wasserman, L. (2018). Distribution-Free
Predictive Inference for Regression. Journal of the American Statistical Association, 113(523), 1094–1111.