Conformal Thresholded Intervals (CTI) for Efficient Regression

This repository contains code implementations for the following conformal prediction methods:

This project is built upon the Python toolbox TorchCP.

Conformal Thresholded Intervals (CTI) for Efficient Regression

This paper introduces Conformal Thresholded Intervals (CTI), a novel conformal regression method that aims to produce the smallest possible prediction set with guaranteed coverage.

Key Points

Utilizes multi-output quantile regression to estimate conditional interquantile intervals.
Constructs prediction sets by thresholding these intervals based on their length.
Guarantees marginal coverage and can achieve optimal conditional coverage under certain conditions.
Computationally efficient and avoids estimating full conditional distributions.
Produces potentially non-convex prediction sets that adapt to local data density.

Key Definitions

Interquantile intervals: $I_k(x) = (q_{k-1}(x), q_k(x)] \quad \text{for } k = 1, \dots, K$

Prediction set: $C(x) = \bigcup { I_k(x) \mid \mu(I_k(x)) \leq t, k = 1, \dots, K }$

Where:

$q_k(x)$ = estimated $k/K$ quantile
$t$ = threshold determined on calibration set
$\mu$ = Lebesgue measure

Algorithm Overview

Train a quantile regression model on training data.
Estimate interquantile intervals on calibration and test data.
Determine threshold $t$ using the calibration set.
Construct prediction sets for test points by thresholding intervals.

Theoretical Guarantees

Marginal coverage: $P(Y \in C(X)) \geq 1 - \alpha$
Can achieve optimal conditional coverage and smallest expected prediction set size under certain conditions.

Entropy Reweighted Conformal Classification

This paper proposes Entropy Reweighted Conformal Classification, a novel approach to improve the efficiency of prediction sets in conformal classification.

Key Points

Leverages the uncertainty of the classifier by using entropy-based reweighting.
Applies a reweighting to the logits of the classification model based on the entropy of the predicted probability distribution.
The reweighted logits are defined as: $\tilde{z}_k(X) = \frac{z_k(X)}{H(X) \cdot T}$ where:
- $z_k(X)$ = original logit
- $H(X)$ = entropy of the predicted probability distribution
- $T$ = temperature parameter
The reweighted probabilities are obtained by applying the softmax function to the reweighted logits: $\tilde{f}_k(X) = \frac{\exp(\tilde{z}_k(X))}{\sum_j \exp(\tilde{z}_j(X))}$
These reweighted probabilities are used to compute the conformity scores for conformal prediction.
The temperature parameter $T$ is optimized using cross-validation on a separate validation set.
Aims to improve the efficiency of prediction sets while maintaining the coverage guarantees of conformal prediction.
Experimental results on various datasets (AG News, CAREER, MNIST, Fashion MNIST) demonstrate improved performance in terms of prediction efficiency and accuracy compared to existing techniques.

Key Definitions

Prediction Set:

$$C_A = \{ y_{N+1} \in \mathcal{Y} |\sum_{n \in \mathcal{I}_2} \mathbb{1}(A_n \leq A_{N+1}) \leq n\alpha \} = \{ y_{N+1} \in \mathcal{Y} | A_{N+1} \leq Q_A \}$$

Where:

$A_n = a(f(X_n), Y_n)$, conformity score
$A_{N+1} = a(f(X_{N+1}), y_{N+1})$
$Q_A$ is the $(1-\alpha)$-th sample quantile of $A_1, \dots, A_N$
$n\alpha = \lceil (1-\alpha)(|\mathcal{I}_2| + 1) \rceil$

Adaptive Prediction Sets (APS) Score: $A_n = a(f(X_n), Y_n) = \sum_{i=1}^{r(Y_n, f(X_n)) - 1} f_{(i)}(X_n) + U f_{(r(Y_n, f(X_n)))}(X_n)$

Where:

$f_{(i)}(X_n)$ denotes the $i$-th largest element of the probability vector $f(X_n)$.
$r(Y_n, f(X_n))$ is the rank of the true label $Y_n$ in the probability vector $f(X_n)$.
$U \sim \text{Uniform}(0,1)$ is independent of everything else.

Algorithm Overview

Train a classification model $f$ on training data.
Compute entropy $H(X)$ for each data point.
Apply entropy-based reweighting to logits.
Compute reweighted probabilities.
Calculate conformity scores using reweighted probabilities.
Construct prediction sets using the conformal prediction framework.
Optimize the temperature parameter $T$ using cross-validation.

Theoretical Guarantees

Marginal coverage: $\mathbb{P}(Y_{N+1} \in C(X_{N+1})) \geq 1 - \alpha$
Maintains the validity of conformal prediction while improving efficiency.

This approach integrates accurate uncertainty quantification with the coverage guarantees of conformal prediction, addressing limitations observed in previous attempts to combine confidence calibration with conformal prediction.

Trustworthy Classification through Rank-Based Conformal Prediction Sets

This paper introduces a novel Rank-Based Conformal Prediction method for classification tasks, aiming to construct prediction sets that are both trustworthy and as small as possible while guaranteeing a pre-specified coverage rate.

Key Points

Rank-Based Score Function:
The method utilizes a ranking of the classifier's prediction probabilities to construct conformity scores. This is particularly effective for models that output reasonably ordered but not well-calibrated probabilities.
Prediction Set Construction:
Prediction sets are built by considering the rank order of prediction scores. Instead of directly thresholding raw probabilities, the method determines the minimal set of top-ranked labels required to achieve the desired coverage.
Robust to Miscalibration:
By relying on the ordering of labels rather than calibrated probabilities, the approach remains robust even when the underlying classifier exhibits poor calibration.

Key Definitions

Conformity Score:
For each data point ( (x_i, y_i) ), the conformity score is defined as:

$$c(x_i, y_i) = \text{rank}\left(\pi_{y_i}(x_i)\right) - 1 + \frac{1}{n} \cdot \left[\text{tie-breaking term based on the ranking among calibration scores}\right],$$

where $\pi_{y_i}(x_i)$ is the predicted score (or probability) for the true class.

Rank Threshold $r^*_\alpha$:
The rank threshold is determined as the $\lfloor (n+1)\alpha \rfloor$-th largest rank in the calibration set:

$$r^*_\alpha = \text{the } \lfloor (n+1)\alpha \rfloor\text{-th order statistic among the calibration ranks}.$$

Prediction Set (C_\alpha(x)):
Constructed as:

$$C_\alpha(x) = \begin{cases} \{ \text{labels with top } r^*_\alpha \text{ ranks} \}, & \text{if } \pi_b(r^*_\alpha)(x) \geq \pi^* \\\ \{ \text{labels with top } (r^*_\alpha - 1) \text{ ranks} \}, & \text{otherwise} \end{cases}$$

where $\pi^*$ is a secondary threshold determined from the calibration set.

Theoretical Guarantees

Coverage Guarantee
Expected Set Size Analysis

Reference

@inproceedings{luo2024entropy,
  title={Entropy Reweighted Conformal Classification},
  author={Luo, Rui and Colombo, Nicolo},
  booktitle={The 13th Symposium on Conformal and Probabilistic Prediction with Applications},
  pages={264--276},
  year={2024},
  organization={PMLR}
}

@article{luo2024conformal,
  title={Conformal Thresholded Intervals for Efficient Regression},
  author={Luo, Rui and Zhou, Zhixin},
  journal={arXiv preprint arXiv:2407.14495},
  year={2024}
}

@article{luo2024trustworthy,
  title={Trustworthy classification through rank-based conformal prediction sets},
  author={Luo, Rui and Zhou, Zhixin},
  journal={arXiv preprint arXiv:2407.04407},
  year={2024}
}

Name		Name	Last commit message	Last commit date
Latest commit History 617 Commits
.github/workflows		.github/workflows
docs		docs
examples		examples
tests		tests
torchcp.egg-info		torchcp.egg-info
torchcp		torchcp
.gitignore		.gitignore
.readthedocs.yml		.readthedocs.yml
AUTHORS.txt		AUTHORS.txt
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
LICENSE.GPL		LICENSE.GPL
MANIFEST.in		MANIFEST.in
README.md		README.md
__init__.py		__init__.py
pytest.ini		pytest.ini
requirements.txt		requirements.txt
setup.cfg		setup.cfg
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Conformal Thresholded Intervals (CTI) for Efficient Regression

Key Points

Key Definitions

Algorithm Overview

Theoretical Guarantees

Entropy Reweighted Conformal Classification

Key Points

Key Definitions

Algorithm Overview

Theoretical Guarantees

Trustworthy Classification through Rank-Based Conformal Prediction Sets

Key Points

Key Definitions

Theoretical Guarantees

Reference

About

Releases

Packages

Languages

License

AllenPu/CTI

Folders and files

Latest commit

History

Repository files navigation

Conformal Thresholded Intervals (CTI) for Efficient Regression

Key Points

Key Definitions

Algorithm Overview

Theoretical Guarantees

Entropy Reweighted Conformal Classification

Key Points

Key Definitions

Algorithm Overview

Theoretical Guarantees

Trustworthy Classification through Rank-Based Conformal Prediction Sets

Key Points

Key Definitions

Theoretical Guarantees

Reference

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages