Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

True Positive and Negative rates in multiclosure weighted fits #2258

Open
wants to merge 23 commits into
base: master
Choose a base branch
from

Conversation

comane
Copy link
Member

@comane comane commented Jan 17, 2025

Module for computation of True Positive and Negative rates for flagging a dataset as inconsistent.

When the fit is weighted TPR and TNR are computed taking into account whether adding the weight has deteriorated the overall fit quality.

  • Add arXiv reference once paper is out
  • notation of various sets should be consistent with the one used in the paper (currently S2 <-> S3)
  • polish plotting functions

@comane comane changed the title [WIP] True Positive and Negative rates in multiclosure weighted fits True Positive and Negative rates in multiclosure weighted fits Jan 18, 2025
@comane comane force-pushed the nsigma_multiclosure_tpr_tnr branch from a3aed71 to a569fd6 Compare January 23, 2025 11:39
@comane comane force-pushed the nsigma_multiclosure_tpr_tnr branch from 93b23f0 to b46f020 Compare March 26, 2025 15:12
Copy link
Member

@scarlehoff scarlehoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is mostly ok. The only thing is that, if possible, I'd try to trim down the helpers since some of the functionality is already in results.py (or, the one that is not, would be better suited as enhancements there).

Comment on lines +21 to +23
"""
Quantile range for computing the true positive rate and true negative rate.
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""
Quantile range for computing the true positive rate and true negative rate.
"""
# Quantile range for computing the true positive rate and true negative rate.

I think sphinx is be able to pick up docstrings for variables but they need to be below (see here). I don't think it is needed, but in any case I would either make them into comments or put them below (for this and all others)

import dataclasses
import pandas as pd
import numpy as np
import logging
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure you run pre-commit. To make sure that the hooks are applied to these files you can just add an empty line to these files and then

git add -u
pre-commit

to make sure pre-commit sees them as modified.

@@ -13,3 +13,6 @@
from validphys.closuretest.multiclosure_preprocessing import *
from validphys.closuretest.multiclosure_pseudodata import *
from validphys.closuretest.inconsistent_closuretest.multiclosure_inconsistent_output import *
from validphys.closuretest.multiclosure_nsigma_helpers import *
from validphys.closuretest.multiclosure_nsigma import *
from validphys.closuretest.multiclosure_nsigma_output import *
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this? It would be great to only import the functions that are actually needed. We have to live with those that are already there, but right now if I import closuretest and try to autocomplete I get 301 possibilities, some of them are not really useful (like closuretest.np for numpy...)

@@ -0,0 +1,308 @@
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""
r"""

Otherwise you need to escape the \S



def set_2(dataspecs_nsigma_alpha: list) -> dict:
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""
r"""


@property
def reduced(self):
return self.value / self.ndata
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's already Chi2Data in results... I wonder whether it'd be better to extend that one? Or maybe use a different name here?

The difference is that this one holds the dataset only while the other holds a whole result object.

(in any case docstr needed)

Comment on lines +32 to +46
def central_predictions(dataset: DataSetSpec, pdf: PDF) -> pd.DataFrame:
"""
Computes the central prediction (central PDF member) for a dataset.

Parameters
----------
dataset: validphys.core.DataSetSpec
pdf: validphys.core.PDF

Returns
-------
pd.DataFrame
index is datapoints, column is the central prediction.
"""
return convolution.central_predictions(dataset, pdf)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def central_predictions(dataset: DataSetSpec, pdf: PDF) -> pd.DataFrame:
"""
Computes the central prediction (central PDF member) for a dataset.
Parameters
----------
dataset: validphys.core.DataSetSpec
pdf: validphys.core.PDF
Returns
-------
pd.DataFrame
index is datapoints, column is the central prediction.
"""
return convolution.central_predictions(dataset, pdf)
from convolution import central_predictions

would be the same. You can add the type hints there if you need them.


value = calc_chi2(sqrt_covmat, diff)
ndata = len(central_predictions)
return CentralChi2Data(value=value, ndata=ndata, dataset=dataset)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test it because I'm not 100% sure (maybe you are doing something in the multiclosure that will break this) but in principle this function could just depend on abs_chi2_data which would automatically get the predictions, the data and the compute the chi2.


Returns
-------
str or None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It returns True/False

title: Inconsistency classification probabilities
author: Lazy Person
keywords: [nsigma, chi2, multiclosure test, inconsistent]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add perhaps a comment saying that the commented dataset are those to reproduce the paper.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants