
EvaluationResult of T-test (and W-test) do not contain the original percentile/alpha value #262

Open
pabloitu opened this issue Aug 19, 2024 · 1 comment · May be fixed by #263
pabloitu commented Aug 19, 2024

The t-test requires an alpha value to create a confidence interval (e.g., 5%):

def paired_t_test(forecast, benchmark_forecast, observed_catalog,
                  alpha=0.05, scale=False):

from which the information-gain bounds and the type-1-error statistics are returned inside an EvaluationResult. However, this alpha value is then discarded, which forces the EvaluationResult plotting code to recall the original value of alpha with which the t-test was carried out:
percentile = plot_args.get('percentile', 95)
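For context, the plot percentile and the test alpha encode the same confidence level, but nothing on the result ties them together; a minimal, self-contained illustration (not actual pyCSEP code):

alpha = 0.05                                   # value the paired_t_test was run with
plot_args = {}                                 # hypothetical plotting arguments
percentile = plot_args.get('percentile', 95)   # the plot falls back to its own default
# The legend only matches the stored bounds if the user keeps both in sync by hand:
assert percentile == (1 - alpha) * 100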

Not sure whether to create a new attribute alpha on the resulting EvaluationResult:

result = EvaluationResult()
result.name = 'Paired T-Test'
result.test_distribution = (out['ig_lower'], out['ig_upper'])  # IG bounds at the given alpha
result.observed_statistic = out['information_gain']            # mean information gain
result.quantile = (out['t_statistic'], out['t_critical'])      # t_critical depends on alpha
result.sim_name = (forecast.name, benchmark_forecast.name)
result.obs_name = observed_catalog.name
result.status = 'normal'
result.min_mw = numpy.min(forecast.magnitudes)

or to redefine the attributes of the t-test result. For instance, shouldn't result.quantile, rather than result.test_distribution, be the attribute that actually contains the information_gain lower and upper bounds?
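To make this concrete, here is a self-contained sketch of how alpha is consumed to build the stored values but is not retained itself (made-up numbers, not the actual pyCSEP computation):

import scipy.stats

alpha = 0.05
information_gain = 0.12     # hypothetical mean information gain per event
ig_std = 0.05               # hypothetical standard error of the mean IG
dof = 199                   # hypothetical degrees of freedom

t_critical = scipy.stats.t.ppf(1 - alpha / 2.0, dof)
ig_lower = information_gain - t_critical * ig_std
ig_upper = information_gain + t_critical * ig_std

# Only the derived numbers reach the EvaluationResult; alpha (and dof) are dropped,
# so downstream code cannot tell which confidence level the bounds correspond to:
test_distribution = (ig_lower, ig_upper)
quantile = (information_gain / ig_std, t_critical)    # (t_statistic, t_critical)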

Also, the W-test confidence interval is calculated inside the plotting functions rather than in the evaluation function itself.

@pabloitu pabloitu changed the title EvaluationResult of T-test do not contain the original percentile/alpha value EvaluationResult of T-test (and W-test) do not contain the original percentile/alpha value Aug 19, 2024
@pabloitu pabloitu mentioned this issue Aug 19, 2024
@pabloitu pabloitu self-assigned this Aug 20, 2024
pabloitu (Collaborator, Author) commented:

Addressed in #263, commit 7306329, which adds an extra value to EvaluationResult().quantile that stores the type-1-error alpha value. Now the alpha value can be written in the plotting legend to explain what the symbols/colors in the t-test plot mean.

Currently, the t-test EvaluationResult() is defined as:

  • test_distribution = (lower_IG_bound, upper_IG_bound)
  • observed_statistic = mean_IG (i.e., just the difference between the forecasts' log-scores)
  • quantile = (T_statistic, T_critical, alpha (recently added))

but the values don't feel quite in place. The dof value is also lost, which would require some crazy acrobatics if a different confidence interval is desired, or re-running the entire test. This differs from the consistency tests, where the confidence interval is defined at the plot level.
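As an illustration of those acrobatics under the current layout (hypothetical numbers, mirroring the attributes after commit 7306329), the mean and the standard error can be backed out of the stored values, but the missing dof still blocks re-computing bounds at a different confidence level:

ig_lower, ig_upper = 0.021, 0.219                    # result.test_distribution
t_statistic, t_critical, alpha = 2.4, 1.972, 0.05    # result.quantile

mean_ig = 0.5 * (ig_lower + ig_upper)
ig_std = (ig_upper - ig_lower) / (2.0 * t_critical)
# A different interval needs a new critical value, e.g.
# scipy.stats.t.ppf(1 - new_alpha / 2.0, dof), but dof is not stored anywhere.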

I wonder if the attributes of the resulting EvaluationResult should be re-defined for the t-test as:

  • test_distribution: the actual t-distribution, i.e. the 3 parameters of the location-scale distribution, e.g. (mean_IG, std_IG, dof).
  • observed_statistic: 0, since we are testing whether the log-scores are significantly different, i.e., IG = 0.
  • quantile: the fraction of the mass of test_distribution below 0.

In this way, the comparison-test results become analogous to the consistency tests: a test_distribution, similar to the Poisson/NegBinom distributions, and a quantile value that can be immediately checked against a confidence level (below or above it).
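A minimal sketch of what that re-definition could look like, assuming scipy is available (names and numbers are hypothetical, not an implemented API):

import scipy.stats

information_gain = 0.12    # hypothetical mean IG (location)
ig_std = 0.05              # hypothetical standard error (scale)
dof = 199                  # hypothetical degrees of freedom

# Proposed layout: store the full location-scale t-distribution instead of one interval
test_distribution = (information_gain, ig_std, dof)
observed_statistic = 0.0   # the null value being tested against, i.e. IG = 0
# quantile: mass of the test distribution below 0
quantile = scipy.stats.t.cdf(0.0, dof, loc=information_gain, scale=ig_std)

# Any confidence interval can then be recovered afterwards, e.g. for alpha = 0.05:
alpha = 0.05
ig_lower, ig_upper = scipy.stats.t.interval(1 - alpha, dof,
                                            loc=information_gain, scale=ig_std)

With this layout the quantile reads the same way as in the consistency tests: it can be compared directly against a chosen significance level without re-running the test.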

Ideas? Or should we keep it like this? @mherrmann3 @wsavran @bayonato89 @Serra314

@pabloitu pabloitu linked a pull request Aug 27, 2024 that will close this issue