Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to deal with forecast rates ≤ 0 in any LL-based test? #226

Open
Tracked by #269
mherrmann3 opened this issue May 21, 2023 · 9 comments
Open
Tracked by #269

How to deal with forecast rates ≤ 0 in any LL-based test? #226

mherrmann3 opened this issue May 21, 2023 · 9 comments

Comments

@mherrmann3
Copy link
Contributor

mherrmann3 commented May 21, 2023

It's not a surprise that zero (or even negative) rates cause issues in LL scores, but I believe pyCSEP does not generally check, deal, or warn in LL-based tests if forecasts contain them (an example: Akinci's HAZGRIDX in the 5-yr Italy experiment).

This affects every grid-based and catalog-based test except the N-test.

I noticed one exception: rates ≤ 0 are silently omitted in binomial_evaluations using masked arrays (numpy.ma.masked_where). But this is not an optimal treatment, because it provides a model the opportunity to cheat and game the system: in areas where it is unsure or expects low seismicity, it simply forecasts 0; if a target event should occur in such bins, it won't count in the testing. Apart that, excluding rates ≤ 0 could trigger a corner case in the T-test when all target event rates are ≤ 0 (see case 3 in #225).

So the better approach is to replace forecast rates ≤ 0 (in a reproducible way, not randomly), e.g., with

  • the minimum among all forecast rates,
  • an order of magnitude below the minimum,
  • the average among the surrounding bins, or
  • ...

It would be convenient to write a separate function for this treatment and use it

  • at the top of .core.poisson_evaluations._poisson_likelihood_test(),
  • at the top of .core.poisson_evaluations._t_test_ndarray(),
  • at the top of .utils.calc._compute_likelihood(), and
  • somewhere in the middle of .core.catalog_evaluations.magnitude_test().
@mjw98
Copy link

mjw98 commented May 21, 2023 via email

@mherrmann3
Copy link
Contributor Author

Currently no Error is thrown when reading or processing unreasonable rates. Case 2 in #225 illustrates what happens in the T-test:

  1. two obscure and generic (Runtime)Warnings are logged, which may go unnoticed;
  2. yes, the IG becomes -inf.

Good idea, Max, to use -inf as an indicator for the evaluators to act on! But outside of elaborate experiment designs, a regular pyCSEP user, if (s)he notices it, may not know what and why it happens, ignore the RuntimeWarnings, and accept the -inf. So perhaps pyCSEP could feature

  • an informative Warning (or Error) message about the actual problem, and
  • an easy way to deal with it having a reasonable default solution - either automatically or manually (e.g., via a flag); this avoids modifying the input data.

Granted, it's a rare case, but it does seem to happen with grid-based forecasts. Catalog-based forecasts with a limited number of simulations and no dedicated background simulation may be even more prone. But since target events should rarely occur in ≤ 0 rate bins, I believe that an -inf result unnecessarily invalidates an otherwise "usable" LL score.

@pabloitu
Copy link
Collaborator

Nice catch. I think the best is to provide a forecast format checker (e.g. prohibit negative mean rate values), and additionally, to raise an additional warning in case the forecast has zero-rate bins. However, the now present Runtime warnings are low-level numpy warnings, and although they are ugly, I don't think they should be silenced or bypassed. Also, agree with -inf should be passed as the given result.

Personally, I wouldn't provide options (or even less defaulting) to "fix" the forecasts. Although can be obvious that they need fixing for negative rates, it is not for zero rated forecasts. The method selection would have an impact on expected results, and perhaps should be addressed by the modeler in their workflow as a modeling decision, not testing.

@wsavran
Copy link
Collaborator

wsavran commented Jun 6, 2023

I agree with Pablo here. Grid based forecasts should not contain zero-rate cells, so we should have a mechanism for checking this condition. If we want to use the forecasts as is, we can implement some type of masking process into forecasts to simply ignore these cells and carry on with the tests like normal.

@mherrmann3
Copy link
Contributor Author

mherrmann3 commented Jun 16, 2023

So let me summarize what we probably agree on to implement:

  • a format checker of grid-based forecasts
  • an informative Warning message if rates ≤ 0 are present (because we expect Poisson rates, which must be positive); perhaps mention already that LL-based scores may become -inf/undefined
  • an informative Warning message if a LL-based score is indeed -inf (due to a target event occurring in a 0-rate bin) instead of RuntimeWarnings; this Warning ...
  • avoid masking of forecast rates ≤ 0 by all means (as this would allow cheating)

Things are a bit different for the Brier score (see #232 for feedback from Francesco on this issue), which is not affected by rates = 0 (not sure about negative rates). So additionally we could do inside the format checker (still have to agree on):

  • clamp rates < 0 to 0 – and inform the user about that operation
    But since this is such a rare case (that I have never seen), we could also just do nothing in such a case and leave it with the Warning message in the format checker – this would be consistent with also doing nothing if rates = 0.

(I can keep those item list and check boxes updated based on further discussions and/or progress 👀)

@pabloitu
Copy link
Collaborator

Thank you Marcus with the summary!

I agree with 1, 2 and 4.
I would leave number 3 as is, because it should be clear already to the modeler with checkbox 2, and we shouldnt suppress numpy warnings.
About brier score, we should implement it without masking.
About 5, we should raise an exception in the format checker (negative rates are non-physical, and should not be accepted by pycsep).

Somethings I have already done in floatcsep, and should be moved to pycsep. Will update soon about this.

@pabloitu
Copy link
Collaborator

Just wanted to mention, given our discussion last meeting, that having a checker for negative rates or zero rates is not trivial because of performance.

We could

  1. we check on initialization, which screws up on-the-fly instantiation of array (crazy useful)
  2. we check on returning the rates (calling the property forecast.data), but this would particularly slow us down when using global forecasts, i.e., everytime when data is needed doing an if check on 200m cells, .

any ideas welcome before entering testing those 2 ideas.

@mherrmann3
Copy link
Contributor Author

mherrmann3 commented Nov 22, 2023

Since we only want to be informative, without modifying the forecast data at all, I believe option (1) would be sufficient: simply implement a format checker that gets called after loading the forecasts. (a check on every T-test execution would be indeed overkill).

I'd place the format checker in a new method _check_rates() or _assert_validity() of class core.forecasts.GriddedForecast. It could get called either:

Likewise, we should add a format checker in core.forecasts.CatalogForecast, which informs if any bin of .expected_rates is zero (cannot be negative, unless provided on initialization).

About

an informative Warning message if a LL-based score is indeed -inf

in my previous comment: this is not a performance issue as it checks only one value, not an array.

@wsavran
Copy link
Collaborator

wsavran commented Nov 22, 2023

I agree with @mherrmann3 here. I think we should have a checker that gets called after loading the forecasts along with a flag that can disable this option in case someone doesn't want to use it. An associated warning could let them know in the output that the results are unchecked.

As far as the catalog forecasts go, a checker will be a lot more heavy weight because you will have to create a gridded forecast from the catalogs in order to do the checking. This should probably be called in .expected_rates() like Marcus said.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: No status
Development

No branches or pull requests

4 participants