added log-logistic/fisk distribution to SPEI #440
base: master
Conversation
While making changes to the docs, I was wondering whether they should mention the possibility of pre-computing the SPEI distribution fitting variables. This seems possible in the code to me.
Yes, I agree. We have done this already for SPI, so there is no reason we couldn't do something very similar for SPEI. Good idea!
Hmmm, the tests that fail are for SPI and Pearson. Are they related to things I've changed in this branch?
Yes, almost certainly. This is why the tests are in place: to shield against introducing changes that give different results from before. In this case, however, we're expecting to see different values since we're using a new distribution, so we should come up with an entirely new test result. For example, we expect certain results when using the Pearson fitting and other results when using gamma. We now need an expected result for this new fitting, so that when the next person changes some code we'll have a test in place to make sure they haven't broken things for this distribution fitting.

So for this new distribution, we need a separate collection of tests that exercise the new code and produce a result we can feel confident about. Once this is in place, all new code must pass those tests; otherwise, we'll know we're introducing a significant/breaking change.

With all that said, I'm not certain that the code you've introduced here is actually breaking things. These checks went off the rails a year or so ago and I just haven't had time to do the requisite housecleaning to get us back on track. I expect that you may have broken some things while others were already broken. I can run these tests locally to debug, but it will take time; my day job is demanding. I totally appreciate your help with this package. Don't forget that many (usually invisible) others will benefit!
No, @Emmadd, I don't think your new code has caused any of the existing tests to fail. Several of them were broken before you started this work, and I have not found the time to maintain the package to get that back in order. Your new code looks legit (very nice work, thanks for all the comments!) and my only recommendation at this point is to add a few tests into the test suite for this new distribution fitting. For example, in tests/test_compute.py we have test_gamma_parameters and test_pearson_parameters, so now we should add a test_fisk_parameters function, etc. Once those are in place and working as expected, I will approve this merge. Hopefully, I will soon find the motivational impetus to get the rest of our tests back in shape and this project back on the rails!
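A hedged sketch of what such a test_fisk_parameters could look like, fitting scipy.stats.fisk against a synthetic sample with known parameters. The function body, seed, and tolerances are illustrative, not the package's actual test code:

```python
import scipy.stats


def test_fisk_parameters():
    # synthetic sample drawn from a known log-logistic (Fisk) distribution
    true_c, true_scale = 2.5, 10.0
    sample = scipy.stats.fisk.rvs(
        true_c, loc=0.0, scale=true_scale, size=5000, random_state=42
    )
    # fix loc at zero so the shape/scale estimates are directly comparable
    c, loc, scale = scipy.stats.fisk.fit(sample, floc=0.0)
    # fitted parameters should land near the generating parameters
    assert abs(c - true_c) / true_c < 0.2
    assert abs(scale - true_scale) / true_scale < 0.2
```

A real version would instead call the PR's fitting function and compare against a fixed expected result, as the gamma and Pearson tests do.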
There was a fix merged today (#450) which may fix the failing tests here. I'm not sure how to get the checks to run again (the instructions in the GitHub help pages don't match the buttons I see in the web UI), and I think it can't happen anyway since the original run of the checks happened over 30 days ago. Maybe you can pull from master to get the latest changes, then commit that to this fork's branch, which would trigger a new check?
You can merge master back into your branch, then push your branch back up; that will retrigger the tests. It turns out the science lead on my project is also interested in this particular distribution, so I'll let you know if he has any feedback. I should double-check the implementation here because it borrows from the Pearson III code, which I had to fix. As for the option to save out the fitted params for SPEI, stay tuned... I have a draft branch with that feature for pearson3 and gamma.
Oh, before merging master back into your branch, you should fetch master from this repo. Let me know if you have questions!
Yeah, there is a lot of boilerplate here. Would be interesting to find out if a bunch of this can still get simplified.
Not sure, but I don't think any of these changes actually adds support in the main scripts, either. And I do think unit tests should get added. It is what caught the issues with Pearson III.
# minimums_possible = _minimum_possible(c, loc, scale)
# minimums_mask = values <= minimums_possible
zero_mask = np.logical_and((values < 0.0005), (probabilities_of_zero > 0.0))
trace_mask = np.logical_and((values < 0.0005), (probabilities_of_zero <= 0.0))
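For reference, a self-contained toy illustration of what the two masks above select; the input arrays are made up for demonstration:

```python
import numpy as np

# toy inputs: two near-zero values, one with a positive probability of zero
values = np.array([0.0001, 0.0001, 0.5, 1.2])
probabilities_of_zero = np.array([0.1, 0.0, 0.1, 0.0])

# near-zero value AND some probability of zero -> treated as a true zero
zero_mask = np.logical_and(values < 0.0005, probabilities_of_zero > 0.0)
# near-zero value but no probability of zero -> treated as a trace amount
trace_mask = np.logical_and(values < 0.0005, probabilities_of_zero <= 0.0)

print(zero_mask.tolist())   # [True, False, False, False]
print(trace_mask.tolist())  # [False, True, False, False]
```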
So, should these go away? Furthermore, if probabilities_of_zero isn't used, it should get taken out of the call signature here, and fisk_params() should be updated to not return it.
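If the probability-of-zero really is unused, the cleanup might look roughly like this; fisk_params here is a hypothetical stand-in for the PR's actual function, not its real signature:

```python
import scipy.stats


def fisk_params(values):
    """Fit the three log-logistic (Fisk) parameters to a sample.

    Note: no probability-of-zero is computed or returned here, so callers
    no longer receive (or have to ignore) an unused value.
    """
    c, loc, scale = scipy.stats.fisk.fit(values)
    return c, loc, scale
```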
month or day of the year
:param locs: pre-computed loc values for each month or day of the year
:param scales: pre-computed scale values for each month or day of the year
:param skews: pre-computed skew values for each month or day of the year
No skews argument, and missing documentation for c. Also, I am not a fan of single-letter variable names. Is this c parameter known by any other names?
Agreed, meaningful variable names are preferred.
probabilities_of_zero = fitting_params["probabilities_of_zero"]
locs = fitting_params["locs"]
scales = fitting_params["scales"]
c = fitting_params["c"]
This is going to conflict a bit with #453.
Yes, we should/will normalize.
@@ -31,7 +31,7 @@ The following indices are provided:
- `SPI <https://climatedataguide.ucar.edu/climate-data/standardized-precipitation-index-spi>`__,
Standardized Precipitation Index, utilizing both gamma and Pearson Type III distributions
- `SPEI <https://www.researchgate.net/publication/252361460_The_Standardized_Precipitation-Evapotranspiration_Index_SPEI_a_multiscalar_drought_index>`__,
- Standardized Precipitation Evapotranspiration Index, utilizing both gamma and Pearson Type III distributions
+ Standardized Precipitation Evapotranspiration Index, utilizing both gamma, log-logistic (Fisk) and Pearson Type III distributions
Missing Oxford comma here and below.
# get the fisk parameters for this time
# step's values within the calibration period
c[time_step_index], scales[time_step_index], locs[time_step_index] = \
    scipy.stats.fisk.fit(time_step_values)
Just had a thought about this last night. How good is this fit that uses Maximum Likelihood Estimation? For pearson3, the L-moments method was used (which needed to be implemented because scipy stats doesn't have it as an option), and for gamma, the Maximum Likelihood Estimation was used, but via the Thom method, which also isn't available through scipy.stats. I have no experience in judging various parameter estimations, but we probably should have some sort of justification for which approach is used. I think having an existing, well-cited paper would be a good standard.
I originally wanted to add this since the Spanish authors who are most famous for SPEI used this distribution for much of their work. They have provided an implementation in R. The methodology is nicely described here (page 3)
Here is an interesting goodness of fit analysis
I love papers like this! Very clearly lays out the algorithm, both in words and in equations, and also provides names for some of the steps they use. It is clear from this paper that an L-moments method is used (but seems different from the L-moments used for pearson III). The formulation is also different enough to give me pause, but that just might be that the scipy documentation is showing a standardized form while the paper isn't. It is going to take a bit of algebra to cross-check the paper's L-moments with the parameters used in scipy's implementation.
On a science note, I am a bit confused by the "non-negative" aspect of this distribution, given that Pt - PET can be negative, but I guess that is what the "loc" (in scipy) or "gamma" (in paper) can account for?
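Regarding the non-negative-support question: a quick scipy check suggests that a negative loc does shift the distribution's support below zero, so negative P minus PET differences can still be accommodated (the parameter values here are illustrative only):

```python
import scipy.stats

# a frozen Fisk distribution with loc shifted below zero
dist = scipy.stats.fisk(2.0, loc=-5.0, scale=3.0)

print(dist.support())            # (-5.0, inf) -> support starts below zero
print(round(dist.cdf(-4.0), 3))  # 0.1 -> nonzero probability mass below zero
```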
This PR also got me thinking about my work-in-progress adding SPEI support to the "spi" script. Right now, the spi script pre-allocates fitting arrays and stores them in a dictionary keyed by their parameter name. With the two current distributions, the fitting parameter names do not collide with each other. However, it seems like this distribution would have colliding parameter names. I'll think on this a bit, but an elegant solution may require a significant overhaul of the "spi" script, and possibly of the other main script, too (which isn't a bad thing!). In addition, we should also consider updating the API and CLI so that one can select which distributions to compute, in order to keep computational and memory costs down.
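One possible way to sidestep the collision, sketched with made-up parameter names: key the pre-allocated dictionary by (distribution, parameter) pairs rather than by bare parameter names, so each distribution's "scales" or "locs" stays distinct.

```python
import numpy as np

shape = (12,)  # e.g. one fitting value per calendar month

# keying by (distribution, parameter) keeps, say, fisk's "scales"
# from colliding with another distribution's "scales" array
fitting_params = {
    ("gamma", "alphas"): np.full(shape, np.nan),
    ("gamma", "betas"): np.full(shape, np.nan),
    ("fisk", "c"): np.full(shape, np.nan),
    ("fisk", "locs"): np.full(shape, np.nan),
    ("fisk", "scales"): np.full(shape, np.nan),
}

print(sorted({dist for dist, _ in fitting_params}))  # ['fisk', 'gamma']
```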
SonarCloud Quality Gate failed: 0 bugs, no coverage information.
My first attempt to include the Fisk distribution in SPEI as suggested in #106, copying as much as possible from the Pearson III implementation. The last @numba.jit decorator leads to problems, but the code runs fine without it. I have not created any tests or anything yet. Hope someone can have a look :-) Let me know what you think and what should be done.