enhancement - ideas for increasing generated ensemble sizes #38

Open
abigailsnyder opened this issue Oct 25, 2021 · 1 comment
Labels: enhancement (New feature or request)

abigailsnyder commented Oct 25, 2021

Running list of options/ideas:

  1. Efficiently update the permutation recipes to handle, in a smart way, the staggered repetitions (near-collapses) that result, so that we can further enrich ensembles.
  2. Increasing the target ensemble size doesn't increase the generated ensemble size nearly as much as increasing the archive ensemble size does, but it could buy a few extra members. We could: A) use fldgen to generate new annual, gridded tas data with variability; GSAT calculated from those fields would still carry a lot of variability, probably enough to give decently different GSAT trajectories even after 9-year smoothing. B) Extract different target GSAT trajectories from the staggered archive, intelligently.
  3. Think more carefully about the statistics of multiple draws to see whether we can still get something more from doing large numbers of draws, even though draws and ensemble size aren't interchangeable for this setup.
  4. Adjust the permutation function to consistently get more out of the archive we have now: 'reproducible mode'. To do that, I had to set a random seed for the specific draw function (Python doesn't handle global seeds the way R does), but I didn't experiment with which seed. So, for example, for MRI SSP370 we now get the same 3 stitched realizations every time; with a different seed, we could potentially get 4 or even 5. This is because everything still depends on which match we draw first from the potential pool. Because we're setting the seed so carefully now, everything after the first drawn match is deterministic, but the first draw is obviously different with different seeds. (A minimal sketch of this per-draw seeding is just below.)
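Something like the following, a minimal sketch of per-draw seeding with numpy (the `draw_match` name is hypothetical, not actual code from this package):

```python
import numpy as np

def draw_match(pool, rng):
    """Draw one archive match for a target window from its candidate pool."""
    return pool[rng.integers(len(pool))]

# seed an explicit Generator once and thread it through every draw,
# rather than relying on a global seed as one would in R
rng = np.random.default_rng(1234)
first_match = draw_match(["A", "B"], rng)
# given this first draw, all subsequent draws from `rng` are deterministic;
# a different seed can change which match comes out first
```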
I think this is easiest to explain concretely. Say we have two target realizations, r1 and r2, and for each of them we have target windows with matches like this:

r1 year 2025: archive point A, archive point B
r2 year 2025: archive point B
r1 year 2050: archive point C
r2 year 2050: archive point D, archive point E

Because of r2 year 2025 and r1 year 2050, each target realization can support AT MOST one stitched realization. And because of how we order things, the stitched realizations for r1 happen first. If the random seed is such that the match actually drawn for r1 year 2025 is archive point B, then no stitched realization is possible for r2; if r1 year 2025 happens to draw archive point A, then r2 can get a stitched realization.
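A toy version of this situation (hypothetical names, not actual code from this package) shows how the first draw decides whether r2 survives:

```python
import numpy as np

# candidate archive matches per target realization and window,
# ordered so r1's windows are drawn before r2's
matches = {
    ("r1", 2025): ["A", "B"],
    ("r1", 2050): ["C"],
    ("r2", 2025): ["B"],
    ("r2", 2050): ["D", "E"],
}

def greedy_stitch(matches, seed):
    rng = np.random.default_rng(seed)
    used = set()     # archive points already claimed, to avoid collapse
    stitched = {}
    for (real, year), pool in matches.items():
        avail = [m for m in pool if m not in used]
        if not avail:
            return stitched, f"collapse: nothing left for {real} year {year}"
        pick = avail[rng.integers(len(avail))]
        used.add(pick)
        stitched[(real, year)] = pick
    return stitched, "both realizations stitched"

# depending on the seed, r1 year 2025 draws A (r2 survives) or B (r2 collapses)
for seed in range(4):
    print(seed, greedy_stitch(matches, seed))
```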

So our code has always been written to avoid collapses, but we didn't write it to both avoid collapse AND maximize the generated ensemble. Honestly, I'm not even sure how we could write the code in a general way so that, when we run into the above situation, r1 year 2025 always draws archive point A. I've got the start of an idea for how we might do it: identify the matches for each target realizationXwindow that are unique to it, draw from those until exhaustion, then see about doing the draws from matches shared across target realizationXwindows (sketched below). But it feels like the kind of thing with a lot of hidden places for mistakes/bugs.
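One possible shape for that unique-first idea, as an untested sketch (hypothetical names, not actual code from this package):

```python
from collections import Counter

def split_unique_shared(matches):
    """Split each realizationXwindow's pool into matches only it can use vs. shared ones."""
    counts = Counter(m for pool in matches.values() for m in set(pool))
    unique = {k: [m for m in pool if counts[m] == 1] for k, pool in matches.items()}
    shared = {k: [m for m in pool if counts[m] > 1] for k, pool in matches.items()}
    return unique, shared

matches = {
    ("r1", 2025): ["A", "B"],
    ("r2", 2025): ["B"],
}
unique, shared = split_unique_shared(matches)
# unique[("r1", 2025)] == ["A"], so drawing from unique pools first
# leaves the shared "B" free for r2, which has no unique alternative
```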
abigailsnyder added the enhancement (New feature or request) label Oct 25, 2021
abigailsnyder changed the title from "enhancement - using staggered year archives" to "enhancement - increasing generated ensemble sizes" Mar 18, 2022
abigailsnyder changed the title from "enhancement - increasing generated ensemble sizes" to "enhancement - ideas for increasing generated ensemble sizes" Mar 18, 2022
abigailsnyder commented:

  1. Combine with pattern scaling. Every stitched GSAT could be run through a monthly pattern scaling setup for whatever variablesXmodels have been validated. There should be no collapse in the stitched gridded results, because it's unlikely that any single real ensemble member in any window is identical to the mean field. This won't give daily data, and it won't give different kinds of months, but it will give an extra set of reasonable scenarios for every draw of generated GSATs. Coherence would be implicit from the coherent training data that the pattern scaling learns from. Also worth thinking about the Hector use case: Hector might give a single very smooth GSAT trajectory, but the stitched GSATs matching it will have variability. (A rough sketch of the shape of this is below.)
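A rough sketch of the shape of this, assuming the common linear pattern-scaling form (gridded anomaly = intercept + slope × GSAT, with one coefficient field per calendar month; the arrays here are placeholders, not validated coefficients):

```python
import numpy as np

nlat, nlon, nyears = 96, 144, 86
b0 = np.zeros((12, nlat, nlon))   # per-month intercept fields (placeholder)
b1 = np.ones((12, nlat, nlon))    # per-month sensitivity fields, K per K GSAT (placeholder)

# stand-in for one stitched GSAT trajectory, which carries variability
gsat = np.random.default_rng(0).normal(1.5, 0.1, nyears)

# broadcast to monthly gridded fields for every year of the trajectory
fields = b0[None] + b1[None] * gsat[:, None, None, None]
print(fields.shape)  # (86, 12, 96, 144): year x month x lat x lon
```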
