Efficiently update the permutation recipes to handle the staggered repetitions (near-collapse) that result, in a smart way, so that we can further enrich ensembles.
Increasing the target ensemble size doesn't increase the generated ensemble size nearly as much as increasing the archive ensemble size does, but it could buy a few extra members. We could either A) use fldgen to generate new annual, gridded tas data with variability; calculating GSAT from those fields will still have a lot of variability, probably enough to give decently different GSAT trajectories even after 9-year smoothing; or B) extract different target GSAT trajectories from the staggered archive, intelligently.
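For option A, the GSAT step is generic regardless of where the gridded fields come from. A minimal numpy sketch (not fldgen's actual API; the array layout and the names `gsat_from_tas` / `smooth` are assumptions for illustration):

```python
import numpy as np

def gsat_from_tas(tas, lats):
    """Area-weighted global-mean surface air temperature.

    tas  : (n_years, n_lat, n_lon) annual gridded tas (assumed layout)
    lats : (n_lat,) grid-cell center latitudes in degrees
    """
    # cos(latitude) weights approximate grid-cell area on a regular lat-lon grid
    w = np.cos(np.deg2rad(lats))[np.newaxis, :, np.newaxis]
    return (tas * w).sum(axis=(1, 2)) / (w.sum() * tas.shape[2])

def smooth(gsat, window=9):
    """Centered rolling mean, same length as the input (edges use padded values)."""
    half = window // 2
    padded = np.pad(gsat, half, mode="edge")
    return np.convolve(padded, np.ones(window) / window, mode="valid")
```

Each fldgen-style field would then yield a candidate target GSAT trajectory via `smooth(gsat_from_tas(tas, lats))`.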
Think about the statistics of multiple draws a little more carefully to see if we can still get something more from doing large numbers of draws, even though draws and ensemble size aren't interchangeable for this setup.
Adjust the permutation function to get more out of the archive we have now, consistently.
'reproducible mode': To do that, I had to set a random seed inside the specific draw function (Python doesn't like global seeds the way R does), but I didn't experiment with which seed to use. So for MRI SSP370, for example, we now get the same 3 stitched realizations every time; with a different seed, we would potentially get 4 or even 5. This is because everything still depends on which match we draw first from the potential pool. Because we're setting the seed carefully now, once we have the first drawn match, everything downstream is deterministic, but that first draw is obviously different with different seeds.
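A minimal sketch of what 'reproducible mode' amounts to (the function name `draw_match` and the pool structure are assumptions for illustration, not the actual stitches code):

```python
import numpy as np

def draw_match(candidates, seed=12345):
    """Draw one archive match from a pool of candidates.

    Seeding a generator inside the draw function (rather than relying on a
    global seed) makes the draw reproducible, but the particular seed value
    determines which candidate comes out first -- and therefore how many
    stitched realizations survive downstream.
    """
    rng = np.random.default_rng(seed)
    return candidates[rng.integers(len(candidates))]

# Same seed -> same first match on every run; a different seed can pick a
# different first match and change the size of the generated ensemble.
print(draw_match(["archive point A", "archive point B"]))
```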
I think this is easiest to explain concretely:
Say we have two target realizations: r1 and r2. And for each of them, we have target windows with matches like this:
r1 year 2025: archive point A, archive point B
r2 year 2025: archive point B
r1 year 2050: archive point C
r2 year 2050: archive point D, archive point E
So because r2 year 2025 and r1 year 2050 each have only one match, each target realization can support AT MOST one stitched realization.
Because of how we order things, the stitched realizations for r1 will happen first.
IF the random seed is such that the match actually drawn for r1 year 2025 is archive point B, then there's no stitched realization possible for r2, because B was r2's only option for 2025. If r1 year 2025 happens to draw archive point A instead, then r2 can get a stitched realization too.
So our code has always been written to avoid collapses, but we didn't write it to both avoid collapse AND maximize the generated ensemble.
Honestly, I'm not even sure how we could write the code in a general way so that, when we run into the above situation, r1 year 2025 always draws archive point A.
I've got the start of an idea for how to maybe do it: identify the matches that are unique to a single target realizationXwindow, draw from those until they're exhausted,
then see about doing the draws from matches shared across target realizationXwindows (rough sketch below).
But it feels like the kind of thing with a lot of hidden places for mistakes/bugs.
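To make that idea concrete, here's a rough Python sketch of the uniqueness-first ordering, using the r1/r2 example above. The dictionary layout, the name `assign_matches`, and the greedy tie-breaking are all assumptions for illustration; this is not how the permutation code is currently structured, and a greedy pass like this isn't guaranteed to be optimal in general.

```python
from collections import Counter

# Hypothetical match pools keyed by (target realization, window year); values
# are the archive points each target window matched. Mirrors the example above.
matches = {
    ("r1", 2025): ["A", "B"],
    ("r2", 2025): ["B"],
    ("r1", 2050): ["C"],
    ("r2", 2050): ["D", "E"],
}

def assign_matches(matches):
    """Uniqueness-first greedy assignment of archive points to target windows.

    The most constrained windows (fewest matches) are handled first, and when a
    window has a choice, it takes the archive point shared with the fewest other
    windows, leaving heavily shared points for windows with no alternative. An
    archive point is not reused within the same window year, which is the
    collapse rule in the example.
    """
    # How many target windows does each archive point appear in?
    counts = Counter(pt for pool in matches.values() for pt in set(pool))

    # Most constrained windows first.
    order = sorted(matches, key=lambda key: len(matches[key]))
    used, assignment = set(), {}
    for realization, year in order:
        available = [pt for pt in matches[(realization, year)]
                     if (year, pt) not in used]
        if not available:
            continue  # this window collapses: no match left for it
        pick = min(available, key=lambda pt: counts[pt])  # least-shared point
        assignment[(realization, year)] = pick
        used.add((year, pick))
    return assignment

print(assign_matches(matches))
# r2 year 2025 keeps B and r1 year 2025 falls back to A, so both r1 and r2
# can still support one stitched realization each.
```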
Combine with pattern scaling. Every stitched GSAT could be run through a monthly pattern scaling setup for whatever variablesXmodels have been validated. There should be no collapse in the stitched gridded results because it's unlikely that any single real ensemble member in any window is identical to the mean field. This won't give daily data, and it won't give different kinds of months, but it would be an extra set of reasonable scenarios for every draw of generated GSATs. Coherence would be implicit from the coherent training data that the pattern scaling learns from. Also worth thinking about the Hector use case: Hector might give a single very smooth GSAT trajectory, but the stitched GSATs matching it will have variability.
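A minimal sketch of what the pattern scaling step could look like (a per-grid-cell, per-month linear regression against GSAT; the array shapes and the names `fit_pattern` / `apply_pattern` are assumptions, not an existing validated setup):

```python
import numpy as np

def fit_pattern(gsat_train, field_train):
    """Fit a per-cell, per-month linear pattern: field ~ intercept + slope * GSAT.

    gsat_train  : (n_years,) annual GSAT from the training ensemble
    field_train : (n_years, 12, n_lat, n_lon) monthly gridded training fields
    """
    x = gsat_train - gsat_train.mean()
    # Ordinary least-squares slope and intercept for every (month, lat, lon) cell
    slope = np.tensordot(x, field_train - field_train.mean(axis=0), axes=(0, 0)) / (x @ x)
    intercept = field_train.mean(axis=0) - slope * gsat_train.mean()
    return slope, intercept

def apply_pattern(gsat_new, slope, intercept):
    """Scale the fitted monthly patterns by a stitched GSAT trajectory."""
    return (intercept[np.newaxis]
            + gsat_new[:, np.newaxis, np.newaxis, np.newaxis] * slope[np.newaxis])
```

Each stitched GSAT draw would then give a monthly gridded scenario via `apply_pattern(stitched_gsat, slope, intercept)`, for whichever variablesXmodels the patterns have been validated on.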
Running list of options/ideas: