Cached FIM - Part 1a - Initial VPP Workflow Implementation #604
Conversation
…needs some abstraction)
…ds some abstraction)
…function-defined sql execution for cached fim operations
…ws_table flags for cached_fim operations.
…_data_prep (was data_sql folder). Yet to be implemented.
…all product-specific sql files - these will now be integrated with the postprocess_sql function.
…ving special fim config options for now (will add back in later)
… to clean up, I'm getting there)
Leaves me speechless....
This is wrong.
Missed domain variable for recurrence flows
Update WHERE clause to use prc_status and remove joins
This is the second of several PRs to implement the major Cached FIM Workflow enhancement. This PR includes most of the application changes, but is not fully tested with all active pipelines. Subsequent PRs will be submitted based on testing this with all pipelines on TI (some bugs expected), as well as for several more minor components of the Cached FIM Workflow (notably the full implementation of special FIM configurations like AEP FIM, CatFIM, etc.).
Broadly speaking, the Cached FIM Enhancement utilizes a new AWS Redshift data warehouse DB (set up in Part 0 of this PR series) to store every HAND synthetic rating curve (hydrotable) step that our pipelines process, with the extent geometry of the upper 1-ft stage value (Note: This means that all produced FIM is now rounded up to the nearest stage ft). On subsequent FIM runs, the Redshift HAND cache is queried before HAND Processing takes place, and cached extent geometries are used when streamflow is within the range of a cached hydrotable step (just like the Ras2FIM steps were implemented by Corey before, although those steps have also been overhauled/generalized as part of this new process).
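The cache-hit rule above (round stage up to the nearest foot, use a cached step when streamflow is within its rated range) can be sketched as follows. This is an illustrative sketch only; the step granularity, column names, and function name are assumptions, not the actual implementation.

```python
def cached_step_for_flow(streamflow_cfs, rated_steps):
    """Return the cached 1-ft rating-curve step covering this streamflow.

    `rated_steps` maps a whole-foot stage value (ft) to the maximum
    discharge rated at that step. Because FIM is rounded *up* to the
    nearest stage foot, we return the smallest step whose rated discharge
    meets or exceeds the streamflow; None means a cache miss, i.e. HAND
    processing is still required for this feature.
    """
    for stage_ft in sorted(rated_steps):
        if streamflow_cfs <= rated_steps[stage_ft]:
            return stage_ft
    return None  # streamflow exceeds every cached step -> cache miss

# Hypothetical hydrotable steps: stage (ft) -> max rated discharge (cfs)
steps = {1: 500.0, 2: 1200.0, 3: 2600.0}
```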
Initial tests show promising optimizations to both run times and lambda costs, with ~60% reductions in HAND processing times and ~90%+ reductions in hand_processing lambda costs.
General New VPP FIM Workflow
There are some technical limitations on how data is moved back and forth between RDS and Redshift databases, so this workflow is a little messier than ideal.
a. ingest.{fim_config}_flows - this is a version of max_flows, with fim crosswalk columns added, as well as filtering on the high water threshold
b. ingest.{fim_config} - this is the fim table, but without geometry
c. ingest.{fim_config}_geo - this is the geometries for the fim table (one-to-many, since we're subdividing to keep geometries small for Redshift)
d. ingest.{fim_config}_zero_stage - this table holds all of the fim features (hydro_table, feature_id, huc8, branch combinations) that have zero or NaN stage at the current discharge value
e. ingest.{fim_config}_geo_view (RDS only) - this view subdivides the newly created polygons in the inundation_geo table (because Redshift has a limit on the size of geometries)
f. publish.{fim_config} (RDS only) - This is the finished publish table that gets copied to the EGIS service
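The per-configuration tables above follow a simple naming pattern. A minimal sketch of that pattern, grounded in the list above; the helper function and the example fim_config name are hypothetical, not pipeline code:

```python
def ingest_table_names(fim_config):
    """Build the ingest/publish table names used for one FIM configuration.

    Mirrors the naming pattern described in the list above; this helper
    itself is illustrative only.
    """
    base = f"ingest.{fim_config}"
    return {
        "flows": f"{base}_flows",            # max_flows + FIM crosswalk columns
        "fim": base,                          # FIM table, without geometry
        "geo": f"{base}_geo",                 # one-to-many subdivided geometries
        "zero_stage": f"{base}_zero_stage",   # zero/NaN-stage features
        "geo_view": f"{base}_geo_view",       # RDS-only subdividing view
        "publish": f"publish.{fim_config}",   # RDS-only table copied to EGIS
    }
```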
a. Query the HAND cache on Redshift, joining to the just-populated flows table, to populate the inundation, inundation_geo, and inundation_zero_stage tables on Redshift
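Step 2a amounts to joining the just-populated flows table against the Redshift HAND cache. A hedged sketch of the shape of that query; the cache table name, column names, and the idea of selecting only key columns here are assumptions for illustration:

```python
def hand_cache_query(fim_config):
    """Sketch of the Redshift-side cache lookup described in step 2a.

    Joins the flows table to a hypothetical HAND cache table and keeps
    rows where the streamflow falls at or below the rated discharge of a
    cached rating-curve step. All names are illustrative, not the real
    schema.
    """
    return f"""
        INSERT INTO ingest.{fim_config}
        SELECT cache.hydro_id, cache.feature_id, cache.huc8,
               cache.branch, cache.rc_stage_ft
        FROM ingest.{fim_config}_flows AS flows
        JOIN hand_cache.hydrotable_steps AS cache
          ON cache.feature_id = flows.feature_id
        WHERE flows.streamflow_cfs <= cache.rc_discharge_cfs
    """
```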
a. Prioritize Ras2FIM by querying the Ras2FIM cache on RDS first #TODO
b. Copy the FIM tables on Redshift (which were just populated from the HAND cache in 2a) into the inundation tables on RDS (skipping any records that were already added from Ras2FIM)
c. HAND processing for any FIM features remaining in the inundation flows table that have not been added to the inundation table from Ras2FIM or the HAND cache (not done here, but administered by the fim_data_prep lambda function)
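The priority order in steps a–c (Ras2FIM first, then the HAND cache, then HAND processing for whatever remains) reduces to a set difference over feature keys. A small illustrative sketch; the key tuple shape and function name are assumptions:

```python
def features_needing_hand_processing(flow_features, ras2fim_hits, hand_cache_hits):
    """Features left for HAND processing after both caches are consulted.

    Each argument is a set of (hydro_id, feature_id, huc8, branch) keys.
    Ras2FIM results take priority, then the HAND cache; whatever remains
    uncovered is handed off to the fim_data_prep lambda. Illustrative only.
    """
    covered = ras2fim_hits | (hand_cache_hits - ras2fim_hits)
    return flow_features - covered
```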
a. We can use a template to do this generically for most inland inundation configurations (e.g. NWM)
a. Insert records from the RDS inundation, inundation_geo, and inundation_zero_stage tables/view into the Redshift HAND cache tables, only taking records generated by HAND Processing whose primary key does not already exist (hydro_id, feature_id, huc8, branch, rc_stage_ft)
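The cache write-back above is effectively an insert-if-absent keyed on (hydro_id, feature_id, huc8, branch, rc_stage_ft), restricted to HAND-generated records. A minimal sketch of that dedup logic; the record shape and `source` field are assumptions for illustration:

```python
CACHE_KEY = ("hydro_id", "feature_id", "huc8", "branch", "rc_stage_ft")

def records_to_cache(new_records, existing_keys):
    """Keep only HAND-generated records whose primary key is not cached yet.

    `new_records` are dicts with at least the CACHE_KEY fields plus a
    hypothetical `source` field; only records produced by HAND processing
    are eligible, mirroring the filter described above. Illustrative, not
    pipeline code.
    """
    out = []
    for rec in new_records:
        key = tuple(rec[k] for k in CACHE_KEY)
        if rec.get("source") == "hand_processing" and key not in existing_keys:
            out.append(rec)
            existing_keys.add(key)  # also dedup within this batch
    return out
```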
Changes to Specific Components
This PR contains significant updates to the FIM Config steps of the VPP:
TODOs / Roadmap
I plan to take the following steps after this is deployed to TI, likely through 2-3 subsequent PRs as part of this series: