
Cached FIM - Part 1a - Initial VPP Workflow Implementation #604

Merged · 21 commits · Dec 22, 2023

Conversation

TylerSchrag-NOAA (Contributor) commented Dec 20, 2023

This is the second of several PRs to implement the major Cached FIM Workflow enhancement. This PR includes most of the application changes, but is not fully tested with all active pipelines. Subsequent PRs will be submitted based on testing this with all pipelines on TI (some bugs expected), as well as at least several more minor components of the Cached FIM Workflow (notably the full implementation of special FIM configurations like AEP FIM, CatFIM, etc.)

Broadly speaking, the Cached FIM Enhancement utilizes a new AWS Redshift data warehouse DB (set up in Part 0 of this PR series) to store every HAND synthetic rating curve (hydrotable) step that our pipelines process, along with the extent geometry of the upper 1-ft stage value (note: this means that all produced FIM is now rounded up to the nearest stage foot). On subsequent FIM runs, the Redshift HAND cache is queried before HAND processing takes place, and cached extent geometries are used when streamflow falls within the range of a cached hydrotable step (just like the Ras2FIM steps Corey implemented previously, although those steps have also been overhauled/generalized as part of this new process).
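
To make the cache-hit logic concrete, here is a minimal, hypothetical sketch of the decision described above: a cached hydrotable step covers a discharge range, and its (upper-bound, 1-ft) stage geometry is reused when the current streamflow falls inside that range. Function and column names here are illustrative, not the actual implementation (the real lookup happens in SQL against Redshift).

```python
def find_cached_stage(streamflow_cfs, cached_steps):
    """Return the rc_stage_ft of the cached step covering streamflow, or None.

    cached_steps: list of dicts with rc_stage_ft, rc_previous_discharge_cfs,
    and rc_discharge_cfs, sorted by stage. Because the step's upper-bound
    geometry is used, FIM is effectively rounded up to the nearest stage foot.
    """
    for step in cached_steps:
        if step["rc_previous_discharge_cfs"] < streamflow_cfs <= step["rc_discharge_cfs"]:
            return step["rc_stage_ft"]
    return None  # cache miss -> fall through to HAND processing


steps = [
    {"rc_stage_ft": 1, "rc_previous_discharge_cfs": 0, "rc_discharge_cfs": 500},
    {"rc_stage_ft": 2, "rc_previous_discharge_cfs": 500, "rc_discharge_cfs": 1200},
]
print(find_cached_stage(750, steps))   # falls in the 1-2 ft step -> 2
print(find_cached_stage(5000, steps))  # beyond the cached curve -> None
```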

Initial tests show promising optimizations to both run times and lambda costs, with ~60% reductions in HAND processing times and ~90%+ reductions in hand_processing lambda costs.

General New VPP FIM Workflow
There are some technical limitations on how data is moved back and forth between RDS and Redshift databases, so this workflow is a little messier than ideal.

  1. Create the following tables and views, if they don't already exist (a-d on both RDS and Redshift; e-f on RDS only). These tables replicate the schema of the HAND cache on Redshift, and are truncated and re-populated as part of each FIM run:
    a. ingest.{fim_config}_flows - this is a version of max_flows, with FIM crosswalk columns added, as well as filtering for the high water threshold
    b. ingest.{fim_config} - this is the fim table, but without geometry
    c. ingest.{fim_config}_geo - this is the geometries for the fim table (one-to-many, since we're subdividing to keep geometries small for Redshift)
    d. ingest.{fim_config}_zero_stage - this table holds all of the fim features (hydro_table, feature_id, huc8, branch combinations) that have zero or NaN stage at the current discharge value
    e. ingest.{fim_config}_geo_view (RDS only) - this view subdivides the newly created polygons in the inundation_geo table (because Redshift has a limit on the size of geometries)
    f. publish.{fim_config} (RDS only) - This is the finished publish table that gets copied to the EGIS service
  2. Populate the FIM flows table on RDS (from max_flows with some joins), then copy it to Redshift
  3. Query the HAND cache on Redshift, joining to the just-populated flows table, to populate the inundation, inundation_geo, and inundation_zero_stage tables on Redshift
  4. Populate the inundation tables on RDS
    a. Prioritize Ras2FIM by querying the Ras2FIM cache on RDS first #TODO
    b. Copy the FIM tables on Redshift (which were just populated from the HAND cache in step 3) into the inundation tables on RDS (skipping any records that were already added from Ras2FIM)
    c. HAND processing for any FIM features remaining in the inundation flows table that have not been added to the inundation table from Ras2FIM or the HAND cache (not done here, but administered by the fim_data_prep lambda function)
  5. Generate publish.inundation table on RDS, and copy it to the EGIS (done via the update_egis_data function)
    a. We can use a template to do this generically for most inland inundation configurations (e.g. NWM)
  6. Add any newly generated HAND features in this run into the Redshift HAND cache ( #TODO: it would be good to figure out how to do this in parallel outside of the fim_config map, so that this doesn't hold things up).
    a. Insert records from the RDS inundation, inundation_geo, and inundation_zero_stage tables/view into the Redshift HAND cache tables, only taking records generated by HAND processing whose primary key (hydro_id, feature_id, huc8, branch, rc_stage_ft) does not already exist
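
Step 6a above is essentially an anti-join insert. The following sketch demonstrates the pattern using SQLite for illustration (the real workflow runs against Redshift, and table/column names like prc_method are illustrative guesses; the primary key columns are from the PR description):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cols = "hydro_id, feature_id, huc8, branch, rc_stage_ft"
cur.execute(f"CREATE TABLE hand_cache ({cols})")
cur.execute(f"CREATE TABLE inundation ({cols}, prc_method TEXT)")

# One record already in the cache; three freshly processed records.
cur.execute("INSERT INTO hand_cache VALUES (1, 101, '12090301', 0, 2)")
cur.executemany(
    "INSERT INTO inundation VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 101, "12090301", 0, 2, "HAND_Processing"),  # duplicate key -> skip
        (2, 102, "12090301", 0, 3, "HAND_Processing"),  # new key -> insert
        (3, 103, "12090301", 0, 1, "Ras2FIM"),          # not HAND -> skip
    ],
)

# Anti-join insert: only HAND-generated rows whose primary key is new.
cur.execute(f"""
    INSERT INTO hand_cache ({cols})
    SELECT i.hydro_id, i.feature_id, i.huc8, i.branch, i.rc_stage_ft
    FROM inundation i
    LEFT JOIN hand_cache c
      ON  i.hydro_id = c.hydro_id AND i.feature_id = c.feature_id
      AND i.huc8 = c.huc8 AND i.branch = c.branch
      AND i.rc_stage_ft = c.rc_stage_ft
    WHERE i.prc_method = 'HAND_Processing' AND c.hydro_id IS NULL
""")
print(cur.execute("SELECT COUNT(*) FROM hand_cache").fetchone()[0])  # 2
```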

Changes to Specific Components
This PR contains significant updates to the FIM Config steps of the VPP:

  • viz_fim_data_prep lambda function - Refactored/simplified to just look up features for HAND processing and write HUC processing group CSVs to S3 (which are used by the FIM processing step function to delegate HAND processing jobs). Logic to get flows and the Ras2FIM caching template SQL have been moved to the generalized postprocess_sql lambda function.
  • viz_postprocess_sql lambda function
    • Now contains a fim_caching_templates folder with the SQL for the various FIM workflow steps mentioned above.
    • A fim_flows folder for getting flows for special fim configurations has also been added (this used to be in the data_sql folder of fim_data_prep lambda function).
    • Lambda function logic has been tweaked to allow for a list of sql statements to be executed in a single lambda invocation.
    • Lambda function logic has been tweaked to allow for a new sql_templates_to_run parameter specified in the step function definition, which is used similarly to / in combination with the step parameter (more could be done here to optimize/abstract).
    • Also added a new optional check_dependencies parameter that can be specified by the step function; when set to false, the check_required_tables_updated function is skipped (this is needed on several of the FIM steps due to intentionally empty tables).
    • Abstraction of named discharge columns in max_flows tables
    • Dependent changes to product / summary sql files for all various changes listed above.
  • viz_fim_hand_processing lambda function
    • Minor changes to track/upload new columns required for cached FIM (e.g. rc_previous_discharge_ft).
    • The function now tracks zero-stage reaches / reaches for which a valid stage lookup can't be performed, and uploads those to the fim_zero_stage table (these were previously just skipped altogether).
  • viz_initialize_pipeline lambda function - Updates to product configs to support new FIM workflows.
  • viz pipeline step function - New FIM configs workflow changes
  • Other minor bug fixes and enhancements (notably some improvements to db connection handling in several spots).
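
The sql_templates_to_run and check_dependencies parameters described above could be wired together roughly as follows. This is a hypothetical sketch of the handler logic, not the actual viz_postprocess_sql implementation; the event shape and the sql_replace substitution dict are illustrative assumptions.

```python
def lambda_handler(event, db_execute, dependencies_ok=lambda: True):
    """Execute a list of SQL templates in a single lambda invocation.

    event["sql_templates_to_run"]: list of SQL template strings (assumed shape).
    event["check_dependencies"]: if False, skip the required-tables check
    (needed for steps that legitimately read intentionally empty tables).
    db_execute: callable that runs one SQL statement against the database.
    """
    if event.get("check_dependencies", True) and not dependencies_ok():
        raise RuntimeError("Required upstream tables not yet updated")

    executed = []
    for template in event.get("sql_templates_to_run", []):
        # Substitute placeholders like {fim_config} before executing.
        sql = template.format(**event.get("sql_replace", {}))
        db_execute(sql)
        executed.append(sql)
    return executed


ran = []
event = {
    "sql_templates_to_run": ["TRUNCATE ingest.{fim_config}_flows"],
    "sql_replace": {"fim_config": "ana_inundation"},
    "check_dependencies": False,
}
print(lambda_handler(event, ran.append))  # ['TRUNCATE ingest.ana_inundation_flows']
```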

TODOs / Roadmap
I plan to take the following steps after this is deployed to TI, likely through 2-3 subsequent PRs as part of this series:

  1. Fully implement special FIM configurations (AEP FIM, CatFIM, etc.)
  2. Test all pipeline configurations / fix any bugs
  3. Evaluate / document potential future optimizations
  4. Plan for / document deployment strategy related to FIM hand cache (I need to create the cache tables on deployment, or within the lambda function... and we need to wipe the cache whenever we update HAND FIM versions, which can be done by truncating the cache tables on Redshift)
  5. Plan for / implement historic data request functionality.
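
The cache-wipe step mentioned in item 4 amounts to truncating the Redshift cache tables whenever the HAND FIM version changes. A hedged sketch, with illustrative table names (the actual cache table names are not given in this PR):

```python
# Hypothetical cache table names; substitute the real Redshift cache tables.
HAND_CACHE_TABLES = [
    "fim_cache.hand_cache",
    "fim_cache.hand_cache_geo",
    "fim_cache.hand_cache_zero_stage",
]

def build_cache_wipe_sql(tables=HAND_CACHE_TABLES):
    """Build the TRUNCATE statements that reset the HAND cache on a
    HAND FIM version update (to be executed against Redshift)."""
    return [f"TRUNCATE TABLE {t}" for t in tables]

for stmt in build_cache_wipe_sql():
    print(stmt)
```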

shawncrawley (Collaborator) left a comment:

Leaves me speechless....

@nickchadwick-noaa nickchadwick-noaa merged commit de4f3ac into ti Dec 22, 2023
1 check passed
@nickchadwick-noaa nickchadwick-noaa deleted the cached_fim_part1 branch December 22, 2023 16:26
TylerSchrag-NOAA (Contributor, Author) commented:

This is wrong.

TylerSchrag-NOAA (Contributor, Author) commented:

Missed domain variable for recurrence flows

TylerSchrag-NOAA (Contributor, Author) commented:

Update WHERE clause to use prc_status and remove joins

Labels: enhancement