Update and publish ensemble tide modelling functionality #32

Merged
merged 10 commits into from
Dec 19, 2024

robbibt
Member

@robbibt robbibt commented Nov 26, 2024

This PR updates the previously undocumented ensemble tide modelling functionality. The ensemble_tides function combines multiple tide models into a single, locally optimised ensemble tide model, using external model ranking data (e.g. satellite altimetry or NDWI-tide correlations along the coastline) to select the best local models.

This function performs the following steps:

  1. Takes a dataframe of tide heights from multiple tide models, as produced by eo_tides.model.model_tides
  2. Loads model ranking points from an external file, filters them based on the valid data percentage, and retains relevant columns
  3. Interpolates the model rankings onto the coordinates of the original dataframe using Inverse Distance Weighted (IDW) interpolation
  4. Uses rankings to combine multiple tide models into a single optimised ensemble model (by default, by taking the mean of the top 3 ranked models)
  5. Returns a new DataFrame with the combined ensemble model predictions
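
Steps 3 and 4 above can be illustrated with a minimal, dependency-free sketch. The helper names (`idw_ranks`, `ensemble_mean`), the power parameter `p`, the convention that rank 1 is best, and the sample values are all assumptions made for illustration; they are not taken from eo_tides, whose implementation works on dataframes and may differ in detail.

```python
import math


def idw_ranks(target, points, p=2.0):
    """Interpolate per-model ranks at `target` (x, y) from ranking points.

    `points` is a list of ((x, y), {model: rank}) tuples. Each point is
    weighted by 1 / distance**p; a point that coincides with the target
    is returned unchanged.
    """
    weight_sum, totals = 0.0, {}
    for (px, py), ranks in points:
        dist = math.hypot(target[0] - px, target[1] - py)
        if dist == 0.0:
            return dict(ranks)
        w = 1.0 / dist**p
        weight_sum += w
        for model, rank in ranks.items():
            totals[model] = totals.get(model, 0.0) + w * rank
    return {model: total / weight_sum for model, total in totals.items()}


def ensemble_mean(heights, ranks, top_n=3):
    """Mean tide height of the `top_n` best-ranked models (assumes rank 1 = best)."""
    best = sorted(ranks, key=ranks.get)[:top_n]
    return sum(heights[m] for m in best) / len(best), best


# Hypothetical ranking points and modelled tide heights (metres) for models A-D
points = [
    ((0.0, 0.0), {"A": 1, "B": 2, "C": 3, "D": 4}),
    ((2.0, 0.0), {"A": 4, "B": 1, "C": 2, "D": 3}),
]
heights = {"A": 0.9, "B": 1.2, "C": 1.5, "D": 3.0}

ranks = idw_ranks((1.0, 0.0), points)        # target is equidistant from both points
height, used = ensemble_mean(heights, ranks)  # model D is ranked worst and excluded
```

Because the ranks are interpolated independently at every location, the set of models entering the mean can change from one coordinate to the next, which is what makes the ensemble locally optimised.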

Ensemble tides can be generated by either running the ensemble_tides function directly on the output of model_tides:

import pandas as pd
from eo_tides.model import model_tides, ensemble_tides

df = model_tides(
    x=155.374,
    y=-1.909,
    time=pd.date_range("2024-11-01", "2024-11-13", freq="1h"),
    model="all",
    directory="/gdata1/data/tide_models_clipped/",
)

# Models to include in the ensemble
ensemble_models = [
    "EOT20",
    "FES2012",
    "FES2014_extrapolated",
    "FES2022_extrapolated",
    "GOT4.10",
    "GOT5.6_extrapolated",
    "TPXO10-atlas-v2-nc",
    "TPXO8-atlas-nc",
    "TPXO9-atlas-v5-nc",
]
ensemble_tides(df, crs="EPSG:4326", ensemble_models=ensemble_models)

Or by specifying ensemble as a model in model_tides:

import pandas as pd
from eo_tides.model import model_tides

df = model_tides(
    x=155.374,
    y=-1.909,
    time=pd.date_range("2024-11-01", "2024-11-13", freq="1h"),
    model="ensemble",
    directory="/gdata1/data/tide_models_clipped/",
)

Changes

  • Update ensemble code to latest version that includes FES2022, GOT5.6 and TPXO10.
  • Load ranking points from an external FlatGeobuf format file for faster cloud access
  • Make ensemble model calculation function a top level function (i.e. rename from _ensemble_model to ensemble_tides)
  • Make buffer distance applied when cropping model files configurable via the crop_buffer param, with a default of 5 degrees
  • Reorder model_tides params to provide more logical flow and move more common params like mode, output_format and output_units higher
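
As a rough illustration of what a crop buffer does, the sketch below expands the bounding box of the analysis points by a buffer in degrees. The helper `buffered_bounds` is hypothetical and is not the eo_tides implementation; in particular, clamping latitude to ±90 and ignoring longitude wraparound are assumptions made here for simplicity.

```python
def buffered_bounds(xs, ys, crop_buffer=5.0):
    """Return (min_x, min_y, max_x, max_y) expanded by `crop_buffer` degrees.

    Hypothetical helper: latitude is clamped to the valid range, and
    longitude wraparound at the antimeridian is deliberately not handled.
    """
    return (
        min(xs) - crop_buffer,
        max(min(ys) - crop_buffer, -90.0),
        max(xs) + crop_buffer,
        min(max(ys) + crop_buffer, 90.0),
    )


# Bounding box around the single point used in the examples above,
# with the default buffer of 5 degrees
bounds = buffered_bounds([155.374], [-1.909])
```

A larger buffer keeps more of each model file around the points of interest at the cost of reading more data; a smaller one reads less but risks cropping away model cells needed near the edges.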

Bug fixes

  • Fix warnings from load_gauge_gesla function

@robbibt
Member Author

robbibt commented Nov 26, 2024

The ensemble model is currently not producing the expected test results because .dropna() removes every ranking point that is missing a ranking for any individual tide model (related to tsutterley/pyTMD#366). Need to test whether we have to drop NaN rows at all, or whether the interpolation will handle them gracefully.

try:
    model_ranks_gdf = (
        gpd.read_file(ranking_points, engine="pyogrio")
        .to_crs(crs)
        .query(f"valid_perc > {ranking_valid_perc}")
        .dropna()[model_ranking_cols + ["geometry"]]
    )
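
The difference between the two options can be shown without geopandas. The sketch below uses hypothetical ranking values and precomputed distances: dropping whole rows (the behaviour of `DataFrame.dropna()` with its default `how="any"`) leaves nothing to interpolate from, while skipping missing values model by model still yields a rank for each model. The function `idw_skip_nan` is an illustrative assumption, not the fix adopted in this PR.

```python
import math

NAN = float("nan")

# Hypothetical ranking points: (distance to target, {model: rank}).
# Every point is missing a rank for at least one model.
points = [
    (1.0, {"EOT20": 1.0, "FES2022_extrapolated": 2.0, "GOT5.6_extrapolated": NAN}),
    (2.0, {"EOT20": 2.0, "FES2022_extrapolated": NAN, "GOT5.6_extrapolated": 1.0}),
]


def has_nan(ranks):
    """True if any model rank at this point is missing."""
    return any(math.isnan(r) for r in ranks.values())


# Row-wise dropping, analogous to DataFrame.dropna(): no points survive
complete = [pt for pt in points if not has_nan(pt[1])]


def idw_skip_nan(points, model, p=2.0):
    """IDW rank for one model, ignoring points where that model's rank is NaN.

    Assumes all distances are strictly positive.
    """
    pairs = [(1.0 / d**p, r[model]) for d, r in points if not math.isnan(r[model])]
    if not pairs:
        return NAN
    return sum(w * r for w, r in pairs) / sum(w for w, _ in pairs)


# Per-model interpolation still produces a value for every model
per_model = {m: idw_skip_nan(points, m) for m in points[0][1]}
```

Note that with per-model skipping, each model's rank may be informed by a different subset of ranking points, so the interpolated ranks are not strictly comparable where coverage differs.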

@robbibt robbibt added this to the v1.0.0 milestone Nov 26, 2024
@robbibt robbibt added the enhancement New feature or request label Nov 26, 2024
@robbibt robbibt self-assigned this Nov 26, 2024
@codecov-commenter

codecov-commenter commented Dec 13, 2024

Codecov Report

Attention: Patch coverage is 78.94737% with 8 lines in your changes missing coverage. Please review.

Project coverage is 80.6%. Comparing base (2018614) to head (f734d83).
Report is 3 commits behind head on main.

Files with missing lines    Patch %    Lines
eo_tides/model.py           79.1%      4 Missing and 1 partial ⚠️
eo_tides/utils.py           71.4%      1 Missing and 1 partial ⚠️
eo_tides/eo.py              0.0%       0 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##            main     #32     +/-   ##
=======================================
- Coverage   81.3%   80.6%   -0.7%     
=======================================
  Files          6       6             
  Lines        632     646     +14     
  Branches     110     112      +2     
=======================================
+ Hits         514     521      +7     
- Misses        71      76      +5     
- Partials      47      49      +2     


@robbibt robbibt merged commit a71cd5f into main Dec 19, 2024
11 checks passed
Development

Successfully merging this pull request may close these issues.

Does ensemble code need x, y params when tide_df already has x and y coords?
2 participants