Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for reading, defining and outputting CF-compliant projection info #38

Open
wants to merge 33 commits into
base: main
Choose a base branch
from

Conversation

observingClouds
Copy link
Contributor

@observingClouds observingClouds commented Nov 13, 2024

eventually fixes #33

Main changes:

  • add optional projections attribute to dataset input section (see example.danra.yaml)
  • projections section will accept crs_wkt, single-value projection attributes or both, with crs_wkt being prioritized
  • add tests to ensure projections are identical across datasets and variables (only one common projection will be supported for now)
  • projection information written to output dataset

Off-topic-changes (though required for better testing):

  • adding doctests to CI (this allows to run examples given in docstrings as tests)
  • change the VALID_EXAMPLE_CONFIG_YAML to point to online datasets and no longer use non-implemented flatten-method

Questions:

  • does it make sense to give any of the output variables a grid_mapping attribute, e.g. static and others which depend on the grid_index which can map to x and y? Is there a standard for this?

created with
```
globe=ccrs.Globe(semimajor_axis=6367470,semiminor_axis=6367470)
4:41
ccrs.LambertConformal(central_longitude=proj_inf[‘LoV’]/1e6,standard_parallels=(proj_inf[‘Latin1’]/1e6,proj_inf[‘Latin2’]/1e6), central_latitude=((proj_inf[‘Latin1’]/1e6+proj_inf[‘Latin2’]/1e6))/2 ,false_easting=0, false_northing=0, globe=globe).to_cf()
```
@observingClouds observingClouds force-pushed the projection_support_stage1 branch 3 times, most recently from c8d6ed0 to 5946210 Compare November 14, 2024 08:34
@observingClouds observingClouds force-pushed the projection_support_stage1 branch from 5946210 to db737ab Compare November 14, 2024 08:36
Comment on lines +209 to +211
# TODO: generalize the retrieval of x and y coords
# coords = (dataarrays_by_target[target_output_var][0]['x'], dataarrays_by_target[target_output_var][0]['y'])
# lon, lat = get_latitude_longitude_from_projection(projection, coords)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ealerskans this would be my current proposal for you on where and how you could generate the lats and lons from x and y.

This is obviously per projection in each dataset and because we only accept one projection for now, you might just skip this step for all following datasets.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the function is available from from .ops.projection

Copy link

@SimonKamuk SimonKamuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Only some small comments from me :)

CHANGELOG.md Outdated Show resolved Hide resolved
example.danra.yaml Outdated Show resolved Hide resolved
mllam_data_prep/create_dataset.py Outdated Show resolved Hide resolved
mllam_data_prep/ops/projection.py Outdated Show resolved Hide resolved
@leifdenby leifdenby added this to the v0.6.0 (proposed) milestone Nov 20, 2024
@leifdenby
Copy link
Member

@observingClouds I just had a quick read of this again and I think maybe we should sit down and talk it through. There are somethings you have left "todo", which might be because you're missing something from me. So I will try and catch you before the holidays :)

@leifdenby
Copy link
Member

leifdenby commented Dec 10, 2024

Ok, thanks for the chat @observingClouds!

Here's what we decided (please let me know if I missed something / wrote it incorrectly):

  • check that the parsed projection (i.e. the produced pyproj.CRS object) has the bbox attribute defined. The bounding box is required if the resulting projection object is to be used by cartopy for plotting (otherwise cartopy cannot use it). Issue a warning if the user hasn't included the bbox (we can explicitly raise an exception in neural-lam if this is missing and tell the user to add it).
  • include only crs_wkt string to define projection in config. This means that all other projection definition components should be removed. In the CF-conventions the crs_wkt string takes precedence anyway and is the only way to define a bbox (which is needed to create a projection for plotting with cartopy, see above). To simplify the implementation here we only use the crs_wkt string.
  • Implement config format which a) does not use a __common__ section (as suggested in Introduce common input section that will be applied to all inputs #31), we decided it was better to be explicit about the projection definition rather than implicit. This means that the user must copy the same projection to all input datasets in the config, b) names the projection explicitly (here called danra_projection) which follows the CF-compliant approach where projections are explicitly named, c) does not require that the named projection to be used is explicitly named for each variable selected (we can add that in future, but for now we will implement that one and only one projection per input dataset may be given, and that is assigned to each variable):
inputs:
  danra_height_levels:
    path: https://mllam-test-data.s3.eu-north-1.amazonaws.com/height_levels.zarr
    dims: [time, x, y, altitude]
    variables:
      u:
        altitude:
          values: [100,]
          units: m
      v:
        altitude:
          values: [100, ]
          units: m
    projections:
      danra_projection:
        crs_wkt: 'PROJCRS["DMI HARMONIE DANRA lambert projection", BASEGEOGCRS["DMI HARMONIE DANRA lambert CRS", DATUM["DMI HARMONIE DANRA lambert datum", ELLIPSOID["Sphere", 6367470, 0, LENGTHUNIT["metre", 1, ID["EPSG", 9001]]]], PRIMEM["Greenwich", 0, ANGLEUNIT["degree", 0.0174532925199433, ID["EPSG", 8901]]], ID["EPSG",4035]], CONVERSION["Lambert Conic Conformal (2SP)", METHOD["Lambert Conic Conformal (2SP)", ID["EPSG", 9802]], PARAMETER["Latitude of false origin", 56.7, ANGLEUNIT["degree", 0.0174532925199433, ID["EPSG", 8821]]], PARAMETER["Longitude of false origin", 25, ANGLEUNIT["degree", 0.0174532925199433, ID["EPSG", 8822]]], PARAMETER["Latitude of 1st standard parallel", 56.7, ANGLEUNIT["degree", 0.0174532925199433, ID["EPSG", 8823]]], PARAMETER["Latitude of 2nd standard parallel", 56.7, ANGLEUNIT["degree", 0.0174532925199433, ID["EPSG", 8824]]], PARAMETER["Easting at false origin", 0, LENGTHUNIT["metre", 1, ID["EPSG", 8826]]], PARAMETER["Northing at false origin", 0, LENGTHUNIT["metre", 1, ID["EPSG", 8827]]]], CS[Cartesian, 2], AXIS["(E)", east, ORDER[1], LENGTHUNIT["metre", 1, ID["EPSG", 9001]]], AXIS["(N)", north, ORDER[2], LENGTHUNIT["metre", 1, ID["EPSG", 9001]]], USAGE[AREA["Denmark and surrounding regions"], BBOX[47, -3, 65, 25], SCOPE["Danra reanalysis projection"]]]'
        dims: [x, y]

    dim_mapping:
      time:
        method: rename
        dim: time
      state_feature:
        method: stack_variables_by_var_name
        dims: [altitude]
        name_format: "{var_name}{altitude}m"
    target_output_variable: state
  • implement assertion (and raise exception if not met) that all input datasets defined in the config have the same projection set
  • add setting of CF-compliant projection info on output of mllam-data-prep by defining the projection in a CF-compliant manner (again with crs_wkt) and setting the grid_mapping attr on each output variable (that contains the dimensions that the projection uses) to reference that grid-mapping. It will likely also be necessary to set the coordinate names for coordinates used in the projection in a CF-compliant manner too.
  • Write short section in README on 1) how to add projection info using the new section in the config, 2) the importance of the bbox, 3) may encourage people to raise an issue if they are unsure how to make their own crs wkt string?

@leifdenby leifdenby changed the title Projection support Add support for reading, defining and outputting CF-compliant projection info Dec 10, 2024
@leifdenby leifdenby modified the milestones: v0.6.0 (proposed), v0.6.0 Dec 16, 2024
@observingClouds observingClouds force-pushed the projection_support_stage1 branch from 3b7bbed to 8af4643 Compare January 9, 2025 19:02
@observingClouds observingClouds force-pushed the projection_support_stage1 branch from 8af4643 to ec75257 Compare January 9, 2025 19:37
@observingClouds
Copy link
Contributor Author

observingClouds commented Jan 10, 2025

@leifdenby now that we make projections a required argument of the config file, I wonder how we make this best work with some of the tests against older schema, e.g.:

>   ???
E   dataclass_wizard.errors.MissingFields: `InputDataset.__init__()` missing required fields.
E     Provided: ['path', 'dims', 'variables', 'dim_mapping', 'target_output_variable']
E     Missing: ['projections']
E     Input JSON: {"path": "https://mllam-test-data.s3.eu-north-1.amazonaws.com/height_levels.zarr", "dims": ["time", "x", "y", "altitude"], "variables": {"u": {"altitude": {"values": [100], "units": "m"}}, "v": {"altitude": {"values": [100], "units": "m"}}}, "dim_mapping": {"time": {"method": "rename", "dim": "time"}, "state_feature": {"method": "stack_variables_by_var_name", "dims": ["altitude"], "name_format": "f\"{var_name}{altitude}m\""}, "grid_index": {"method": "stack", "dims": ["x", "y"]}}, "target_output_variable": "state"}
E     error: InputDataset.__init__() missing 1 required positional argument: 'projections'

<string>:36: MissingFields
============================================================ short test summary info ============================================================
FAILED tests/test_from_config.py::test_config_revision_examples[fp_example0] - dataclass_wizard.errors.MissingFields: `InputDataset.__init__()` missing required fields.

This also affects the datasets saved on S3 we check against. Do we create new ones that include the projection information?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make mllam-data-prep aware of geographical coordinates (by introducing projection info)
3 participants