iids with version specified are not parsed #141

Closed
jbusecke opened this issue Apr 30, 2024 · 2 comments

@jbusecke
Collaborator

The way we specify the list of iids breaks when the actual version is given explicitly (which is what we currently recommend in the Readme).

An illustration (run from this PR branch jbusecke/pangeo-forge-esgf#40):

from pangeo_forge_esgf.parsing import parse_instance_ids
parse_iids = ["CMIP6.*.CAS.FGOALS-g3.ssp585.r3i1p1f1.day.[psl, pr, sfcWind].*.*"]
parsed_expanded = []
for piid in parse_iids:
    parsed_expanded.extend(parse_instance_ids(piid))
parsed_expanded

This works as expected and gives:

['CMIP6.ScenarioMIP.CAS.FGOALS-g3.ssp585.r3i1p1f1.day.sfcWind.gn.v20191217',
 'CMIP6.ScenarioMIP.CAS.FGOALS-g3.ssp585.r3i1p1f1.day.pr.gn.v20191217',
 'CMIP6.ScenarioMIP.CAS.FGOALS-g3.ssp585.r3i1p1f1.day.psl.gn.v20191217']

But if we specify only one of the datasets that were just parsed, things go wrong:

parse_iids = ['CMIP6.ScenarioMIP.CAS.FGOALS-g3.ssp585.r3i1p1f1.day.sfcWind.gn.v20191217']
parsed_expanded = []
for piid in parse_iids:
    parsed_expanded.extend(parse_instance_ids(piid))
parsed_expanded

gives:

[Image: screenshot of the output]

A simple solution is to remove the version from the iid:

parse_iids = ['CMIP6.ScenarioMIP.CAS.FGOALS-g3.ssp585.r3i1p1f1.day.sfcWind.gn.*']
parsed_expanded = []
for piid in parse_iids:
    parsed_expanded.extend(parse_instance_ids(piid))
parsed_expanded

gives:

['CMIP6.ScenarioMIP.CAS.FGOALS-g3.ssp585.r3i1p1f1.day.sfcWind.gn.v20191217']

I believe this is because we specify "latest"="true" here, which is generally a bit confusing, but it is what it is for now.

So I see two solutions:

  • We preprocess the iid list (within the recipe) and replace every version with a wildcard. This means that we only ever ingest the latest version.
  • We deactivate latest, and allow our recipe to check for specific versions.

I think I favor the former, but happy to get some input on this.
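The first option could be sketched roughly as follows. This is only an illustration, not code from the recipe: `wildcard_version` is a hypothetical helper, and it assumes that every iid has exactly 10 dot-separated facets with the version last, as in the examples above.

```python
def wildcard_version(iid: str) -> str:
    """Replace the trailing version facet of an iid with a wildcard.

    Hypothetical helper; assumes 10 dot-separated facets, version last.
    """
    facets = iid.split(".")
    if len(facets) != 10:
        raise ValueError(f"Expected 10 facets, got {len(facets)}: {iid!r}")
    # Overwrite whatever version was given (e.g. 'v20191217') with '*'
    facets[-1] = "*"
    return ".".join(facets)


iids = ["CMIP6.ScenarioMIP.CAS.FGOALS-g3.ssp585.r3i1p1f1.day.sfcWind.gn.v20191217"]
preprocessed = [wildcard_version(iid) for iid in iids]
```

The preprocessed list could then be handed to `parse_instance_ids` as in the working example above, at the cost of only ever picking up the latest version.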

@jbusecke
Collaborator Author

I might have been incorrect above. Here we actually treat iids without wildcards in a special way, which we probably should not!
An iid with e.g. square brackets would be passed as is without running through pangeo-forge-esgf, likely leading to errors.
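To illustrate the concern, here is a minimal sketch. It assumes the special-casing amounts to checking for a `*` in the iid; the actual check in the recipe is not shown here, and both function names are hypothetical.

```python
def has_wildcard(iid: str) -> bool:
    # Assumed form of the current check: only '*' counts as a pattern
    return "*" in iid


def needs_parsing(iid: str) -> bool:
    # Stricter variant: square-bracket lists also need to be expanded
    return any(ch in iid for ch in "*[]")


bracket_iid = "CMIP6.ScenarioMIP.CAS.FGOALS-g3.ssp585.r3i1p1f1.day.[psl, pr].gn.v20191217"

# The bracket list slips past the '*'-only check and would be passed as is
slips_through = not has_wildcard(bracket_iid) and needs_parsing(bracket_iid)
```

Running every iid through pangeo-forge-esgf, regardless of its content, would avoid having to maintain such a check at all.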

@jbusecke
Collaborator Author

jbusecke commented May 8, 2024

This is no longer an issue with the newest client (jbusecke/pangeo-forge-esgf#45), but I'll leave this open until that is merged (pending some decisions over at jbusecke/pangeo-forge-esgf#46).
