Proposed Recipes for ClimateBench target variables #134

Open
duncanwp opened this issue May 4, 2022 · 3 comments

duncanwp commented May 4, 2022

Thanks for setting up this great resource. I'm curious whether the dataset proposed below would make a good Pangeo Forge recipe. It could be a fairly simple extension of a CMIP6 recipe, but I'm not sure whether dependencies / recipe chaining are supported yet or are in your plans.

Source Dataset

This fairly simple dataset consists of a few key (2D) CMIP6 variables from a single model, for benchmarking climate model emulation approaches: tas, diurnal_temperature_range (tasmax-tasmin), pr and pr90.

  • The file format is CMORized NetCDF
  • The files are arranged slightly differently for different scenarios but are broadly one file per ensemble member per scenario
  • Accessed via an open ESGF THREDDS server

Transformation / Alignment / Merging

The transformations are fairly light, just combining across members and time where necessary and then calculating the monthly and annual quantities from daily data.
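
For concreteness, the daily-to-annual reduction described above could look roughly like the sketch below. It is plain Python for a single grid cell, not the actual recipe (which would more likely use xarray over the full fields). The function names and the record layout are hypothetical, and treating pr90 as the 90th percentile of daily precipitation within each year is an assumption, not something stated in this thread.

```python
from collections import defaultdict
from statistics import mean


def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty sequence."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def annual_targets(daily):
    """Reduce daily records for one grid cell to annual target variables.

    daily: iterable of (year, tas, tasmin, tasmax, pr) tuples (hypothetical layout).
    """
    by_year = defaultdict(list)
    for year, tas, tasmin, tasmax, pr in daily:
        # Diurnal temperature range is computed per day, then averaged per year.
        by_year[year].append((tas, tasmax - tasmin, pr))

    out = {}
    for year, rows in sorted(by_year.items()):
        tas, dtr, pr = zip(*rows)
        out[year] = {
            "tas": mean(tas),
            "diurnal_temperature_range": mean(dtr),
            "pr": mean(pr),
            # Assumption: pr90 = 90th percentile of daily pr within the year.
            "pr90": percentile(pr, 90),
        }
    return out
```

The same grouping could be applied per month instead of per year by keying on (year, month); combining ensemble members would add one more key or an outer loop.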

Output Dataset

Zarr output would be preferable: either one store per scenario, or one big store with a scenario dimension (though the time dimension varies with scenario, which I think makes that tricky).

rabernat commented May 4, 2022

Duncan, this would be an ideal recipe, and we would love to support it. Do you have any idea how big the total dataset is?

@duncanwp

@rabernat Fantastic! I'm going to get working on this now that the paper is close to acceptance.

The final dataset is only a few GB, maybe tens of GB if we extend to multiple CMIP models. Doing this in the cloud will be great, though, since running it locally requires storing all the intermediate daily data, which is pretty large.

Is there an example recipe that pulls from the ESGF S3 bucket if available but falls back to the ESGF nodes if unavailable? This PR looks close but has been superseded by this, which hasn't been merged.
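
To make the question concrete, the fallback behaviour being asked about might be sketched as below. This is only an illustration of the pattern, not an existing Pangeo Forge feature: `open_first_available` is a hypothetical helper, and the opener callable and candidate locations are placeholders to be supplied by the caller.

```python
def open_first_available(candidates, opener):
    """Try each candidate location in order and return (location, dataset)
    for the first one that opens successfully.

    candidates: ordered locations, e.g. an S3 path first, then ESGF node URLs.
    opener: callable that opens one location and raises on failure.
    """
    errors = []
    for location in candidates:
        try:
            return location, opener(location)
        except Exception as exc:
            # Broad on purpose: S3 and HTTP failures raise different exception types.
            errors.append((location, exc))
    raise RuntimeError(f"no source could be opened: {errors}")
```

In a recipe, `candidates` would list the S3 object first and the ESGF node URLs after it, so the node is only contacted when the bucket copy is missing.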

duncanwp commented Oct 11, 2022 via email
