Proposed Recipes for ClimateBench #243

Closed
jbusecke opened this issue Jan 4, 2023 · 2 comments

@jbusecke
Contributor

jbusecke commented Jan 4, 2023

Dataset Name

ClimateBench

Dataset URL

https://zenodo.org/record/7064308#.Y7St9S-B30o

Description

"ClimateBench is a benchmark dataset for climate model emulation inspired by WeatherBench."
I believe the ClimateBench dataset would be broadly useful to the climate community, especially to people interested in ML-driven emulators.

License

Creative Commons Attribution 4.0 International

Data Format

NetCDF

Data Format (other)

No response

Access protocol

HTTP(S)

Source File Organization

The repository holds a mix of .zip and .tar.gz archives, each of which contains several NetCDF files.

Example URLs

https://zenodo.org/record/7064308/files/test.tar.gz

Authorization

No; data are fully public

Transformation / Processing

I am unsure whether some of these datasets could be concatenated further (I will ask the authors), but as a first pass I think it would be very useful to have one Zarr store for each of the NetCDF files in each archive.
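A minimal sketch of the "one Zarr store per NetCDF file" idea, written with plain xarray rather than the Pangeo Forge recipe API. The naming scheme in `zarr_store_name`, the function names, and the member name `example.nc` are all assumptions made for illustration; they are not taken from the ClimateBench archives.

```python
from pathlib import PurePosixPath

# Archive suffixes mentioned in this issue.
ARCHIVE_SUFFIXES = (".tar.gz", ".zip")


def zarr_store_name(archive_name, member_name):
    """Map one NetCDF member of an archive to a Zarr store path.

    Hypothetical naming scheme: <archive stem>/<member stem>.zarr
    """
    stem = archive_name
    for suffix in ARCHIVE_SUFFIXES:
        if stem.endswith(suffix):
            stem = stem[: -len(suffix)]
            break
    # Drop any directories inside the archive and swap .nc for .zarr.
    member = PurePosixPath(member_name).with_suffix(".zarr").name
    return f"{stem}/{member}"


def netcdf_to_zarr(nc_path, store_path):
    """Write a single, already extracted NetCDF file to its own Zarr store."""
    # Imported lazily: requires xarray, a NetCDF backend, and zarr.
    import xarray as xr

    with xr.open_dataset(nc_path) as ds:
        # mode="w" overwrites an existing store at store_path.
        ds.to_zarr(store_path, mode="w")
```

For example, `zarr_store_name("test.tar.gz", "example.nc")` gives `"test/example.zarr"`, so every file in `test.tar.gz` would land in its own store under a common prefix.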

Target Format

Zarr

Comments

For the CMIP6 data (which is in a regular zip file) this is solved, but unfortunately we have the same issue here as in #219, where we cannot index single files in a gzipped tar.
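The difference between the two archive types can be sketched with the Python standard library alone. The archives below are built in memory, and the member names and payloads are placeholders, not the real ClimateBench files. A zip archive carries a central directory, so a reader can jump straight to one member; a gzipped tar has no index, so a reader has to decompress and walk the stream from the start.

```python
import io
import tarfile
import zipfile

# Placeholder member names and payloads standing in for real NetCDF files.
MEMBERS = {"a.nc": b"first file", "b.nc": b"second file"}


def build_zip(members):
    """Return the bytes of a zip archive holding the given members."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w") as archive:
        for name, payload in members.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


def build_tar_gz(members):
    """Return the bytes of a gzipped tar archive holding the given members."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, payload in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def read_from_zip(data, name):
    """Look the member up in the zip central directory and read only it."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.read(name)


def read_from_tar_gz(data, name):
    """Walk the tar headers in order until the requested member is found."""
    # A tar has no central index, and gzip compresses the archive as one
    # stream, so everything before the target must be decompressed first.
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as archive:
        for info in archive:
            if info.name == name:
                return archive.extractfile(info).read()
    raise KeyError(name)
```

Both readers return the same bytes here; the cost only shows up for large remote archives, where the zip reader can fetch a single byte range while the tar.gz reader has to pull everything that precedes the file it wants.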

I guess the same possible solution applies here: we could ask whether the authors would consider issuing a new release on Zenodo that uses regular zip files for all data.

Hey @duncanwp, do you think that is a possibility?

@duncanwp

duncanwp commented Jan 6, 2023

Hi @jbusecke - thanks for the suggestion! I have already opened an issue proposing a ClimateBench recipe here: #134.

Your approach of just polling Zenodo is certainly easier, and I would be happy to repackage the data to support it, but it would be really valuable to have a proper pipeline from ESGF so that other models and variables can be added easily.

That approach is WIP (https://github.com/duncanwp/climatebench-feedstock) but is currently held up by this PR on pangeo-forge-esgf: jbusecke/pangeo-forge-esgf#9.

@jbusecke
Contributor Author

Oh right on! So I am blocking myself here in a way, and apparently I am too stupid to use the search function 🤪.
I have allocated some time next week to work on pgf (including pangeo-forge-esgf). I will look into this issue for sure; apologies for the long wait.

Also, I have been very interested in this use case of producing 'derived' datasets from CMIP6. There have been discussions about this earlier (they might have been private chats between me and @cisaacstern though). I have some thoughts on this, and would be really curious to hear about your experience and needs.
Perhaps a short meeting on this could also help? I'd be happy to jump on Zoom some time next week.

Either way, I am closing this here as a dupe.
