ClimateBench dataset #43
Comments
Hey, I'm standing with @cisaacstern and keen to figure out how to make this work 😊 |
Fantastic. Thanks for pinging this issue. I propose that we formalize this effort into a working group. I would like to include @SammyAgrawal in this group, since I think this is a perfect way to learn more about Apache Beam and fits right into Sammy's project work of building reproducible ML/climate science pipelines. Sammy, I hope this works well for you; I think this is a really great ✨synergy🤗.
|
Perfect, thanks @jbusecke! Please send a doodle for January. I have a single-model version pulling from ESGF here: https://github.com/duncanwp/ClimateBench/blob/main/prepare_data.py |
Just wanted to ping this thread. @cisaacstern could you fill out your availability? @SammyAgrawal could you double check the dates next week? It would be fantastic if we could make Tue (16th) or Wed (17th) work. Also a quick check back to @duncanwp: I think the code you provided up top only downloads from ESGF. I was under the assumption that we want to avoid this step and load directly from the cloud Zarr stores? What code is used to process the final output? Is it this notebook: https://github.com/duncanwp/ClimateBench/blob/main/prep_input_data.ipynb ? Excited to push this ahead! |
I sent an invite for Tue (Jan 16) 3-4PM EST! |
Hi all, apologies but that time slot is now filled for me and I can’t move it… was there another that worked? Cheers, Duncan
|
The ranges that worked according to the poll were:
Do you have availability in those ranges? |
Yes, any of the times on Wednesday would still work! |
I moved it to Wed 3 PM! Looking forward to it. |
Dataset Name
ClimateBench
Dataset URL
https://zenodo.org/record/7064308
Description
I propose to create a pipeline so that more climate models and variables, at a higher temporal resolution, can be ingested into ClimateBench transparently and efficiently. The ClimateBench data consist of post-processed and harmonized CMIP6 input and output files split across experiments/scenarios. Such a pipeline would allow others to expand upon the ClimateBench protocol and apply climate model emulation more generally.
Size
Files of roughly 10 GB each, totaling around 1 TB (depending on storage availability)
License
Unknown
Data Format
NetCDF
Data Format (other)
No response
Access protocol
HTTP(S)
Source File Organization
The CMIP6 data is organized in ESGF into time-sharded files per experiment, per model, per variable. I'm not sure how the data is stored in Pangeo, which might be a more natural source.
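To make the ESGF layout concrete, here is a minimal sketch of grouping time-sharded files by experiment, model, and variable. It assumes the standard CMIP6 filename pattern `<variable>_<table>_<model>_<experiment>_<member>_<grid>_<time range>.nc` for time-varying files; the helper name `group_shards` and the filenames in the usage note are purely illustrative and not taken from ClimateBench.

```python
from collections import defaultdict
from typing import NamedTuple


class ShardKey(NamedTuple):
    """Identifies one logical dataset that is split into time shards."""
    experiment: str
    model: str
    variable: str


def group_shards(filenames):
    """Group CMIP6-style filenames by (experiment, model, variable).

    Assumes every name has exactly seven underscore-separated fields,
    i.e. time-varying files only (fixed fields have no time range).
    """
    groups = defaultdict(list)
    for name in filenames:
        stem = name.removesuffix(".nc")
        variable, _table, model, experiment, _member, _grid, _time = stem.split("_")
        groups[ShardKey(experiment, model, variable)].append(name)
    # Within one group (and one ensemble member) only the fixed-width
    # time range differs, so a plain sort puts shards in time order.
    return {key: sorted(names) for key, names in groups.items()}
```

For example, two hypothetical shards such as `tas_Amon_NorESM2-LM_historical_r1i1p1f1_gn_185001-185912.nc` and `tas_Amon_NorESM2-LM_historical_r1i1p1f1_gn_186001-186912.nc` would end up under the single key `("historical", "NorESM2-LM", "tas")`, ready to be concatenated along time.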
Example URLs
No response
Authorization
No; data are fully public
Transformation / Processing
The data needs to be put onto a common grid, the piControl subtracted, and units harmonized. Some common statistics may be used for some variables (e.g. 99th percentile of precipitation).
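As a rough illustration of these steps (not the actual ClimateBench code), the sketch below uses plain NumPy arrays shaped `(time, lat, lon)` that are assumed to already share a common grid; regridding itself is left out because it needs a dedicated library. The function names, the choice of a time-mean piControl baseline, the mm/day target unit for precipitation, and the 365-day year are all assumptions made for the example.

```python
import numpy as np

SECONDS_PER_DAY = 86_400


def to_anomaly(scenario, pi_control):
    """Subtract the time-mean piControl field from a scenario field.

    Both inputs are (time, lat, lon) arrays on the same grid.
    """
    baseline = pi_control.mean(axis=0)  # (lat, lon)
    return scenario - baseline          # broadcasts over time


def pr_flux_to_mm_per_day(pr_flux):
    """Convert precipitation flux from kg m-2 s-1 to mm/day.

    1 kg of water per square metre corresponds to a 1 mm layer.
    """
    return pr_flux * SECONDS_PER_DAY


def annual_percentile(daily, q=99, days_per_year=365):
    """Per-year q-th percentile of a daily (time, lat, lon) field.

    Assumes a no-leap calendar; trailing days of an incomplete
    final year are dropped.
    """
    n_years = daily.shape[0] // days_per_year
    trimmed = daily[: n_years * days_per_year]
    by_year = trimmed.reshape(n_years, days_per_year, *daily.shape[1:])
    return np.percentile(by_year, q, axis=1)  # (year, lat, lon)
```

In a real recipe these would more likely be expressed on labelled, lazily loaded arrays so that calendars and chunking are handled properly; the NumPy version only shows the order of operations.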
Target Format
Zarr
Comments
Original Pangeo-Forge recipe: pangeo-forge/staged-recipes#134
Pull request regarding processing daily data with pangeo-forge-esgf: jbusecke/pangeo-forge-esgf#9
It might make sense to create this recipe as just a 'pointer' to the underlying CMIP data rather than storing it, but it depends on the compute/storage costs I guess.
Thanks!