Derived CMIP6 data recipe builder (WIP) #252
A few notes on implementation. Proposed changes in #242 (see also #242 (comment)) may make more complex derivations possible, but we were pleased to find that the weighted mean example demonstrated in the above-linked notebook did not strictly need that refactor. The two issues we did encounter with existing code were:
So happy to see it being used in the wild, @jbusecke!
Hi @jbusecke! This recipe looks like exactly what I need for pangeo-forge/staged-recipes#134, but it seems to have stalled. Is it likely to be merged, or is there another approach I should follow?
👋 @duncanwp, thanks for checking in here. This was an early attempt that @jbusecke and I made at generalizing recipes derived from ESGF holdings. We have since moved on, and simply forgot to close this issue. The current state of these efforts is well summarized by Julius in pangeo-data/pangeo-cmip6-cloud#31 (comment). We don't have an end-to-end solution deployed for this today, but we have made a lot of headway since this issue, and welcome your collaboration on this. I'll close this issue now, as it's gone stale, but please do follow up on the issue I've linked here, or any of the other issues that it links to. Look forward to working together on this!
Perfect, thanks @cisaacstern! I'll dive into that and see if / how I can help.
@cisaacstern and I had a really productive hack today and I think we made some good progress towards using pangeo-forge to derive datasets from existing ARCO data.
Our Goal
This application of pangeo-forge deviates somewhat from the core (or initial) mission of migrating legacy datasets into the cloud, but I believe it could really boost the adoption of "cloud-first" workflows in many science contexts. Other attempts have been made to achieve this functionality (#176, #205), but this represents our most successful effort to date.
Our goal was to produce a derived dataset from arbitrary CMIP6 data. We decided to start with a weighted mean of a surface variable for three reasons: it keeps computation short, it avoids most trouble with dask chunking (averaging over the lateral dimensions within time chunks parallelizes nicely), and it still produces something that is actually useful in a science context.
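To make the computation concrete, here is a minimal sketch of an area-weighted mean in plain Python on made-up toy data. It is not the code from the notebook: the function name `weighted_mean` and the toy values are assumptions for illustration only, and an xarray-based workflow would more likely use something like `DataArray.weighted(...).mean(...)`. The point it shows is that each time step is reduced independently, which is why the operation parallelizes well over time chunks.

```python
import math


def weighted_mean(values, weights):
    """Area-weighted mean of one 2D time step, skipping missing (NaN) cells."""
    numerator = 0.0
    denominator = 0.0
    for row_values, row_weights in zip(values, weights):
        for value, weight in zip(row_values, row_weights):
            if math.isnan(value) or math.isnan(weight):
                continue  # e.g. land cells in an ocean variable
            numerator += value * weight
            denominator += weight
    # If every cell is missing there is nothing to average.
    return numerator / denominator if denominator > 0 else math.nan


# Toy cell areas (stand-in for areacello) on a 2x2 grid.
cell_area = [[1.0, 2.0], [3.0, 4.0]]

# Toy surface variable with two time steps; the second has one missing cell.
surface_var = [
    [[10.0, 20.0], [30.0, 40.0]],
    [[10.0, float("nan")], [30.0, 40.0]],
]

# Each time step is reduced independently of the others.
timeseries = [weighted_mean(step, cell_area) for step in surface_var]
print(timeseries)
```

Note that the missing cell is dropped from both the numerator and the denominator, so the weights are effectively renormalized over the valid cells only.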
Successes
We added a new `builder` directory containing a module which could be extended to have several generalizable functions that build recipes. These functions use logic that is specific to CMIP6 and are inspired by the workflow developed in cmip6_preprocessing (this provided the advantage of an easy transition from interactive work to a recipe).
We were able to programmatically build a set of recipes for different variables (sea surface temperature and salinity) and two example models, execute these locally, and plot the resulting data.
The builder function relies on 'facets', which are used to query an intake-esm catalog. The same catalog is also queried for available weights (in this case surface area, `areacello`), and the best match to the data is selected (see here for details).
We made a little demo notebook (big shoutout to @yuvipanda for the amazing notebooksharing.space 🔥).
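As a rough illustration of the facet-based matching idea, the sketch below pairs each requested dataset with a candidate weights entry. Everything here is hypothetical: the function `best_weight_match`, the split into required and preferred facets, and the in-memory `candidates` list (standing in for catalog query results) are assumptions, not the logic in the builder module or in cmip6_preprocessing. Only the facet names themselves (`source_id`, `grid_label`, `experiment_id`, `member_id`, `variable_id`) come from the CMIP6 vocabulary.

```python
from itertools import product

# Assumed rule: weights must come from the same model and grid ...
REQUIRED = ("source_id", "grid_label")
# ... and candidates sharing more of these facets are preferred.
PREFERRED = ("experiment_id", "member_id")


def best_weight_match(data_facets, candidates):
    """Return the candidate weights entry that best matches the data facets."""
    eligible = [
        c for c in candidates
        if all(c.get(k) == data_facets.get(k) for k in REQUIRED)
    ]
    if not eligible:
        return None  # no usable weights for this dataset

    def score(candidate):
        # Number of preferred facets the candidate shares with the data.
        return sum(candidate.get(k) == data_facets.get(k) for k in PREFERRED)

    return max(eligible, key=score)


# Hypothetical stand-in for the areacello entries returned by a catalog query.
candidates = [
    {"source_id": "CanESM5", "grid_label": "gn", "experiment_id": "piControl",
     "member_id": "r1i1p1f1", "variable_id": "areacello"},
    {"source_id": "CanESM5", "grid_label": "gn", "experiment_id": "historical",
     "member_id": "r1i1p1f1", "variable_id": "areacello"},
    {"source_id": "GFDL-ESM4", "grid_label": "gn", "experiment_id": "historical",
     "member_id": "r1i1p1f1", "variable_id": "areacello"},
]

# Build one facet query per (variable, model) combination and match weights.
variables = ("tos", "sos")  # sea surface temperature and salinity
models = ("CanESM5", "GFDL-ESM4")
matches = {}
for variable_id, source_id in product(variables, models):
    facets = {"source_id": source_id, "grid_label": "gn",
              "experiment_id": "historical", "member_id": "r1i1p1f1",
              "variable_id": variable_id}
    matches[(variable_id, source_id)] = best_weight_match(facets, candidates)
```

In this sketch, datasets without any eligible weights map to `None`, so a caller could skip them or fall back to another weighting strategy before building a recipe.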