testing NetCDFtoZarrSequentialRecipe on a few CMIP6 datasets #47
Naomi, this is incredibly helpful. I'm just going to paste your main bullet points here for the record:
There are about 5 distinct issues surfaced by your tests. I'll work on converting them into specific issues in this repo. Everything you raise seems solvable, except for this one:
I can see this coming up a lot, and it will introduce significant new complexity into the structure of recipes. It means we have to do a scan through all the input files first and actually open them to see what's inside. Then we have to propagate this information through the pipeline in order to prepare the target and figure out each chunk's target region. But we can do it! 💪
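A minimal sketch of that two-pass idea, using plain xarray and not the recipe API itself (the names `input_urls` and `concat_dim` here are illustrative, not part of pangeo-forge):

```python
import xarray as xr

# Hypothetical list of netCDF inputs, in time order.
input_urls = ["file_000.nc", "file_001.nc", "file_002.nc", "file_003.nc"]
concat_dim = "time"

# Pass 1: open each input and record its length along the concat dimension.
lengths = []
for url in input_urls:
    with xr.open_dataset(url) as ds:
        lengths.append(ds.sizes[concat_dim])

# Pass 2: turn those lengths into each input's target region, i.e. the slice
# of the concat dimension it occupies in the final Zarr store.
regions = []
start = 0
for n in lengths:
    regions.append(slice(start, start + n))
    start += n

total_size = start  # size of concat_dim needed when preparing the target
print(regions, total_size)
```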
You should check out pydata/xarray#2844. It's going to do the job for you!
Can you clarify this comment? What happens when you call
For example, with the following 4 netCDF source files (note that the last file is only 14 years long; the others are 51 years long):
So, when I
So perhaps the solution lies in setting
I have now re-run the four tests with the latest version of pangeo-forge and am happy to report that these issues have been resolved. See #51. We can now handle variable-length netCDF files, large netCDF files can be chunked, and the cftime issues have disappeared. Progress!
@rabernat, I have begun to test how far we can get with your basic NetCDFtoZarrSequentialRecipe. The tutorial is a great start for learning how to use such a recipe! I was initially confused on two silly points - but they might be worth a comment. First, one must also specify how many time slices are in each file - 1 in your example. The other was the instruction about clicking the buttons in the Xarray HTML repr, because I initially thought the 'buttons' were the green check marks and I wondered what sort of odd notebook extension you were using!? Then I realized that the 'Xarray HTML repr' just meant the stuff printed out in the prior cell with 'ds_chunk'!
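For context, here is roughly what "specify how many time slices are in each file" looks like when building the recipe. This is a hedged sketch: the import path and keyword names (`sequence_dim`, `nitems_per_input`, `inputs_per_chunk`) follow my reading of the tutorial at the time and may differ in the current API.

```python
from pangeo_forge.recipe import NetCDFtoZarrSequentialRecipe

# Hypothetical list of netCDF source files, one per time step, in time order.
input_urls = ["tas_000.nc", "tas_001.nc", "tas_002.nc"]

recipe = NetCDFtoZarrSequentialRecipe(
    input_urls=input_urls,
    sequence_dim="time",   # dimension along which the files are concatenated
    nitems_per_input=1,    # 1 time slice per file, as in the tutorial example
    inputs_per_chunk=12,   # e.g. bundle 12 monthly files into one Zarr chunk
)
```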
Anyway, I could only find one CMIP6 dataset which would work all the way through (except that I couldn't apply a pre-process step to move the *_bnds variables to coordinates). You can see my attempts and their difficulties in this notebook.
But altogether very promising!
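For reference, a minimal sketch of the pre-process step mentioned above: promote the CMIP6 *_bnds variables (time_bnds, lat_bnds, lon_bnds, ...) from data variables to coordinates. How such a function gets wired into the recipe depends on the API version, so treat that part as an assumption; the function itself is plain xarray.

```python
import xarray as xr

def set_bnds_as_coords(ds: xr.Dataset) -> xr.Dataset:
    """Move any *_bnds data variables into the dataset's coordinates."""
    bnds_vars = [v for v in ds.data_vars if v.endswith("_bnds")]
    return ds.set_coords(bnds_vars)
```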