feature request: missing files #374

Open
ghislainp opened this issue Jun 3, 2022 · 4 comments
Comments

@ghislainp

I have a dataset with one file per day, but some files are missing. Is there a way to deal with this case programmatically? For instance, a function like process_input that would be called when a file is missing, perhaps named process_missing?

@cisaacstern
Member

👋 @ghislainp, thanks for this question. Are you able to determine which specific dates are missing prior to writing the recipe?

If so, you could employ a pattern like this:

https://github.com/pangeo-forge/noaa-coastwatch-geopolar-sst-feedstock/blob/32ba8c8f6a639975a1061ece699ac2f053cb8d02/feedstock/recipe.py#L7-L18

to drop them from the file list before the recipe is executed.

This is probably the easiest way to handle this case at the moment. Automatically skipping over missing dates during recipe execution is not currently supported, though that would certainly be worth aiming for eventually.
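The approach described above (dropping known-missing dates before the recipe runs) can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code from the linked feedstock: the date range, the set of missing dates, and the URL template are all hypothetical placeholders.

```python
from datetime import date, timedelta

# Hypothetical date range and gaps; substitute the real values for your dataset.
START = date(2022, 1, 1)
END = date(2022, 1, 10)
MISSING = {date(2022, 1, 4), date(2022, 1, 7)}


def available_dates(start, end, missing):
    """Return every day from start to end (inclusive) that is not in missing."""
    n_days = (end - start).days + 1
    all_days = [start + timedelta(days=i) for i in range(n_days)]
    return [d for d in all_days if d not in missing]


def make_url(day):
    # Hypothetical URL template, for illustration only.
    return f"https://example.com/data/{day:%Y%m%d}.nc"


# Only the dates that actually have a file are kept.
dates = available_dates(START, END, MISSING)
urls = [make_url(d) for d in dates]
```

The filtered `dates` list would then serve as the keys of the recipe's time dimension, so the recipe never requests a file that does not exist.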

@ghislainp
Author

I could, but the resulting output data would not be regular in time if some dates are skipped.

Is it possible to re-align the dataset after the concatenation, before writing the Zarr store? I assume this could be done with the process_chunk function, but the output of process_chunk would be larger than its input, and what would happen if the missing date falls between two chunks?

@cisaacstern
Member

I see. If I understand correctly, you would ideally like arrays of NaNs (or some other filler value) in place of the empty dates, so that the dataset chunking remains correctly aligned, despite the missing dates?

To the best of my knowledge, this is not currently possible (at least, not without some seriously hacky maneuvers), but the ongoing work to resolve #256, which is a current priority, would probably make this much more feasible. I'll be curious to know whether @rabernat agrees with this assessment or if I've overlooked something.
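To make the desired outcome concrete, here is a small sketch of filling the gaps with NaN on a complete daily calendar. It is plain Python for illustration only and does not use any pangeo-forge functionality; the function name `fill_missing_days` and the sample values are hypothetical.

```python
import math
from datetime import date, timedelta


def fill_missing_days(values_by_day, start, end, fill_value=math.nan):
    """Return (days, values) on a complete daily calendar from start to end, inclusive.

    Days absent from values_by_day receive fill_value.
    """
    n_days = (end - start).days + 1
    days = [start + timedelta(days=i) for i in range(n_days)]
    values = [values_by_day.get(d, fill_value) for d in days]
    return days, values


# Hypothetical observations with 2022-01-03 missing.
observed = {
    date(2022, 1, 1): 1.0,
    date(2022, 1, 2): 2.0,
    date(2022, 1, 4): 4.0,
}
days, values = fill_missing_days(observed, date(2022, 1, 1), date(2022, 1, 4))
```

With xarray, `Dataset.reindex` onto a complete time index performs a comparable alignment on a dataset opened after the fact. Whether this can be done inside a recipe, across chunk boundaries, is the open question in this thread.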

@cisaacstern
Member

Noting that pangeo-forge/cesm-atm-025deg-feedstock#2 would benefit from a similar feature (failing gracefully in the case of missing files).
