Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remic/entozud #404

Merged
merged 12 commits into from
Feb 26, 2024
Merged

Remic/entozud #404

merged 12 commits into from
Feb 26, 2024

Conversation

remicousin
Copy link
Contributor

I did a bit of cleaning by factoring out some functionality, removing all CONFIG entries from being inside functions, adding Documentation, and introducing system arguments rather than a loop on vars and time_res.

All of that to have convert either create a new zarr, or append to an existing one. At present this is assuming that there is no gap between the new set and the old set... Not sure what happens if there is. Maybe that's where the region case come into play? (I haven't looked into that).

I only choose to use to_zarr because it's what we were using in the first place. I am wondering if the 2 cases (create/append) could be even a lot closer if appending to nothing happens to create... or in the case reading the zarr store returns nothing, then make an empty zarr store and that's all the 2 cases differ from...

@remicousin remicousin self-assigned this Feb 20, 2024
Copy link
Collaborator

@aaron-kaplan aaron-kaplan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In https://bitbucket.org/iridl/subx-cloud/src/master/zarrify.py I took the approach of creating an empty store and then writing into regions of it. (The motivation in that case was to parallelize writes.) It might be more complicated that what's needed in this case.

It would be nice to be able to either (a) skip opening any netcdf files whose data are already present in the zarr, for speed, or (b) compare pre-existing zarr data to what's in the netcdf files, for extra safety when you're not in a hurry. But that can wait for a subsequent version.

@remicousin
Copy link
Contributor Author

remicousin commented Feb 26, 2024

I fixed all but the use of strftime to compare times... I ran out of options/ideas to compare comparable objects of same type...

I'll probably leave the other 2 recommendations as Issues, depending when the rest is validated and what's going on elsewhere.

Copy link
Collaborator

@aaron-kaplan aaron-kaplan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[edit: delete an unfinished sentence that I didn't intend to put here.]

@remicousin remicousin merged commit e78f080 into master Feb 26, 2024
1 check passed
@remicousin remicousin deleted the remic/entozud branch February 26, 2024 19:47
@aaron-kaplan
Copy link
Collaborator

Thinking about how to build a workflow around this... when I tried the scipt it was extremely slow. Is it still slow when there's nothing to update?

@remicousin
Copy link
Contributor Author

no. It didn't feel like it. I've been updating either nothing or one month at a time (or creating up to 5 months now)

@aaron-kaplan
Copy link
Collaborator

I'm asking because I'm wondering whether we should add the ENACTS zarr update step to update_datalib. Is it fast enough for that not to be annoying?

@remicousin
Copy link
Contributor Author

It doesn't take time, but we already ran in a case where the filename pattern expected by enactstozarr was not respected (Guyana). So at least for that reason, we might want to wait until we parameterize that aspect or something?

@aaron-kaplan
Copy link
Collaborator

It will be optional, because not everyone has both ENACTS and python maprooms. So we can put it in the update script with a conditional such that it only runs on servers where it's been enabeld. Then we can enable it at one partner at a time, and if/when we need to enable it on one that requires that the script be generalized, we can make those changes then.

@aaron-kaplan
Copy link
Collaborator

Which country was the impetus for doing this now?

@remicousin
Copy link
Contributor Author

Kenya MD

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants