Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring/fixing zarr-pyhton v3 incompatibilities in xarray datatrees #10020

Merged
merged 90 commits into from
Mar 20, 2025

Conversation

aladinor
Copy link
Contributor

@aladinor aladinor commented Feb 3, 2025

Copy link
Member

@jhamman jhamman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @aladinor for working on this! I'm happy to see that we are actually quite close to getting this working. I think the changes you made to the tests highlight that we have some work to do around consolidated metadata and possibly error handling in Zarr.

@TomNicholas TomNicholas added topic-zarr Related to zarr storage library topic-DataTree Related to the implementation of a DataTree class labels Feb 4, 2025
@maxrjones maxrjones mentioned this pull request Feb 6, 2025
2 tasks
@TomNicholas
Copy link
Member

It's working!

@aladinor this was really tricky in the end - you did very well to even get as far as you did!

(@jhamman and I wrote this together so he implicitly approves this PR)

@requires_zarr
@parametrize_zarr_format
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: We can also do this, though that fixture is in test_backends.py :

@pytest.mark.usefixtures("default_zarr_format")

Copy link
Member

@TomNicholas TomNicholas Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to consolidate, but is that implementation better in any way? I find it a lot harder to read...

zarr_format: Literal[2, 3],
) -> None:
"""For one zarr array, check that all expected metadata and chunk data files exist."""
# TODO: This function is now so complicated that it's practically checking compliance with the whole zarr spec...
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I think the better test here is to write some dask arrays with random data setting compute=False and then assert that you only get fill_value on read. That checks that real data is not written with compute=False.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just punt this to a follow-up PR? All the detailed checks here are so subtle that I'm worried if I try to refactor it again I'm going to either miss things or just delay this PR much further, and ultimately this works fine.

Copy link
Contributor

@dcherian dcherian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good lets iterate on a future PR.

Thanks @aladinor @TomNicholas @jhamman !

@dcherian dcherian enabled auto-merge (squash) March 20, 2025 00:44
@dcherian dcherian merged commit f0809e4 into pydata:main Mar 20, 2025
30 checks passed
dcherian added a commit to dcherian/xarray that referenced this pull request Mar 20, 2025
* main:
  Refactoring/fixing zarr-python v3 incompatibilities in xarray datatrees (pydata#10020)
  Refactor calendar fixtures (pydata#10150)
  Use flox for grouped first, last. (pydata#10148)
  Update flaky pydap test (pydata#10149)
@TomNicholas TomNicholas deleted the dtree-zarrv3 branch March 20, 2025 13:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-DataTree Related to the implementation of a DataTree class topic-zarr Related to zarr storage library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DataTree roundtrip fails on None group lookup
6 participants