Refactoring/fixing zarr-python v3 incompatibilities in xarray datatrees #10020

Merged on Mar 20, 2025 (90 commits)
Commits
0a2a49e
fixing compatibility with relative paths in open_store function withi…
aladinor Feb 2, 2025
ae80662
fixing/refactoring test to be compatible with Zarr-python v3
aladinor Feb 3, 2025
379db18
adding @requires_zarr_v3 decorator to TestZarrDatatreeIO
aladinor Feb 3, 2025
846dc50
replacing 0 with 1 in _create_test_datatree which will write a chunk
aladinor Feb 3, 2025
ddfd0b5
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 3, 2025
3f9a8fb
fixing issues with groups
aladinor Feb 3, 2025
f140658
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 3, 2025
0e790eb
Merge branch 'main' into dtree-zarrv3
aladinor Feb 3, 2025
403afa9
fixing issue with dict creation
aladinor Feb 3, 2025
58e8f8e
Merge branch 'dtree-zarrv3' of https://github.com/aladinor/xarray int…
aladinor Feb 3, 2025
fd357fa
fixing issues with Mypy
aladinor Feb 3, 2025
8b993a1
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Feb 3, 2025
d4aeeca
refactoring open_store in ZarrStore class to use Zarr.core.group.Grou…
aladinor Feb 3, 2025
3125647
refactoring datatree test for zarr ensuring compatibility with zarr-pyt…
aladinor Feb 3, 2025
0c7485b
importing zarr.core.group only inside open_store function
aladinor Feb 3, 2025
fdeee94
documenting changes in whats-new.rst file
aladinor Feb 3, 2025
f3e2c66
Update xarray/backends/zarr.py
aladinor Feb 4, 2025
f9f1043
keeping group creation compatible with zarr v2
aladinor Feb 6, 2025
c118841
Merge branch 'main' into dtree-zarrv3
aladinor Feb 6, 2025
ec2086a
fixing issue with mypy
aladinor Feb 6, 2025
abaea4e
Merge branch 'main' into dtree-zarrv3
aladinor Feb 12, 2025
aa85bed
Merge branch 'main' into dtree-zarrv3
aladinor Feb 12, 2025
fce2957
adding root_path equal to '/' when opening group in zarr v3 to avoid …
aladinor Feb 12, 2025
e27b4b9
fixing tests accordingly
aladinor Feb 12, 2025
d03b003
Merge branch 'dtree-zarrv3' of https://github.com/aladinor/xarray int…
aladinor Feb 12, 2025
810a623
removing print statement
aladinor Feb 12, 2025
eabcc76
Merge branch 'main' into dtree-zarrv3
aladinor Feb 21, 2025
60e19d9
Merge branch 'main' into dtree-zarrv3
aladinor Feb 26, 2025
0934461
reverting changes made in unaligned test in zarr
aladinor Mar 5, 2025
6a74275
Merge branch 'main' into dtree-zarrv3
aladinor Mar 5, 2025
011f29c
adding requires_zarr_v3 decorator
aladinor Mar 5, 2025
e31c646
changing max_depth=None in Group.members to get all nested groups
aladinor Mar 6, 2025
e65f229
fixing unaligned test in datatrees using zarr
aladinor Mar 6, 2025
9c88b26
Merge branch 'main' into dtree-zarrv3
dcherian Mar 7, 2025
5a668a4
Merge branch 'main' into dtree-zarrv3
aladinor Mar 7, 2025
53a9309
Merge branch 'main' into dtree-zarrv3
aladinor Mar 7, 2025
72c1ad6
Merge branch 'main' into dtree-zarrv3
aladinor Mar 7, 2025
502981c
Update xarray/backends/zarr.py
aladinor Mar 11, 2025
2f94763
Merge branch 'main' into dtree-zarrv3
aladinor Mar 12, 2025
3e09b61
Merge branch 'main' into dtree-zarrv3
aladinor Mar 13, 2025
d5a061e
updating whats-new.rst entry
aladinor Mar 13, 2025
a417371
remove funny-looking line and refactor to ensure reading consolidated…
TomNicholas Mar 13, 2025
8756919
parametrize over whether or not we write consolidated metadata
TomNicholas Mar 13, 2025
b85d70d
fix consolidated metadata
TomNicholas Mar 13, 2025
f1cc331
ian changes
ianhi Mar 13, 2025
296ed03
open_datatree_specific_group consolidated true works
ianhi Mar 13, 2025
46c61ca
refactoring
aladinor Mar 13, 2025
77e68e3
Merge branch 'main' into dtree-zarrv3
aladinor Mar 13, 2025
4da72ae
test: add consolidated parametrize to zarr datatree test
ianhi Mar 13, 2025
5f7c6b9
fix: group finding behavior consolidated
ianhi Mar 13, 2025
5dc7df7
Merge remote-tracking branch 'ianhi/aladinor/ian/updates' into dtree_…
TomNicholas Mar 17, 2025
9823d64
remove more debugging print statements
TomNicholas Mar 17, 2025
980ebb4
Merge branch 'dtree-zarrv3' into dtree-zarrv3-2
TomNicholas Mar 17, 2025
30f5bba
revert changes to test fixture
TomNicholas Mar 18, 2025
4d1fdb5
formatting
TomNicholas Mar 18, 2025
ecef578
add decorator to parametrize over zarr formats
TomNicholas Mar 18, 2025
c2a1f5f
ensure both versions of zarr-python and both versions of zarr-python …
TomNicholas Mar 18, 2025
cde6b65
change datatree fixture to not produce values that would be fill_valu…
TomNicholas Mar 18, 2025
09fad6e
refactor test to make expected behaviour clearer
TomNicholas Mar 18, 2025
77575b5
fix wrongly expected behaviour - should not expect inherited variable…
TomNicholas Mar 19, 2025
0a9f874
make arrays no longer scalars to dodge https://github.com/pydata/xarr…
TomNicholas Mar 19, 2025
565938b
Merge branch 'dtree-zarrv3-2' of https://github.com/TomNicholas/xarra…
TomNicholas Mar 19, 2025
daf0f42
fix bad merge
TomNicholas Mar 19, 2025
84bde40
parametrize almost every test over zarr_format
TomNicholas Mar 19, 2025
04d937c
parametrize encoding test over zarr_formats
TomNicholas Mar 19, 2025
765c5f0
use xfail in encoding test
TomNicholas Mar 19, 2025
7eee31c
updated expected behaviour of zarr on-disk in light of https://github…
TomNicholas Mar 19, 2025
0969422
fully revert change to simple_datatree test fixture by considered zar…
TomNicholas Mar 19, 2025
cacf419
parametrize unaligned_zarr test fixture over zarr_format
TomNicholas Mar 19, 2025
1a60ebe
move parametrize_over_zarr_format decorator to apply to entire test c…
TomNicholas Mar 19, 2025
d98abe3
for now explicitly consolidate metadata in test fixture
TomNicholas Mar 19, 2025
2dcefe4
correct bug in writing of consolidated metadata
TomNicholas Mar 19, 2025
a88e503
delete commented-out lines
TomNicholas Mar 19, 2025
22ac9b4
merges from main
TomNicholas Mar 19, 2025
69dc976
Revert "merges from main"
TomNicholas Mar 19, 2025
6e3e2aa
fix encodings test for zarr_format=3
TomNicholas Mar 19, 2025
6ce9578
tidy up
TomNicholas Mar 19, 2025
94f0ddc
Merge pull request #1 from TomNicholas/dtree-zarrv3-2
TomNicholas Mar 19, 2025
8573740
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 19, 2025
e2a58e8
Merge branch 'main' into dtree-zarrv3
TomNicholas Mar 19, 2025
71288c6
account for different default value of write_empty_chunks between zar…
TomNicholas Mar 19, 2025
47f3315
fix expected encoding key for compressor in zarr-python v2
TomNicholas Mar 19, 2025
2b50a97
account for exception type changing
TomNicholas Mar 19, 2025
59a978d
various typing fixes
TomNicholas Mar 19, 2025
fc368ce
Merge branch 'dtree-zarrv3' into dtree-zarrv3-2
TomNicholas Mar 19, 2025
cd6aad6
Merge pull request #2 from TomNicholas/dtree-zarrv3-2
TomNicholas Mar 19, 2025
3fb0b7f
remove outdated comment
TomNicholas Mar 19, 2025
e06fa25
bool type
TomNicholas Mar 20, 2025
0829c68
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Mar 20, 2025
ee4273d
Merge branch 'main' into dtree-zarrv3
dcherian Mar 20, 2025
5 changes: 5 additions & 0 deletions doc/whats-new.rst
@@ -66,6 +66,10 @@ Deprecations
 
 Bug fixes
 ~~~~~~~~~
+
+- Fix ``open_datatree`` incompatibilities with Zarr-Python V3 and refactor
+  ``TestZarrDatatreeIO`` accordingly (:issue:`9960`, :pull:`10020`).
+  By `Alfonso Ladino-Rincon <https://github.com/aladinor>`_.
 - Default to resolution-dependent optimal integer encoding units when saving
   chunked non-nanosecond :py:class:`numpy.datetime64` or
   :py:class:`numpy.timedelta64` arrays to disk. Previously units of
@@ -97,6 +101,7 @@ Bug fixes
   datetimes and timedeltas (:issue:`8957`, :pull:`10050`).
   By `Kai Mühlbauer <https://github.com/kmuehlbauer>`_.
 
+
 Documentation
 ~~~~~~~~~~~~~
 - Better expose the :py:class:`Coordinates` class in API reference (:pull:`10000`)
92 changes: 56 additions & 36 deletions xarray/backends/zarr.py
@@ -666,10 +666,21 @@ def open_store(
             use_zarr_fill_value_as_mask=use_zarr_fill_value_as_mask,
             zarr_format=zarr_format,
         )
+
+        from zarr import Group
+
+        group_members: dict[str, Group] = {}
         group_paths = list(_iter_zarr_groups(zarr_group, parent=group))
-        return {
+        for path in group_paths:
+            if path == group:
+                group_members[path] = zarr_group
+            else:
+                rel_path = path.removeprefix(f"{group}/")
+                group_members[path] = zarr_group[rel_path.removeprefix("/")]
+
Review comment (Member): What do you think about moving this if/else into _iter_zarr_groups and handling the zarr v2/v3 difference there?

Reply (Contributor Author): This would make the code much cleaner!

+        out = {
             group: cls(
-                zarr_group.get(group),
+                group_store,
                 mode,
                 consolidate_on_close,
                 append_dim,
@@ -680,8 +691,9 @@ def open_store(
                 use_zarr_fill_value_as_mask,
                 cache_members=cache_members,
             )
-            for group in group_paths
+            for group, group_store in group_members.items()
         }
+        return out
 
     @classmethod
     def open_group(
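The path handling added to `open_store` above uses two `removeprefix` calls. The first strips the parent group. The second drops a leading slash, which survives when the parent is the root (`"/"`), because `f"{group}/"` is then `"//"` and does not match. Below is a minimal standalone sketch of that step. The helper names `relative_member_path` and `map_group_members` are hypothetical and are not part of xarray or zarr.

```python
from typing import Dict, Iterable, Optional


def relative_member_path(path: str, group: str) -> Optional[str]:
    """Return `path` relative to `group`, or None when `path` is `group` itself.

    Hypothetical helper mirroring the if/else in the diff; None stands for
    "use the already opened parent group directly".
    """
    if path == group:
        return None
    # Strip the parent prefix, e.g. "/parent/child" -> "child".
    rel_path = path.removeprefix(f"{group}/")
    # When group is "/", the prefix above is "//" and nothing was stripped,
    # so remove the remaining leading slash: "/a/b" -> "a/b".
    return rel_path.removeprefix("/")


def map_group_members(group_paths: Iterable[str], group: str) -> Dict[str, Optional[str]]:
    """Map every absolute group path to the key used to index the parent group."""
    return {path: relative_member_path(path, group) for path in group_paths}
```

With a root parent, `map_group_members(["/", "/a", "/a/b"], "/")` maps `"/a/b"` to `"a/b"`. With parent `"/parent"`, the path `"/parent/child"` maps to `"child"`.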
@@ -1034,8 +1046,6 @@ def store(
         if self._consolidate_on_close:
             kwargs = {}
             if _zarr_v3():
-                # https://github.com/zarr-developers/zarr-python/pull/2113#issuecomment-2386718323
-                kwargs["path"] = self.zarr_group.name.lstrip("/")
                 kwargs["zarr_format"] = self.zarr_group.metadata.zarr_format
             zarr.consolidate_metadata(self.zarr_group.store, **kwargs)
 
@@ -1662,8 +1672,6 @@ def open_groups_as_dict(
         zarr_version=None,
         zarr_format=None,
     ) -> dict[str, Dataset]:
-        from xarray.core.treenode import NodePath
-
         filename_or_obj = _normalize_path(filename_or_obj)
 
         # Check for a group and make it a parent if it exists
@@ -1686,7 +1694,6 @@ def open_groups_as_dict(
         )
 
         groups_dict = {}
-
         for path_group, store in stores.items():
             store_entrypoint = StoreBackendEntrypoint()
 
@@ -1762,44 +1769,57 @@ def _get_open_params(
         consolidated = False
 
     if _zarr_v3():
-        missing_exc = ValueError
+        # TODO: replace AssertionError after https://github.com/zarr-developers/zarr-python/issues/2821 is resolved
+        missing_exc = AssertionError
     else:
         missing_exc = zarr.errors.GroupNotFoundError
 
-    if consolidated is None:
-        try:
-            zarr_group = zarr.open_consolidated(store, **open_kwargs)
-        except (ValueError, KeyError):
-            # ValueError in zarr-python 3.x, KeyError in 2.x.
+    if consolidated in [None, True]:
+        # open the root of the store, in case there is metadata consolidated there
+        group = open_kwargs.pop("path")
+
+        if consolidated:
+            # TODO: an option to pass the metadata_key keyword
+            zarr_root_group = zarr.open_consolidated(store, **open_kwargs)
+        elif consolidated is None:
+            # same but with more error handling in case no consolidated metadata found
             try:
-                zarr_group = zarr.open_group(store, **open_kwargs)
-                emit_user_level_warning(
-                    "Failed to open Zarr store with consolidated metadata, "
-                    "but successfully read with non-consolidated metadata. "
-                    "This is typically much slower for opening a dataset. "
-                    "To silence this warning, consider:\n"
-                    "1. Consolidating metadata in this existing store with "
-                    "zarr.consolidate_metadata().\n"
-                    "2. Explicitly setting consolidated=False, to avoid trying "
-                    "to read consolidate metadata, or\n"
-                    "3. Explicitly setting consolidated=True, to raise an "
-                    "error in this case instead of falling back to try "
-                    "reading non-consolidated metadata.",
-                    RuntimeWarning,
-                )
-            except missing_exc as err:
-                raise FileNotFoundError(
-                    f"No such file or directory: '{store}'"
-                ) from err
-    elif consolidated:
-        # TODO: an option to pass the metadata_key keyword
-        zarr_group = zarr.open_consolidated(store, **open_kwargs)
+                zarr_root_group = zarr.open_consolidated(store, **open_kwargs)
+            except (ValueError, KeyError):
+                # ValueError in zarr-python 3.x, KeyError in 2.x.
+                try:
+                    zarr_root_group = zarr.open_group(store, **open_kwargs)
+                    emit_user_level_warning(
+                        "Failed to open Zarr store with consolidated metadata, "
+                        "but successfully read with non-consolidated metadata. "
+                        "This is typically much slower for opening a dataset. "
+                        "To silence this warning, consider:\n"
+                        "1. Consolidating metadata in this existing store with "
+                        "zarr.consolidate_metadata().\n"
+                        "2. Explicitly setting consolidated=False, to avoid trying "
+                        "to read consolidate metadata, or\n"
+                        "3. Explicitly setting consolidated=True, to raise an "
+                        "error in this case instead of falling back to try "
+                        "reading non-consolidated metadata.",
+                        RuntimeWarning,
+                    )
+                except missing_exc as err:
+                    raise FileNotFoundError(
+                        f"No such file or directory: '{store}'"
+                    ) from err
+
+        # but the user should still receive a DataTree whose root is the group they asked for
+        if group and group != "/":
+            zarr_group = zarr_root_group[group.removeprefix("/")]
+        else:
+            zarr_group = zarr_root_group
     else:
         if _zarr_v3():
             # we have determined that we don't want to use consolidated metadata
             # so we set that to False to avoid trying to read it
             open_kwargs["use_consolidated"] = False
         zarr_group = zarr.open_group(store, **open_kwargs)
+
     close_store_on_close = zarr_group.store is not store
 
     # we use this to determine how to handle fill_value
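The control flow of the rewritten branch in `_get_open_params` can be sketched without zarr. This is an illustration only, not the xarray implementation. `open_with_consolidated_policy`, `open_consolidated`, and `open_plain` are hypothetical stand-ins for the `zarr.open_consolidated` and `zarr.open_group` calls in the diff, and the warning text is abbreviated. The step that indexes from the root into the requested group is left out.

```python
import warnings
from typing import Any, Callable, Optional, Type


def open_with_consolidated_policy(
    open_consolidated: Callable[[], Any],
    open_plain: Callable[[], Any],
    consolidated: Optional[bool],
    missing_exc: Type[BaseException] = LookupError,
) -> Any:
    """Hypothetical sketch of the three `consolidated` cases in the diff.

    True  -> consolidated metadata is required; errors propagate.
    None  -> try consolidated first, fall back to a plain open with a warning.
    False -> never touch consolidated metadata.
    """
    if consolidated:
        return open_consolidated()
    if consolidated is None:
        try:
            return open_consolidated()
        except (ValueError, KeyError):
            # ValueError in zarr-python 3.x, KeyError in 2.x (per the diff).
            try:
                group = open_plain()
            except missing_exc as err:
                # A missing store is reported as a file error, as in the diff.
                raise FileNotFoundError("No such file or directory") from err
            warnings.warn(
                "Failed to open store with consolidated metadata; "
                "read non-consolidated metadata instead.",
                RuntimeWarning,
                stacklevel=2,
            )
            return group
    return open_plain()
```

Passing `consolidated=True` keeps failures loud, while `consolidated=None` trades a `RuntimeWarning` for a successful but slower open. In the diff, `missing_exc` is `AssertionError` under zarr-python v3 and `zarr.errors.GroupNotFoundError` under v2. The `LookupError` default here is only a placeholder.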
3 changes: 1 addition & 2 deletions xarray/core/datatree.py
@@ -16,7 +16,6 @@
     TYPE_CHECKING,
     Any,
     Concatenate,
-    Literal,
     NoReturn,
     ParamSpec,
     TypeVar,
@@ -1741,7 +1740,7 @@ def to_zarr(
         consolidated: bool = True,
         group: str | None = None,
         write_inherited_coords: bool = False,
-        compute: Literal[True] = True,
+        compute: bool = True,
         **kwargs,
     ):
         """
15 changes: 15 additions & 0 deletions xarray/tests/__init__.py
@@ -165,6 +165,21 @@ def _importorskip(
 
 has_array_api_strict, requires_array_api_strict = _importorskip("array_api_strict")
 
+parametrize_zarr_format = pytest.mark.parametrize(
+    "zarr_format",
+    [
+        pytest.param(2, id="zarr_format=2"),
+        pytest.param(
+            3,
+            marks=pytest.mark.skipif(
+                not has_zarr_v3,
+                reason="zarr-python v2 cannot understand the zarr v3 format",
+            ),
+            id="zarr_format=3",
+        ),
+    ],
+)
+
 
 def _importorskip_h5netcdf_ros3(has_h5netcdf: bool):
     if not has_h5netcdf:
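`parametrize_zarr_format` always runs the `zarr_format=2` case and skips `zarr_format=3` unless zarr-python v3 is installed. Below is a pytest-free sketch of that selection logic. The names `FormatCase` and `zarr_format_cases` are hypothetical and do not exist in xarray; they only model what the marker list expresses.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class FormatCase:
    """One parametrized case; `skip_reason` of None means the case runs."""

    zarr_format: int
    skip_reason: Optional[str] = None


def zarr_format_cases(has_zarr_v3: bool) -> List[FormatCase]:
    """Model the two cases produced by the decorator in the diff."""
    v3_skip = (
        None
        if has_zarr_v3
        else "zarr-python v2 cannot understand the zarr v3 format"
    )
    return [FormatCase(2), FormatCase(3, skip_reason=v3_skip)]


def runnable_formats(has_zarr_v3: bool) -> List[int]:
    """Formats that would actually execute, given the installed zarr-python."""
    return [
        case.zarr_format
        for case in zarr_format_cases(has_zarr_v3)
        if case.skip_reason is None
    ]
```

Under this model, format 3 is still collected when zarr-python v2 is installed, but it is reported as skipped with an explicit reason instead of being silently absent.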