Should consolidated metadata always be executed at the store root? #2920

Open
jhamman opened this issue Mar 19, 2025 · 3 comments
Labels
bug Potential issues with the zarr-python library

@jhamman
Member

jhamman commented Mar 19, 2025

Zarr version

v3.0.5

Numcodecs version

v0.15.1

Python Version

Python 3.11

Operating System

Mac

Installation

conda/pip

Description

Zarr-Python 2's consolidate_metadata included a path argument. However, regardless of the value passed as path, the .zmetadata JSON object was always written at the root of the store. This behavior changed in Zarr-Python 3, which instead writes it inside the group at path (e.g. foo/.zmetadata). Was this intentional, and if so, is it really what we want?

I know @TomAugspurger gave this some thought when implementing consolidated metadata for Zarr-Python 3. So perhaps we just need to document that this change was indeed intentional?
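To pin down the difference, here is a minimal sketch of where the .zmetadata document ends up for a given path under each major version, for zarr_format=2 hierarchies as in the reproducer. `consolidated_metadata_key` is a hypothetical helper that only encodes the behaviour reported in this issue; it is not part of either zarr-python API.

```python
def consolidated_metadata_key(path: str, zarr_python_major: int) -> str:
    """Hypothetical helper: the store key where consolidate_metadata(store, path=path)
    leaves the .zmetadata document for a zarr_format=2 hierarchy, per this issue."""
    path = path.strip("/")
    if zarr_python_major == 2:
        # Zarr-Python 2: always the store root, whatever `path` is.
        return ".zmetadata"
    if zarr_python_major == 3:
        # Zarr-Python 3: next to the group at `path` (the root if path is empty).
        return f"{path}/.zmetadata" if path else ".zmetadata"
    raise ValueError(f"unsupported zarr-python major version: {zarr_python_major}")


for version in (2, 3):
    print(version, consolidated_metadata_key("foo", version))
# 2 .zmetadata
# 3 foo/.zmetadata
```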

Steps to reproduce

```python
# zarr-python 3
import zarr

store = {}

root = zarr.group(store=store, zarr_format=2)
zarr.consolidate_metadata(store)
root.create_group("foo")
root.create_group("foo/spam")
zarr.consolidate_metadata(store, path="foo")
list(store)
# ['.zgroup',
#  '.zattrs',
#  '.zmetadata',
#  'foo/.zgroup',
#  'foo/.zattrs',
#  'foo/spam/.zgroup',
#  'foo/spam/.zattrs',
#  'foo/.zmetadata']  # <- from final consolidation
```

```python
# zarr-python 2
import zarr

store = {}

root = zarr.group(store=store)
zarr.consolidate_metadata(store)
root.create_group("foo")
root.create_group("foo/spam")
zarr.consolidate_metadata(store, path="foo")
list(store)
# ['.zgroup', '.zmetadata', 'foo/.zgroup', 'foo/spam/.zgroup']  # <- no foo/.zmetadata
```
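A quick way to compare the two listings above is to pull out every key that holds a consolidated document. The sketch below works on any iterable of store keys (such as list(store) for the dict stores used here) and assumes the zarr_format=2 .zmetadata layout; `groups_with_consolidated_metadata` is a hypothetical name, and the key lists are copied from the outputs above.

```python
from collections.abc import Iterable


def groups_with_consolidated_metadata(keys: Iterable[str]) -> list[str]:
    """Return the group paths ("" for the root) that carry a .zmetadata key."""
    groups = []
    for key in keys:
        if key == ".zmetadata":
            groups.append("")
        elif key.endswith("/.zmetadata"):
            groups.append(key[: -len("/.zmetadata")])
    return sorted(groups)


v3_keys = [".zgroup", ".zattrs", ".zmetadata", "foo/.zgroup", "foo/.zattrs",
           "foo/spam/.zgroup", "foo/spam/.zattrs", "foo/.zmetadata"]
v2_keys = [".zgroup", ".zmetadata", "foo/.zgroup", "foo/spam/.zgroup"]
print(groups_with_consolidated_metadata(v3_keys))  # ['', 'foo']
print(groups_with_consolidated_metadata(v2_keys))  # ['']
```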

Additional output

xref: pydata/xarray#10020

@TomAugspurger
Contributor

#2113 (comment) touches on this. That sounds like it was intentional on my part, though I don't see any discussion there about it.

Reading that now, I'm not really sure I appreciated the magnitude of the change. IIRC I viewed it more as a relaxation of a zarr-v2 requirement.

@TomNicholas
Member

Is there actually a use case for consolidated metadata that is not at the root? It seems to me that every time you want consolidated metadata you would like to have it for the whole store, i.e. at the root.
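One reason root-level consolidation may be enough: a single root document already contains every entry a reader of a subgroup needs. A minimal sketch, assuming the Zarr v2 .zmetadata layout ({"zarr_consolidated_format": 1, "metadata": {...}}) and a hypothetical helper `subtree_metadata`, of carving out one group's view from the root document:

```python
def subtree_metadata(consolidated: dict, path: str) -> dict:
    """Hypothetical helper: return the entries of a root-level consolidated
    document that belong to the group at `path`, re-keyed relative to it."""
    path = path.strip("/")
    # The trailing "/" keeps a path like "foo" from matching "foobar/...".
    prefix = f"{path}/" if path else ""
    return {
        key[len(prefix):]: value
        for key, value in consolidated["metadata"].items()
        if key.startswith(prefix)
    }


root_doc = {
    "zarr_consolidated_format": 1,
    "metadata": {
        ".zgroup": {"zarr_format": 2},
        "foo/.zgroup": {"zarr_format": 2},
        "foo/spam/.zgroup": {"zarr_format": 2},
    },
}
print(sorted(subtree_metadata(root_doc, "foo")))
# ['.zgroup', 'spam/.zgroup']
```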

@TomAugspurger
Contributor

TomAugspurger commented Mar 21, 2025

I would need to check the v2 implementation, but my recollection is that it didn't create consolidated metadata for the entire store unless you called it with the root group. If you do zarr.convenience.consolidate_metadata(store, path="path/to/group"), that would write the consolidated metadata for just that group (and its children) at the root of the store. If you then do zarr.convenience.consolidate_metadata(store, path="path/to/another-group"), it would overwrite the metadata at the root of the store (https://zarr.readthedocs.io/en/v2.18.4/api/convenience.html#zarr.convenience.consolidate_metadata).
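The overwrite follows from there being only one root key. Below is a toy model of the behaviour as recalled in this comment (not checked against the v2 source): a plain dict stands in for the store, and `consolidate_at_root` and `METADATA_SUFFIXES` are hypothetical names. Consolidating two different groups leaves only the second group's entries in the root document.

```python
import json

# Keys that hold Zarr v2 group/array metadata.
METADATA_SUFFIXES = (".zgroup", ".zarray", ".zattrs")


def consolidate_at_root(store: dict, path: str = "") -> None:
    """Toy model (hypothetical): gather metadata for the group at `path` and
    its children, then write it to the single root key `.zmetadata`,
    replacing whatever document was there before."""
    path = path.strip("/")
    prefix = f"{path}/" if path else ""
    metadata = {
        key: json.loads(value)
        for key, value in store.items()
        if key.startswith(prefix) and key.endswith(METADATA_SUFFIXES)
    }
    store[".zmetadata"] = json.dumps(
        {"zarr_consolidated_format": 1, "metadata": metadata}
    )


group_meta = json.dumps({"zarr_format": 2})
store = {
    ".zgroup": group_meta,
    "path/to/group/.zgroup": group_meta,
    "path/to/another-group/.zgroup": group_meta,
}
consolidate_at_root(store, path="path/to/group")
consolidate_at_root(store, path="path/to/another-group")
print(list(json.loads(store[".zmetadata"])["metadata"]))
# ['path/to/another-group/.zgroup']
```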

So if you want to consolidate the entire store in v3, you can still do that with zarr.consolidate_metadata(store) (no path or group).

In xarray's case, I guess the hope / expectation is that you can consolidate a specific group and have it show up in the root of the store?
