[Draft - WIP] Updates to Zarr V3 #788

Draft
wants to merge 9 commits into
base: main


norlandrhagen
Contributor

@norlandrhagen norlandrhagen commented Feb 12, 2025

Branch to update pangeo-forge-recipes to Zarr v3.

  • Pinned Zarr and Xarray to versions that comply with Zarr v3.
  • Fixed the first of the Zarr v2 -> v3 API changes (zarr.storage.FSStore -> zarr.storage.FsspecStore).

cc @keewis

@norlandrhagen
Contributor Author

WIP note: @keewis and I pushed on this. We're close, but currently blocked by: TypeError: Filesystem needs to support async operations.

@keewis keewis mentioned this pull request Feb 15, 2025
@keewis
Contributor

keewis commented Feb 15, 2025

it looks like the "file" and "memory" filesystems don't have an async version, so passing one of them as fs won't work.

I've tried using AsyncFileSystemWrapper to work around this, but that doesn't appear to properly create the files (I get FileNotFoundErrors for various zarr.json files):

fs = fsspec.get_filesystem_class("file")()
path = str(tmpdir_factory.mktemp("target"))
return FSSpecTarget(fs, path)

becomes:

fs = fsspec.filesystem("file", auto_mkdir=True)
async_fs = AsyncFileSystemWrapper(fs)
...
FSSpecTarget(async_fs, path)
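
To make the failure mode concrete, here is a toy model of the check and of what a wrapper has to do. Nothing in it is fsspec's or zarr's actual code: `ToySyncFS`, `ToyAsyncWrapper`, `require_async`, and `roundtrip` are made-up names, and pushing the blocking call onto a worker thread with `asyncio.to_thread` is just one plausible way to adapt a sync filesystem. Only the `async_impl` flag and the underscore-prefixed coroutine names are meant to mirror fsspec conventions.

```python
import asyncio


class ToySyncFS:
    """Stand-in for a blocking filesystem such as "file" or "memory"."""

    async_impl = False  # sync filesystems advertise no async support

    def __init__(self):
        self._files = {}

    def pipe_file(self, path, data):
        self._files[path] = data

    def cat_file(self, path):
        return self._files[path]


class ToyAsyncWrapper:
    """Hypothetical wrapper exposing coroutine versions of the sync methods."""

    async_impl = True

    def __init__(self, sync_fs):
        self.sync_fs = sync_fs

    async def _pipe_file(self, path, data):
        # Run the blocking call in a worker thread (an assumption of this sketch).
        await asyncio.to_thread(self.sync_fs.pipe_file, path, data)

    async def _cat_file(self, path):
        return await asyncio.to_thread(self.sync_fs.cat_file, path)


def require_async(fs):
    """Toy version of the check that produces the TypeError quoted above."""
    if not fs.async_impl:
        raise TypeError("Filesystem needs to support async operations.")
    return fs


async def roundtrip(fs):
    await fs._pipe_file("zarr.json", b"{}")
    return await fs._cat_file("zarr.json")


sync_fs = ToySyncFS()
try:
    require_async(sync_fs)
    rejected = False
except TypeError:
    rejected = True

# The wrapped filesystem passes the check and can be awaited.
result = asyncio.run(roundtrip(require_async(ToyAsyncWrapper(sync_fs))))
print(rejected, result)
```

The bare sync object is rejected, while the wrapped one passes the check and round-trips a small payload.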

Edit: looks like setting auto_mkdir=True on the sync fs gets rid of that error... on to the next one: ValueError: could not convert string to float: 'AAAAAAAA+H8=', which is raised because _FillValue appears to have been inherited from zarr_array.attrs as something weird:

var_coded.encoding.update(zarr_array.attrs)
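
For reference, the string in that error decodes cleanly as base64 to the eight bytes of a float64 NaN, which suggests the fill value was serialized rather than corrupted. A minimal stdlib sketch; the variable names are illustrative, and the little-endian float64 layout is an assumption that happens to fit this value:

```python
import base64
import math
import struct

encoded = "AAAAAAAA+H8="  # the string from the ValueError above

# Passing the string straight to float() reproduces the error message.
try:
    float(encoded)
    message = None
except ValueError as err:
    message = str(err)

# Decoding as base64 gives 8 bytes; read them as a little-endian float64
# (the byte order is an assumption of this sketch).
raw = base64.b64decode(encoded)
(value,) = struct.unpack("<d", raw)

print(message)
print(raw.hex())
print(math.isnan(value))
```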

@norlandrhagen
Contributor Author

Went down the exact same rabbit hole!

@keewis
Contributor

keewis commented Feb 15, 2025

I'm pretty close now, I think: only this weird encoding issue left (see the last edit to #788 (comment))

Edit: as far as I can tell, this comes from somewhere within StoreToZarr (not sure, though, that's another rabbit hole)

@keewis
Contributor

keewis commented Feb 16, 2025

I can reproduce the weird encoding with:

import xarray as xr
import zarr
import fsspec
import numpy as np
from fsspec.implementations.asyn_wrapper import AsyncFileSystemWrapper

fs = fsspec.filesystem("file", auto_mkdir=True)
async_fs = AsyncFileSystemWrapper(fs)
ds = xr.Dataset(
    coords={"lon": ("lon", [0.0, 1.1], {}, {"_FillValue": np.float64(np.nan)})}
)
store = zarr.storage.FsspecStore(fs=async_fs, path="store.zarr", read_only=False)
ds.to_zarr(store, mode="w", compute=False, encoding={}, consolidated=False)
zgroup = zarr.open_group(store)
print(dict(zgroup["lon"].attrs))

there are also a lot of warnings similar to:

.../zarr/core/group.py:2824: UserWarning: Object at lon is not recognized as a component of a Zarr hierarchy.

So the question is: how is zarr.Array.attrs defined? Are those the encoded values? I fear my knowledge of zarrv3 is very low.

@keewis
Contributor

keewis commented Feb 16, 2025

it looks like the source of the encoded value is xarray, which encodes floats as base64 strings. I would just remove the update of encoding here:

var_coded.encoding.update(zarr_array.attrs)
but I'm not sure what its purpose is so I may very well be missing something.
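
If dropping the update entirely turns out to lose something, one narrower option might be to keep copying the attributes but decode a base64 `_FillValue` first. The sketch below uses plain dicts in place of `zarr_array.attrs` and `var_coded.encoding`. `decode_fill_value` and `attrs_for_encoding` are hypothetical helpers, not part of xarray, zarr, or pangeo-forge-recipes; the `units` entry is made-up sample data; and the little-endian float64 layout is an assumption based on the single value seen in this thread.

```python
import base64
import binascii
import math
import struct


def decode_fill_value(value):
    """Return a float for a base64-encoded float64, else the value unchanged.

    Hypothetical helper: assumes an 8-byte little-endian float64 payload.
    """
    if not isinstance(value, str):
        return value
    try:
        raw = base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return value  # not valid base64, leave it alone
    if len(raw) != 8:
        return value  # not the size of a float64
    return struct.unpack("<d", raw)[0]


def attrs_for_encoding(attrs):
    """Copy attrs, decoding only the `_FillValue` entry if present."""
    out = dict(attrs)
    if "_FillValue" in out:
        out["_FillValue"] = decode_fill_value(out["_FillValue"])
    return out


# Plain dicts standing in for zarr_array.attrs and var_coded.encoding.
attrs = {"_FillValue": "AAAAAAAA+H8=", "units": "degrees_east"}
encoding = {}
encoding.update(attrs_for_encoding(attrs))

print(math.isnan(encoding["_FillValue"]), encoding["units"])
```

This is a heuristic: any 12-character base64 string would be read as a float64, so a real fix would need to know the array's dtype instead of guessing from the string.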


2 participants