Zarr reader #271


Merged (170 commits) on Apr 24, 2025

Conversation

@norlandrhagen (Collaborator) commented Oct 24, 2024

WIP PR to add a Zarr reader. Thanks to @TomNicholas for the how to write a reader guide.

  • Closes Add Zarr Reader(s) #262
  • Tests added
  • Tests passing
  • Full type hint coverage
  • Changes are documented in docs/releases.rst
  • Optimizations (e.g. using the async interface to list the lengths of chunks for each variable concurrently)
  • New functionality has documentation
  • Read v3 Zarr

Future PR(s):

  • Read v2 Zarr
  • sharded v3 data

To Do:

  • Open the store using zarr-python v3 (behind a protected import). This should handle both v2 and v3 stores for us.
  • Use zarr-python to list the variables in the store, and check that all loadable_variables are present

For each virtual variable:
  • Use zarr-python to get the attributes and the dimension names, and coordinate names (which come from the .zmetadata or zarr.json)
  • Use zarr-python to also get the dtype and chunk grid info + everything else needed to create the virtualizarr.zarr.ZArray object (eventually we can skip this step and use a zarr-python array metadata class directly instead of virtualizarr.zarr.ZArray, see
    Replace VirtualiZarr.ZArray with zarr ArrayMetadata #175)
  • Use the knowledge of the store location, variable name, and the zarr format to deduce which directory / S3 prefix the chunks must live in.
  • List all the chunks in that directory using fsspec.ls(detail=True), as that should also return the nbytes of each chunk. Remember that chunks are allowed to be missing.
  • The offset of each chunk is just 0 (ignoring sharding for now), and the length is the file size fsspec returned. The paths are just all the paths fsspec listed.
  • Parse the path and length information returned by fsspec into the structure that we can pass to ChunkManifest.init
  • Create a ManifestArray from our ChunkManifest and ZArray
  • Wrap that ManifestArray in an xarray.Variable, using the dims and attrs we read before
  • Get the loadable_variables by just using xr.open_zarr on the same store (should use drop_variables to avoid handling the virtual variables that we already have).
  • Use separate_coords to set the correct variables as coordinate variables (and avoid building indexes whilst doing it)
  • Merge all the variables into one xr.Dataset and return it.
  • All the above should be wrapped in a virtualizarr.readers.zarr.open_virtual_dataset function, which then should be called as a method from a ZarrVirtualBackend(VirtualBackend) subclass.
  • Finally add that ZarrVirtualBackend to the list of readers in virtualizarr.backend.py
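The chunk-listing steps above (list the chunk prefix with fsspec, take offset 0 and the returned nbytes) can be sketched roughly like this. The store path, variable name, and chunk contents are made up for illustration, using fsspec's in-memory filesystem; the real reader would run against the actual store location and feed the resulting dict to ChunkManifest:

```python
import fsspec

# Hypothetical in-memory layout standing in for a real store location:
# one variable "air" with two Zarr v3-style chunks under the "c/" prefix.
fs = fsspec.filesystem("memory")
fs.pipe("/store/air/c/0/0", b"\x00" * 128)
fs.pipe("/store/air/c/0/1", b"\x00" * 64)  # chunk sizes vary; chunks may also be missing

# List every chunk under the variable's chunk prefix; detail=True gives nbytes.
manifest = {}
for path, info in fs.find("/store/air/c", detail=True).items():
    # Chunk key like "0.0", built from the path components after the "c/" prefix.
    key = ".".join(path.split("/c/")[-1].split("/"))
    # Offset is always 0 (ignoring sharding); length is the size fsspec returned.
    manifest[key] = {"path": path, "offset": 0, "length": info["size"]}

print(manifest)  # ready to hand to something like ChunkManifest(entries=manifest)
```

Missing chunks simply never appear in the listing, so they are naturally absent from the manifest, which is what the manifest format expects.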

@norlandrhagen (Collaborator, Author):

#273

@norlandrhagen (Collaborator, Author):

Bit of an update: with help from @sharkinsspatial, @abarciauskas-bgse, and @maxrjones, I got a Zarr store loaded as a virtual dataset.

<xarray.Dataset> Size: 3kB
Dimensions:  (time: 10, lat: 9, lon: 18)
Coordinates:
    lat      (lat) float32 36B ManifestArray<shape=(9,), dtype=float32, chunk...
    lon      (lon) float32 72B ManifestArray<shape=(18,), dtype=float32, chun...
  * time     (time) datetime64[ns] 80B 2013-01-01 ... 2013-01-03T06:00:00
Data variables:
    air      (time, lat, lon) int16 3kB ManifestArray<shape=(10, 9, 18), dtyp...
Attributes:
    Conventions:  COARDS
    title:        4x daily NMC reanalysis (1948)
    description:  Data is from NMC initialized reanalysis\n(4x/day).  These a...
    platform:     Model
    references:   http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanaly...

Next up is how to deal with fill_values.

When I try to write it to Kerchunk JSON, I’m running into some fill_value dtype issues in the Zarray.

ZArray(shape=(10,), chunks=(10,), dtype='<f4', fill_value=np.float32(nan), order='C', compressor=None, filters=None, zarr_format=2)

Where fill_value=np.float32(nan). When I try to write these to JSON via ds.virtualize.to_kerchunk(format="dict"), I get TypeError: np.float32(nan) is not JSON serializable.

Wondering how fill_values like np.float32(nan) should be handled.

There seems to be some conversion logic in @sharkinsspatial's HDF reader for converting fill_values. It also looks like there is some fill_value handling in zarr.py.
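One possible way to handle this (a sketch, not necessarily the PR's actual fix): follow the Zarr v2 metadata convention of encoding non-finite float fill values as the strings "NaN", "Infinity", and "-Infinity", and cast numpy scalars to plain Python types before json.dumps. The encode_fill_value helper name is made up here:

```python
import json
import math

import numpy as np

def encode_fill_value(fv):
    # Hypothetical helper: convert a numpy fill_value into something
    # json.dumps accepts. Zarr v2 .zarray metadata conventionally stores
    # non-finite floats as the strings "NaN", "Infinity", "-Infinity".
    if fv is None:
        return None
    if isinstance(fv, (float, np.floating)):
        if math.isnan(fv):
            return "NaN"
        if math.isinf(fv):
            return "Infinity" if fv > 0 else "-Infinity"
        return float(fv)
    if isinstance(fv, np.integer):
        return int(fv)
    return fv

print(json.dumps({"fill_value": encode_fill_value(np.float32("nan"))}))
# -> {"fill_value": "NaN"}
```

Kerchunk readers already understand the string form, since it is what the v2 spec prescribes for NaN fill values.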

@TomNicholas (Member):

> I got a Zarr loaded as a virtual dataset.

Amazing!

> When I try to write it to Kerchunk JSON, I’m running into some fill_value dtype issues in the Zarray.
>
> ZArray(shape=(10,), chunks=(10,), dtype='<f4', fill_value=np.float32(nan), order='C', compressor=None, filters=None, zarr_format=2)
>
> Where fill_value=np.float32(nan). When I try to write these to JSON via ds.virtualize.to_kerchunk(format="dict"), I get TypeError: np.float32(nan) is not JSON serializable.
>
> Wondering how fill_values like np.float32(nan) should be handled.

This seems like an issue that should actually be orthogonal to this PR (if it weren't for the ever-present difficulty of testing). Either the problem is in the ZArray class and what types it allows, or it's in the Kerchunk writer not knowing how to serialize a valid ZArray. Either way, if np.float32(nan) is a valid fill_value for a zarr array, then it's not the fault of the new zarr reader.

@TomNicholas (Member) left a review comment:

This is a great start! I think the main thing here is that we don't actually need kerchunk in order to test this reader.

@TomNicholas (Member):

> get chunk size with zarr-python (zarr-developers/zarr-python#2426) instead of fsspec

I think we should just do this in this PR. We can point to Tom's PR for now in the CI, but I expect that will get merged before this does anyway. If you look at Tom's implementation it's basically what we're doing here.

@TomNicholas (Member):

Store.getsize was just merged upstream (zarr-developers/zarr-python#2426), so we can possibly just use the same upstream zarr-python env we are already using.

* Use ManifestStore in Zarr reader

* Update virtualizarr/readers/zarr.py

Co-authored-by: Raphael Hagen <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Raphael Hagen <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@TomNicholas (Member) left a review:

Thanks for your patience @norlandrhagen , and for your help @maxrjones .

I believe I spotted one small bug, but otherwise this looks great.

The implementation is very neat now!

@@ -77,7 +77,7 @@ vds.virtualize.to_icechunk(icechunkstore)

 ### I already have some data in Zarr, do I have to resave it?

-No! VirtualiZarr can (well, [soon will be able to](https://github.com/zarr-developers/VirtualiZarr/issues/262)) create virtual references pointing to existing Zarr stores in the same way as for other file formats.
+No! VirtualiZarr can create virtual references pointing to existing Zarr stores in the same way as for other file formats. Note: Currently only reading Zarr V3 is supported.
Review comment (Member):

Do we have an issue to track learning to read zarr v2?

@norlandrhagen (Collaborator, Author):

Co-authored-by: Tom Nicholas <[email protected]>
@norlandrhagen norlandrhagen merged commit ff1ddb4 into develop Apr 24, 2025
12 checks passed
@norlandrhagen norlandrhagen deleted the zarr_reader branch April 24, 2025 19:12
Labels
readers zarr-python Relevant to zarr-python upstream