Skip to content

Require zarr-python>=3.0 #453

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 9 commits into from
Closed

Require zarr-python>=3.0 #453

wants to merge 9 commits into from

Conversation

TomNicholas
Copy link
Member

@TomNicholas TomNicholas commented Feb 27, 2025

This PR just merges the other development branch, and so is meant as a way to check the testing status of trying to pin zarr-python>=3.0 by merging @abarciauskas-bgse 's work in #429.

  • Tests passing

abarciauskas-bgse and others added 2 commits February 18, 2025 18:57
* Added zarray_to_v3metadata and test

* Working on manifest array tests

* Fix test_manifests/test_array#TestConcat tests

* Passing TestStack tests and add fixture

* All test_manifests/test_array tests passing

* Compressors should be list

* Passing dmrpp tests

* Passing test_hdf.py tests

* Start to work on kerchunk tests

* Add method to convert array v3 metadata to v2 metadata for kerchunk (not happy about this)

* Fix fixtures and mark xfail netcdf3

* Test for convert_v3_to_v2_metadata

* Deduplicate fixture for array v3 metadata

* Parse filters and compressors from v3 metdata for v2 metadata

* Rewrite extract_codecs

* Refactor convert_to_codec_pipeline

* Fix hdf integration tests

* Test for convert_to_codec_pipeline

* Refactor get_codecs and its tests

* Fix most integration tests and writer tests

* Fix xarray tests

* Working on integration tests

* Add expected type

* Mark datetime tests xfail

* Upgrade xarray for tests

* xfail some unsupported zarr-python 3 data types

* Require zarr

* Remove zarr dep

* import zarr, explicit dependency

Co-authored-by: Tom Nicholas <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add zarr as a dependency

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Min numcodecs version

* numcodecs>=0.15.1 in environment and upstream.yml conda env files

* Working on mypy errors

* Fix mypy errors and tests

* Remove ZArray class

* Just return metadata's shape

* Create update metadata function

* Fix typing for update_metadata

* Check for regular chunk grid in manifest instantiation

* Remove obsolete codecs code

* Fix chunks function and add docstring

* Remove custom zattrs type

* Move some imports and make update_metadata a private method

* Remove zarr.py

* Add zarr to other ci env files

* Fixture array_v3_metadata uses array_v3_metadata_dict

* No need for union type for CodecPipeline

* Use type alias

* Add comment

* Update virtualizarr/manifests/array_api.py

Co-authored-by: Tom Nicholas <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revised copy_and_replace_metadata to be in utils and called correctly

* Update virtualizarr/translators/kerchunk.py

Co-authored-by: Tom Nicholas <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Refactor create v3 array metadata

* Rename to create_v3_array_metadata

* Fix some codecs fixtures

* Use global vars and simple fixture for creating codec pipelines

* Remove redundant create_codec_pipeline fixture

* Fix docstring

* Use create_v3_array_metadata in from_kerchunk_refs

* Add links to zarr-python 3.0 issues for big endian, datetime and timedelta data types

* Reorganize conftest

* Remove obsolete comment

* Rename function numcodec_config_to_configurable

* Fix parameters in docstring for convert_to_codec_pipeline

* Revert change to pytest mark skipif for astropy

* Remove commented arguments

* Add classes to test_codecs and make zarr_array a fixture

* Add tests for extract_codecs

* Add test for get_codec_config

* Remove obsolete comment

* Add test for copy_and_replace_metadata

* Add release notes

* Attempt to fix rst links

* Move convert_v3_to_v2_metadata to utils

---------

Co-authored-by: Tom Nicholas <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@abarciauskas-bgse
Copy link
Collaborator

@TomNicholas thanks for opening this - I just looked at the test failures and it appears the failure is because in the kerchunk writer, we are expecting .zarray["filters"] to be [] but the value for .zarray["filters"] is actually null. Would you like me to investigate where this change is coming from or do you have an idea?

@TomNicholas
Copy link
Member Author

Without looking any further - I have no idea. So you're just as likely to be able to fix that as I am 😀

@abarciauskas-bgse abarciauskas-bgse self-assigned this Mar 5, 2025
Copy link

codecov bot commented Mar 6, 2025

Codecov Report

Attention: Patch coverage is 90.28571% with 17 lines in your changes missing coverage. Please review.

Project coverage is 89.09%. Comparing base (83f9be8) to head (09882f2).

Files with missing lines Patch % Lines
virtualizarr/writers/kerchunk.py 33.33% 12 Missing ⚠️
virtualizarr/manifests/array.py 87.50% 2 Missing ⚠️
virtualizarr/codecs.py 98.00% 1 Missing ⚠️
virtualizarr/translators/kerchunk.py 97.29% 1 Missing ⚠️
virtualizarr/utils.py 94.73% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #453      +/-   ##
==========================================
+ Coverage   88.70%   89.09%   +0.38%     
==========================================
  Files          28       27       -1     
  Lines        1753     1687      -66     
==========================================
- Hits         1555     1503      -52     
+ Misses        198      184      -14     
Files with missing lines Coverage Δ
virtualizarr/manifests/array_api.py 89.89% <100.00%> (ø)
virtualizarr/manifests/utils.py 89.23% <100.00%> (+3.77%) ⬆️
virtualizarr/readers/dmrpp.py 91.89% <100.00%> (-0.05%) ⬇️
virtualizarr/readers/hdf/hdf.py 95.33% <100.00%> (+0.02%) ⬆️
virtualizarr/writers/icechunk.py 91.83% <100.00%> (ø)
virtualizarr/codecs.py 96.15% <98.00%> (+18.37%) ⬆️
virtualizarr/translators/kerchunk.py 93.84% <97.29%> (+0.70%) ⬆️
virtualizarr/utils.py 85.07% <94.73%> (+3.44%) ⬆️
virtualizarr/manifests/array.py 67.30% <87.50%> (-1.32%) ⬇️
virtualizarr/writers/kerchunk.py 77.63% <33.33%> (-12.85%) ⬇️

... and 2 files with indirect coverage changes

🚀 New features to boost your workflow:
  • Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@abarciauskas-bgse
Copy link
Collaborator

@TomNicholas working with @sharkinsspatial we found this change in zarr-python 3.0.4 release caused the change in filters from [] to null, causing the test failure: zarr-developers/zarr-python#2847. I have updated the test expectation so now tests are passing.

@TomNicholas
Copy link
Member Author

Woo! Does this mean we can merged this?? Or am I forgetting other things that would break (endianess?)

@TomNicholas
Copy link
Member Author

(Also @abarciauskas-bgse I'm not sure how commit credit really works in GH but if you wanted the repo history to definitely reflect all the work you did here I would be happy to close this PR in favour of you opening and merging one that does the same thing.)

@abarciauskas-bgse
Copy link
Collaborator

I can reopen it, thanks @TomNicholas, but it does still include xfailing tests for NetCDF3, FITS (which use big endian) and datetime data type. It sounded like @d-v-b could be working on adding more data type support?

@abarciauskas-bgse
Copy link
Collaborator

(I'll leave this PR open until we determine if we're ok with allowing those data types to go unsupported for now)

@TomNicholas
Copy link
Member Author

FITS is unimportant IMO, I doubt even a single person is using it.

The datetime data type matters, but will become less important once I change the default to load all 1D coordinates into memory, as I assume that most occurrences of the datetime type are in small coordinate variables that will or can be loaded.

netCDF3 is more annoying, as there is no workaround for that. But I'm also not sure it's worth holding this up, as merging this will enable lots of good stuff.

@maxrjones
Copy link
Member

Since we are working off develop now, I think this was superceded by #482

@maxrjones maxrjones deleted the zarr-python-3.0 branch March 7, 2025 22:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Development

Successfully merging this pull request may close these issues.

3 participants