Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Basic support for HDF5 filters #350

Merged
merged 4 commits into from
Aug 31, 2023
Merged

Conversation

florianziemen
Copy link
Contributor

So far only blosc and zstd.
The mapping between hdf5 filters and numcodec filters is not trivial.

So far only blosc and zstd.
The mapping between hdf5 filters and numcodec filters is not trivial.
kerchunk/hdf.py Outdated
def _decode_filters(self, h5obj: Union[h5py.Dataset, h5py.Group]):
if len(h5obj._filters.keys()) > 1:
raise RuntimeError(
f"{h5obj.name} uses multiple filters {list (h5obj._filters.keys())}. This is not supported by kerchunk."
Copy link
Member

@martindurant martindurant Aug 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This came up in #342 - actually zarr does support multiple filters, except only the final one is called "compression"

raise RuntimeError(
f"{h5obj.name} uses multiple filters {list (h5obj._filters.keys())}. This is not supported by kerchunk."
)
for filter_id, properties in h5obj._filters.items():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am 95% sure that setting a list like

filters=[numcodecs.get_codec(self.decoders[filter_id](properties) for filter_id, properties in h5obj._filters.items()]

will work fine with compression=None (or we can make the last one the compression), if we include gzip/zlib in this function. Of course, would need to check the order assumed by numcodecs matches hdf.

@martindurant
Copy link
Member

What would you need to get unstuck here? We should be able to generate HDF files with various compressions even double-compression.

@martindurant martindurant marked this pull request as ready for review August 31, 2023 17:45
@martindurant
Copy link
Member

Merging this, thanks. Should support multiple filters, but this is not tested. It stuffs all filters into zarr's "filters" without calling any one of them the compression, which is fine.

@martindurant martindurant merged commit 4e42bdb into fsspec:main Aug 31, 2023
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants