-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Basic support for HDF5 filters #350
Conversation
So far only blosc and zstd. The mapping between hdf5 filters and numcodec filters is not trivial.
kerchunk/hdf.py
Outdated
def _decode_filters(self, h5obj: Union[h5py.Dataset, h5py.Group]): | ||
if len(h5obj._filters.keys()) > 1: | ||
raise RuntimeError( | ||
f"{h5obj.name} uses multiple filters {list (h5obj._filters.keys())}. This is not supported by kerchunk." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This came up in #342 - actually zarr does support multiple filters, except only the final one is called "compression"
raise RuntimeError( | ||
f"{h5obj.name} uses multiple filters {list (h5obj._filters.keys())}. This is not supported by kerchunk." | ||
) | ||
for filter_id, properties in h5obj._filters.items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am 95% sure that setting a list like
filters=[numcodecs.get_codec(self.decoders[filter_id](properties) for filter_id, properties in h5obj._filters.items()]
will work fine with compression=None (or we can make the last one the compression), if we include gzip/zlib in this function. Of course, would need to check the order assumed by numcodecs matches hdf.
What would you need to get unstuck here? We should be able to generate HDF files with various compressions even double-compression. |
Merging this, thanks. Should support multiple filters, but this is not tested. It stuffs all filters into zarr's "filters" without calling any one of them the compression, which is fine. |
So far only blosc and zstd.
The mapping between hdf5 filters and numcodec filters is not trivial.