H5 context management #24

d-v-b · 2022-04-05T01:18:04Z

Extends h5py.Dataset and h5py.Group with __enter__ and __exit__ methods, where __exit__ closes the file.
Uses context-managed datasets and groups in tests.
Adds a test for the h5 kwarg partitioning.
(Untested) access_h5 can optionally take an h5py.File instance as the first argument.
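Making a dataset or group close its file when a with-block ends can be sketched with a small proxy. This is a minimal sketch that uses composition (wrapping the object) rather than the subclassing of h5py.Dataset and h5py.Group described above, and the class name ClosingProxy is hypothetical. It assumes only that the wrapped object has a .file attribute with a .close() method, as h5py datasets and groups do.

```python
from types import TracebackType
from typing import Any, Optional


class ClosingProxy:
    """Hypothetical wrapper: forwards attribute access and indexing to an
    h5py-like Dataset or Group and closes the owning file on exit."""

    def __init__(self, obj: Any) -> None:
        self._obj = obj

    def __getattr__(self, name: str) -> Any:
        # Called only for names not found on the proxy itself, so attributes
        # such as .shape, .attrs and .file come from the wrapped object.
        return getattr(self._obj, name)

    def __getitem__(self, key: Any) -> Any:
        # Special methods bypass __getattr__, so indexing is forwarded here.
        return self._obj[key]

    def __enter__(self) -> "ClosingProxy":
        return self

    def __exit__(
        self,
        exc_type: Optional[type],
        exc: Optional[BaseException],
        tb: Optional[TracebackType],
    ) -> bool:
        # __exit__ runs whether or not the block raised; close the owning file.
        self._obj.file.close()
        # Returning False lets any exception from the with-block propagate.
        return False
```

With h5py, usage might look like `with ClosingProxy(h5py.File("data.h5", "r")["volume"]) as dset: arr = dset[:]` (file and dataset names are placeholders). Composition avoids depending on the h5py.Dataset constructor, but isinstance(proxy, h5py.Dataset) is False, which the subclass approach in this PR preserves.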

mkitti

I like the general direction.

track_order may need special handling. If it is set, perhaps just distribute it to the File, Group, and Dataset as needed.

Consider allowing path to be empty or None. If path is empty, returning h5f may make sense; None is used to create anonymous datasets.

Anonymous datasets allow for temporary storage within the file and the creation of spacers, and they have been used by HDF5-EOS to store some Zarr-compatible metadata.

mkitti · 2022-04-05T12:27:02Z

src/fibsem_tools/io/h5.py

+    "allow_unknown_filter",
+)
+
+H5_GROUP_KWDS = ("name", "track_order")


Since track_order is a keyword common to Datasets, Groups, and Files, perhaps this parameter should be passed to all objects when they are created.

mkitti · 2022-04-05T12:29:24Z

src/fibsem_tools/io/h5.py

-    for key in H5_DATASET_KWDS:
-        if key in file_kwargs:
+    for key in kwargs:
+        if key in H5_DATASET_KWDS:
            dataset_kwargs[key] = file_kwargs.pop(key)



Consider keeping track_order as a File keyword. Also return Group keywords that may contain track_order.
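The split suggested in these comments (keep track_order with the File keywords, but also hand it to the Group and the Dataset) can be sketched as a three-way partition. The function name and keyword tuples are hypothetical; the dataset keywords are copied from the h5py make_new_dset signature that appears in the traceback quoted later in this thread, and the PR's own H5_DATASET_KWDS may differ.

```python
from typing import Any, Dict, Tuple

KwargDict = Dict[str, Any]

# Keywords assumed valid for File, Group and Dataset alike.
SHARED_KWDS = ("track_order",)

# Dataset-only keywords from h5py's make_new_dset signature; "name" is left
# out because access_h5 sets it from `path`.
DATASET_KWDS = (
    "shape", "dtype", "data", "chunks", "compression", "compression_opts",
    "shuffle", "fletcher32", "maxshape", "fillvalue", "scaleoffset",
    "track_times", "external", "dcpl", "allow_unknown_filter",
)


def partition_kwargs(**kwargs: Any) -> Tuple[KwargDict, KwargDict, KwargDict]:
    """Hypothetical helper: split kwargs into (file, group, dataset) dicts.

    Shared keywords are copied into all three dicts, dataset keywords go to
    the dataset dict, and everything else is treated as a File keyword.
    """
    file_kwargs: KwargDict = {}
    group_kwargs: KwargDict = {}
    dataset_kwargs: KwargDict = {}
    for key, value in kwargs.items():
        if key in SHARED_KWDS:
            targets = (file_kwargs, group_kwargs, dataset_kwargs)
        elif key in DATASET_KWDS:
            targets = (dataset_kwargs,)
        else:
            targets = (file_kwargs,)
        for target in targets:
            target[key] = value
    return file_kwargs, group_kwargs, dataset_kwargs
```

Sending unrecognized keys to the File dict mirrors the PR's approach of popping dataset keywords out of file_kwargs; the group dict only ever receives shared keywords such as track_order, which h5py's create_group accepts.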

mkitti · 2022-04-05T12:38:31Z

src/fibsem_tools/io/h5.py

            dataset_kwargs[key] = file_kwargs.pop(key)

    return file_kwargs, dataset_kwargs


 def access_h5(
-    store: Pathlike, path: Pathlike, mode: str, **kwargs
+    store: Union[h5py.File, Pathlike], path: Pathlike, **kwargs


Suggested change:

-    store: Union[h5py.File, Pathlike], path: Pathlike, **kwargs
+    store: Union[h5py.File, Pathlike], path: Union[Pathlike, NoneType], **kwargs

None is a valid dataset name and is used to create anonymous datasets.

types.NoneType returned to the standard library in Python 3.10: https://docs.python.org/3/library/types.html?highlight=nonetype#types.NoneType
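On Python versions before 3.10, where types.NoneType is unavailable, the suggested annotation can be spelled in equivalent ways. In this sketch, Pathlike is a hypothetical stand-in for the alias used in fibsem_tools.

```python
import os
import sys
from typing import Optional, Union

# Hypothetical stand-in for the Pathlike alias used in the PR.
Pathlike = Union[str, os.PathLike]

if sys.version_info >= (3, 10):
    from types import NoneType
else:
    # On older versions, type(None) is the same class.
    NoneType = type(None)

# Three equivalent spellings of the suggested `path` annotation. typing
# stores a bare None as type(None), so all three compare equal.
PathWithNoneType = Union[Pathlike, NoneType]
PathWithNone = Union[Pathlike, None]
OptionalPath = Optional[Pathlike]
```

Optional[Pathlike] reads the same on every Python 3 version that has the typing module, so it may be the least surprising choice for a library that supports versions older than 3.10.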

Interesting. How is that different from naming a dataset with the empty string?

For one, providing an empty string results in a TypeError in h5py, and an empty bytes name results in a ValueError:

In [22]: with h5py.File("test.hdf5", "w") as h5f:
    ...:     h5f.create_dataset(None, data = np.zeros((5,5)))
    ...:     print(list(h5f.keys()))
    ...:
    ...:
[]

In [23]: with h5py.File("test.hdf5", "w") as h5f:
    ...:     h5f.create_dataset("", data = np.zeros((5,5)))
    ...:     print(list(h5f.keys()))
    ...:
    ...:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-23-5230edf15b52> in <module>
      1 with h5py.File("test.hdf5", "w") as h5f:
----> 2     h5f.create_dataset("", data = np.zeros((5,5)))
      3     print(list(h5f.keys()))
      4
      5

~\.julia\conda\3\lib\site-packages\h5py\_hl\group.py in create_dataset(self, name, shape, dtype, data, **kwds)
    147 group = self.require_group(parent_path)
    148
--> 149 dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
    150 dset = dataset.Dataset(dsid)
    151 return dset

~\.julia\conda\3\lib\site-packages\h5py\_hl\dataset.py in make_new_dset(parent, shape, dtype, data, name, chunks, compression, shuffle, fletcher32, maxshape, compression_opts, fillvalue, scaleoffset, track_times, external, track_order, dcpl, allow_unknown_filter)
    140
    141
--> 142 dset_id = h5d.create(parent.id, name, tid, sid, dcpl=dcpl)
    143
    144 if (data is not None) and (not isinstance(data, Empty)):

h5py\_objects.pyx in h5py._objects.with_phil.wrapper()

h5py\_objects.pyx in h5py._objects.with_phil.wrapper()

h5py\h5d.pyx in h5py.h5d.create()

TypeError: expected bytes, str found

In [24]: with h5py.File("test.hdf5", "w") as h5f:
    ...:     h5f.create_dataset(b"", data = np.zeros((5,5)))
    ...:     print(list(h5f.keys()))
    ...:
    ...:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-24-7f1869f34ebe> in <module>
      1 with h5py.File("test.hdf5", "w") as h5f:
----> 2     h5f.create_dataset(b"", data = np.zeros((5,5)))
      3     print(list(h5f.keys()))
      4
      5

~\.julia\conda\3\lib\site-packages\h5py\_hl\group.py in create_dataset(self, name, shape, dtype, data, **kwds)
    147 group = self.require_group(parent_path)
    148
--> 149 dsid = dataset.make_new_dset(group, shape, dtype, data, name, **kwds)
    150 dset = dataset.Dataset(dsid)
    151 return dset

~\.julia\conda\3\lib\site-packages\h5py\_hl\dataset.py in make_new_dset(parent, shape, dtype, data, name, chunks, compression, shuffle, fletcher32, maxshape, compression_opts, fillvalue, scaleoffset, track_times, external, track_order, dcpl, allow_unknown_filter)
    140
    141
--> 142 dset_id = h5d.create(parent.id, name, tid, sid, dcpl=dcpl)
    143
    144 if (data is not None) and (not isinstance(data, Empty)):

h5py\_objects.pyx in h5py._objects.with_phil.wrapper()

h5py\_objects.pyx in h5py._objects.with_phil.wrapper()

h5py\h5d.pyx in h5py.h5d.create()

ValueError: Unable to create dataset (no name given)

mkitti · 2022-04-05T12:39:20Z

src/fibsem_tools/io/h5.py

+            if "name" in dataset_kwargs:
+                warnings.warn(
+                    '"Name" was provided to this function as a keyword argument. This value will be replaced with the second argument to this function.'
+                )
            dataset_kwargs["name"] = path
            result = h5f.create_dataset(**dataset_kwargs)
        else:
            result = h5f.require_group(path)


Allow track_order to be passed to the group as well.

mkitti · 2022-04-05T12:41:10Z

src/fibsem_tools/io/h5.py

+    else:
+        h5f = h5py.File(store, **file_kwargs)
+
+    if mode in ("r", "r+", "a"):


Perhaps if path is empty, just return the File, h5f.

If we are in r+ or a mode, perhaps the path does not exist because we are trying to create it. In those cases, perhaps you should catch the error.
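These suggestions (return the File for an empty path, treat None as an anonymous dataset, and create a missing node in r+ or a mode instead of failing) can be combined into one dispatch step. The function resolve_target is hypothetical and duck-typed: it relies only on h5f[path] raising KeyError for a missing object, as h5py does, plus the create_dataset and require_group methods.

```python
from typing import Any, Dict, Optional


def resolve_target(
    h5f: Any, path: Optional[str], mode: str, dataset_kwargs: Dict[str, Any]
) -> Any:
    """Hypothetical dispatch for access_h5 once the file is open.

    h5f is expected to behave like an h5py.File: indexing raises KeyError
    for a missing object, and it offers create_dataset and require_group.
    """
    if path == "":
        # h5py rejects empty dataset names, so treat "" as the file root.
        return h5f
    if path is None:
        # create_dataset(None, ...) makes an anonymous, unlinked dataset.
        return h5f.create_dataset(None, **dataset_kwargs)
    if mode in ("r", "r+", "a"):
        try:
            return h5f[path]
        except KeyError:
            if mode == "r":
                raise  # read-only: a missing path is a real error
            # r+ / a: the path may be missing because we intend to create it.
    if dataset_kwargs:
        return h5f.create_dataset(path, **dataset_kwargs)
    return h5f.require_group(path)
```

In h5py, File is a subclass of Group, so returning h5f for an empty path still hands the caller a Group-like object.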

mkitti · 2022-04-05T12:43:22Z

src/fibsem_tools/io/h5.py

        result.attrs.update(**attrs)

-        return result
+    if isinstance(result, h5py.Group):


A File is also a Group. In the case of a File, you do not need a new context manager.

mkitti · 2022-04-05T12:44:55Z

tests/test_h5.py

+        **dataset_kwargs, **file_kwargs
+    )
+    assert file_kwargs == file_kwargs_out
+    assert dataset_kwargs == dataset_kwargs_out


Test for common keyword arguments like track_order.

mkitti · 2022-04-05T13:08:52Z

Here is the reference for the use of anonymous datasets to store Zarr chunk information:
http://www.hdfeos.org/workshops/ws23/presentations/axj.pdf

d-v-b added 2 commits April 4, 2022 21:12

add h5 object wrappers and some tests

4655d27

remove mpi stuff

57e001a

d-v-b mentioned this pull request Apr 5, 2022

Use context manager for file handles in case assertions fail #23

Closed

black

a869511

mkitti reviewed Apr 5, 2022


mkitti mentioned this pull request May 9, 2022

Metadata in the conversion of DAT file to HDF5 file JaneliaSciComp/jeiss_fibsem_labview_control#2

Open

mkitti mentioned this pull request May 23, 2022

Minimal changes to the .DAT file format to avoid storage, bandwidth and processing overheads JaneliaSciComp/jeiss_fibsem_labview_control#4

Open
