Include chunk shape as a parameter in stream resource for HDF dataset #544

Merged: 32 commits into bluesky:main on Sep 19, 2024

Conversation

jwlodek (Member) commented Aug 28, 2024

Tiled requires information on chunk sizes to read back data. See:
https://github.com/bluesky/bluesky/blob/978995b69c00cc0b9bef5103bdb31b665a6fbed9/src/bluesky/consolidators.py#L129

If the chunk size is None, 0, or absent, the entire dataset is assumed to be in a single chunk.

Otherwise, the chunk size defines the number of elements in each chunk; all chunks are equally sized except the last one, which may be smaller.
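
As a reference for the behaviour described above, here is a minimal sketch (not the actual bluesky consolidator code; names are illustrative) of how chunk lengths along the first axis could be derived:

# Illustrative sketch of the chunking rule above, for a dataset of length n.
def chunk_lengths(n, chunk_size):
    if not chunk_size:  # None or 0: the whole dataset is one chunk
        return [n]
    full, remainder = divmod(n, chunk_size)
    # Equal-sized chunks, with a smaller final chunk if n is not a multiple
    return [chunk_size] * full + ([remainder] if remainder else [])

chunk_lengths(467, None)   # -> [467]
chunk_lengths(1801, 1024)  # -> [1024, 777]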

Marking as a WIP until I have a chance to test w/ some real devices.

jwlodek (Member, Author) commented Aug 28, 2024

Note that in the tests the mock AD ophyd devices read the "num frames per chunk" signal as 0; with a real AD this cannot be set below 1.

jwlodek requested a review from coretl on August 28, 2024 at 17:43
Resolved review threads (outdated): tests/epics/adaravis/test_aravis.py, tests/fastcs/panda/test_hdf_panda.py, src/ophyd_async/core/_hdf_dataset.py
jwlodek marked this pull request as ready for review on September 4, 2024 at 18:36
jwlodek (Member, Author) commented Sep 4, 2024

Tested and working as expected with a real PandA and a real AD:

pass-315284 [21]: list(panda1._writer._file.stream_resources())
Out[21]: 
[{'uid': 'ea522eb1-e4b0-40e8-ada1-01f21bf12858',
  'data_key': 'Arm Timestamp',
  'mimetype': 'application/x-hdf5',
  'uri': 'file://localhost/nsls2/data/hex/proposals/2024-2/pass-315284/assets/panda1/2024/09/04/panda1_899280bb-5156-4558-989e-4a4de222b6bc.h5',
  'parameters': {'dataset': '/Arm Timestamp',
   'swmr': False,
   'multiplier': 1,
   'chunk_size': None}},
 {'uid': '8ecc70c5-c035-4c2e-98cc-f5b17951bdac',
  'data_key': 'Trigger Timestamp',
  'mimetype': 'application/x-hdf5',
  'uri': 'file://localhost/nsls2/data/hex/proposals/2024-2/pass-315284/assets/panda1/2024/09/04/panda1_899280bb-5156-4558-989e-4a4de222b6bc.h5',
  'parameters': {'dataset': '/Trigger Timestamp',
   'swmr': False,
   'multiplier': 1,
   'chunk_size': None}}]
pass-315284 [28]: list(kinetix1._writer._file.stream_resources())
Out[28]: 
[{'uid': '3f479788-341d-4ac0-913e-ec490bb6e084',
  'data_key': 'kinetix-det1',
  'mimetype': 'application/x-hdf5',
  'uri': 'file://localhost/home/xf27id1',
  'parameters': {'dataset': '/entry/data/data',
   'swmr': False,
   'multiplier': 1,
   'chunk_size': 1}}]

Resolved review threads (outdated): tests/epics/adaravis/test_aravis.py, tests/epics/advimba/test_vimba.py, tests/fastcs/panda/test_hdf_panda.py
coretl (Collaborator) commented Sep 5, 2024

I'm generally confused about chunk size.

h5py suggests that the chunk can be anything, so part of an image, or many images.
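
To illustrate (a minimal h5py sketch; the file name, dataset names, shapes, and dtype are made up), a chunk can hold several whole frames or only part of one:

import h5py

with h5py.File("example.h5", "w") as f:
    # One chunk holds 10 whole 1024x1024 frames
    f.create_dataset("many_frames_per_chunk", shape=(100, 1024, 1024),
                     chunks=(10, 1024, 1024), dtype="u2")
    # One chunk holds only a quarter of a single frame
    f.create_dataset("part_of_a_frame_per_chunk", shape=(100, 1024, 1024),
                     chunks=(1, 512, 512), dtype="u2")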

Tiled looks like it might be treating chunk_size as number of datums per chunk

AreaDetector uses NumFramesChunk to work out how many NDArrays to put in a chunk, but looks like only when ChunkSizeAuto is Yes

This PR gives chunk_size as an integer, but doesn't describe what it is. I suggest it is called something like datums_per_chunk or frames_per_chunk so that it is clear what it means.

We also need to document the parameters for a StreamResource mimetype somewhere, and refer to it in the HDFFile documentation.

@danielballan any opinions?

coretl (Collaborator) commented Sep 5, 2024

By the way, it would appear that h5py has selected some chunking for us in PandA:

$ h5ls -v panda2.h5
Opened "panda2.h5" with sec2 driver.
PCAP.GATE_DURATION.Value Dataset {467/Inf}
    Location:  1:195
    Links:     1
    Chunks:    {1024} 8192 bytes
    Storage:   3736 logical bytes, 8192 allocated bytes, 45.61% utilization
    Type:      native double
x-rbv                    Dataset {467/Inf}
    Location:  1:999
    Links:     1
    Chunks:    {1024} 8192 bytes
    Storage:   3736 logical bytes, 8192 allocated bytes, 45.61% utilization
    Type:      native double
x-rbv-max                Dataset {467/Inf}
    Location:  1:731
    Links:     1
    Chunks:    {1024} 8192 bytes
    Storage:   3736 logical bytes, 8192 allocated bytes, 45.61% utilization
    Type:      native double
x-rbv-min                Dataset {467/Inf}
    Location:  1:463
    Links:     1
    Chunks:    {1024} 8192 bytes
    Storage:   3736 logical bytes, 8192 allocated bytes, 45.61% utilization
    Type:      native double

So the PandA chunk size shouldn't be 0, but 1024. Maybe we need a PANDA:DATA:NUM_PER_CHUNK PV that queries the chunking information from h5py and provides this?

jwlodek (Member, Author) commented Sep 5, 2024

For AD, "chunk size auto" basically guarantees that the 2nd, 3rd, and (for color images) 4th chunking dimensions are just the dimensions of the frame (you can achieve this with chunk size auto = No as well; you just need to explicitly define the chunking dimensions). "Num frames chunks" is then the number of frames in each chunk along the first dimension.
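
For example, under that scheme (the values below are made up for illustration), the HDF5 chunk shape for a mono 2D detector works out as:

# Illustrative only: made-up values showing how AD settings map to a chunk shape
num_frames_chunks = 16        # AD "NumFramesChunks" setting
frame_shape = (2048, 2048)    # (SizeY, SizeX) of each NDArray
chunk_shape = (num_frames_chunks, *frame_shape)  # -> (16, 2048, 2048)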

For PandA, I admittedly have not looked at a dataset with a large number of points - all the ones I have looked at had all the data in one chunk, which for Tiled's purposes 0 or None would cover. If it is set to some preset size, I do think that exposing that via the IOC is the way to go, both to be able to tune the chunk size as desired and to have accurate size info in the doc.

I would be OK with a frames_per_chunk name; I think datums_per_chunk doesn't work as well, since currently we can emit a single datum doc that lists all the indices but still have many chunks. For example, if I do a software flyscan with a StandardDetector and collect 200 points, I get one datum doc that lists a start/stop of 0 and 199, but my chunk size is 1.

jwlodek (Member, Author) commented Sep 5, 2024

After looking at a PandA dataset with 1801 points, it does seem like the chunk size is being set to 1024 (presumably by h5py). I suggest we expose that as a PV in the IOC, and get the value from there - for now I will set it to 1024 for this PR with a TODO.

[screenshot: chunk information for the 1801-point PandA dataset showing 1024-element chunks]

coretl (Collaborator) commented Sep 5, 2024

Decided on chunk_size, typed as tuple[int, ...], e.g. (<nchunks>, <det_h>, <det_w>) for the 2D areaDetector example
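
As an illustration of that scheme, a hypothetical stream resource parameters block for a 2D areaDetector writing one frame per chunk (the dataset path and detector dimensions below are assumed) could look like:

parameters = {
    "dataset": "/entry/data/data",
    "swmr": False,
    "multiplier": 1,
    "chunk_size": (1, 1520, 2048),  # (<nchunks>, <det_h>, <det_w>)
}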

jwlodek (Member, Author) commented Sep 5, 2024

The test failures appear unrelated to the changes made here. They seem to be the same as the ones in the first commit in #559

Resolved review thread (outdated): src/ophyd_async/epics/adcore/_hdf_writer.py
# Represents explicit chunk size written to disk.
# The first dimension represents the number of frames/points
# per equally-sized chunk in the first chunking dimension.
chunk_size: tuple | None = None
Collaborator:

Is empty tuple instead of None ok @genematx ?

Suggested change
chunk_size: tuple | None = None
chunk_size: tuple[int, ...] = ()

Resolved review threads (outdated): src/ophyd_async/fastcs/panda/_writer.py, tests/epics/adaravis/test_aravis.py
jwlodek (Member, Author) commented Sep 9, 2024

The new changes make the tests use realistic values for shape/chunk size for standard detectors, and also make sure that the chunk size is always a tuple, potentially an empty one. I did find when testing with real devices that the tuples are currently not being handled correctly, so I think a PR into the bluesky repo will need to accompany this one. I will work on that now.

coretl (Collaborator) left a review:

One comment on the tests; apart from that, I think this looks good.

Comment on lines 11 to 9
from ophyd_async.epics import adkinetix
from ophyd_async.epics.adkinetix import KinetixDetector
Collaborator:

We actually imported adkinetix rather than from adkinetix import KinetixDetector, as that is the pattern we are about to recommend in all the tutorials, so we updated all the tests to follow it.

mrakitin (Member) left a review:

Looks good to me and good to go after @coretl's comment about importing is addressed.

Comment on lines +105 to +106
# TODO: Update chunk size to read signal once available in IOC
# Currently PandA IOC sets chunk size to 1024 points per chunk
Member:

Please make it an issue and link here.

async def generate_ad_standard_det(ad_standard_detector_class, number=1):
detector_name = ad_standard_detector_class.__name__
if detector_name.endswith("Detector"):
detector_name = detector_name[:-8]
Member:

Is it because of the len("Detector") == 8?

jwlodek (Member, Author) replied Sep 10, 2024:

Yep, I can maybe change it to detector_name[:-(len("Detector"))] to make it more obvious what it is doing
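
For the record, the two slicing forms discussed here are equivalent:

detector_name = "KinetixDetector"
assert detector_name[:-8] == detector_name[:-(len("Detector"))] == "Kinetix"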

Comment on lines 22 to 24
# Represents explicit chunk size written to disk.
# The first dimension represents the number of frames/points
# per equally-sized chunk in the first chunking dimension.
Collaborator:

I don't think the second half of this is actually always true. We happen to make it so for areaDetector, but in the general case the chunk size is just that: the size of the HDF chunk. It may or may not correspond to a multiple of images; it might be half an image in one direction and 15 in the other, but its magnitude is in the base datatype of the dataset.

I wonder if chunk_shape might be a better name to make people think of shape in the output of describe()?

Contributor:

I actually had the same thought yesterday about calling it chunk_shape. If it's not too late, I'd +1 to that.

jwlodek (Member, Author):

No preference from me about chunk_shape over chunk_size; perhaps let's discuss tomorrow at the sprint catchup and settle on a naming scheme.

jwlodek (Member, Author) commented Sep 17, 2024

I don't really understand the test failures here - I am seeing intermittent failures on sim detector, pilatus, and panda tests. Out of these, this PR only touches the PandA one, and even then it's a fairly minor change to just add chunk shape. Any ideas? @evalott100

Edit: The tests all pass if I change

await self.hdf.chunk_size_auto.set(True)

to

await asyncio.gather(self.hdf.chunk_size_auto.set(True))

I am not sure I understand why this behaves as it does, but I will leave it like this for now to facilitate merging.

jwlodek changed the title from "Include chunk size as a parameter in stream resource for HDF dataset" to "Include chunk shape as a parameter in stream resource for HDF dataset" on Sep 17, 2024
jwlodek merged commit 35c07fe into bluesky:main on Sep 19, 2024
11 checks passed