Include chunk shape as a parameter in stream resource for HDF dataset #544
Conversation
Note that in the tests the mock AD ophyd devices read the "num frames in chunk" signal as 0; with a real AD this cannot be set below 1.
Tested and working as expected with a real PandA and a real AD:
I'm generally confused about chunk size. h5py suggests that the chunk can be anything, so part of an image, or many images. Tiled looks like it might be treating […]. AreaDetector uses […]. This PR gives […]. We also need to document the parameters for a StreamResource. @danielballan any opinions?
By the way, it would appear that […]. So PandA shouldn't be 0, but 1024. Maybe we need a […]
For AD, "chunk size auto" basically guarantees that the 2nd, 3rd, and (for color images) 4th chunking dimensions will just be the dimensions of the frame (you can set this with chunk size auto = No as well; you just need to explicitly define the chunking dimensions). Then "num frames chunks" is basically the number of frames per chunk in the first dimension. For PandA I admittedly have not looked at a dataset with a large number of points; all the ones I have looked at had all data in one chunk, which for Tiled purposes 0 or None would cover. If it is set to some preset size, I do think that exposing that via the IOC is the way to go, both to be able to tune the chunk size as desired, and to have accurate size info in the doc. I would be OK with a […]
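To make the AD behaviour described above concrete, here is a small sketch (the function and parameter names are mine, not AD PV names) of how the effective chunk shape is derived when chunk size auto is enabled:

```python
def ad_auto_chunk_shape(frame_shape, frames_per_chunk):
    """With AD's "chunk size auto" behaviour, only the first chunking
    dimension (frames per chunk) varies; the remaining chunking
    dimensions are just the full frame dimensions."""
    return (frames_per_chunk, *frame_shape)

# Mono frame: chunks of 16 whole 1024x1024 frames
print(ad_auto_chunk_shape((1024, 1024), 16))     # (16, 1024, 1024)
# Color frame adds a 4th chunking dimension
print(ad_auto_chunk_shape((1024, 1024, 3), 16))  # (16, 1024, 1024, 3)
```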
Decided on […]
The test failures appear unrelated to the changes made here. They seem to be the same as the ones in the first commit in #559.
src/ophyd_async/core/_hdf_dataset.py
```python
# Represents explicit chunk size written to disk.
# The first dimension represents the number of frames/points
# per equally-sized chunk in the first chunking dimension.
chunk_size: tuple | None = None
```
Is an empty tuple instead of None ok @genematx?
Suggested change:

```diff
- chunk_size: tuple | None = None
+ chunk_size: tuple[int, ...] = ()
```
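A minimal sketch of the convention in the suggestion above (the class and helper names here are hypothetical, not the real ophyd-async API): an empty tuple sidesteps the None-vs-0 ambiguity while still meaning "chunking unknown / whole dataset in one chunk".

```python
from dataclasses import dataclass

@dataclass
class DatasetInfo:  # hypothetical stand-in for the real dataclass
    data_key: str
    shape: tuple[int, ...]
    chunk_size: tuple[int, ...] = ()  # () means chunking unknown / single chunk

def is_single_chunk(info: DatasetInfo) -> bool:
    # No explicit chunking recorded: treat the dataset as one chunk
    return info.chunk_size == ()

print(is_single_chunk(DatasetInfo("det1", (100, 512, 512))))         # True
print(is_single_chunk(DatasetInfo("det1", (100, 512, 512), (10,))))  # False
```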
The new changes make the tests use realistic values for shape/chunk size for standard detectors, and also make sure that chunk size is always a tuple, potentially an empty one. I did find when testing with real devices that the tuples are currently not being handled correctly. I think a PR into the […]
One comment on the tests, apart from that I think this looks good
```python
from ophyd_async.epics import adkinetix
from ophyd_async.epics.adkinetix import KinetixDetector
```
We actually imported `adkinetix` rather than `from adkinetix import KinetixDetector`, as that is the pattern we are about to recommend in all the tutorials, so we updated all the tests to follow that pattern.
Looks good to me and good to go after @coretl's comment about importing is addressed.
```python
# TODO: Update chunk size to read signal once available in IOC
# Currently PandA IOC sets chunk size to 1024 points per chunk
```
Please make it an issue and link here.
tests/epics/conftest.py
```python
async def generate_ad_standard_det(ad_standard_detector_class, number=1):
    detector_name = ad_standard_detector_class.__name__
    if detector_name.endswith("Detector"):
        detector_name = detector_name[:-8]
```
Is it because of `len("Detector") == 8`?
Yep, I can maybe change it to `detector_name[:-(len("Detector"))]` to make it more obvious what it is doing.
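A standalone sketch of that suffix-stripping, with the magic number replaced by `len(...)` as suggested (the helper name is mine, for illustration only):

```python
def strip_detector_suffix(name: str) -> str:
    suffix = "Detector"
    if name.endswith(suffix):
        # name[:-len(suffix)] is equivalent to name[:-8],
        # but self-documenting
        return name[: -len(suffix)]
    return name

print(strip_detector_suffix("KinetixDetector"))  # Kinetix
print(strip_detector_suffix("PandA"))            # PandA (unchanged)
```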
src/ophyd_async/core/_hdf_dataset.py
```python
# Represents explicit chunk size written to disk.
# The first dimension represents the number of frames/points
# per equally-sized chunk in the first chunking dimension.
```
I don't think the second half of this is actually always true. We happen to make this so for areaDetector, but in the general case the chunk size is just that: the size of the HDF chunk. This may or may not correspond to a multiple of images; it might be half an image in one direction and 15 in the other, but its magnitude is in the base datatype of the dataset.
I wonder if `chunk_shape` might be a better name, to make people think of `shape` in the output of `describe()`?
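To illustrate the general case described above, here is a small sketch (not ophyd-async code; names are mine) counting HDF5 chunks for an arbitrary chunk shape, including one that splits each frame in half in one direction:

```python
import math

def n_chunks(shape, chunk_shape):
    # Chunks tile the dataset edge to edge; partial chunks at the
    # boundaries still count as whole chunks on disk.
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunk_shape))

shape = (1000, 512, 512)      # 1000 frames of 512x512
chunk_shape = (15, 256, 512)  # 15 frames deep, half an image in one direction
print(n_chunks(shape, chunk_shape))  # ceil(1000/15) * 2 * 1 = 134
```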
I actually had the same thought yesterday about calling it `chunk_shape`. If it's not too late, I'd +1 to that.
No preference from me about `chunk_shape` over `chunk_size`; perhaps let's discuss tomorrow at the sprint catchup, and we can settle on a naming scheme.
I don't really understand the test failures here - I am seeing intermittent failures on the sim detector, Pilatus, and PandA tests. Of these, this PR only touches the PandA one, and even then it's a fairly minor change to just add chunk shape. Any ideas? @evalott100

Edit: The tests all pass if I change […] to […]. I am not sure I understand why this behaves as it does, but will leave it like this for now to facilitate merging.
Tiled requires information on chunk sizes to read back data. See:
https://github.com/bluesky/bluesky/blob/978995b69c00cc0b9bef5103bdb31b665a6fbed9/src/bluesky/consolidators.py#L129
If the chunk size is set to None or 0, or does not exist, the assumption is made that the entire dataset is in one chunk.
Otherwise, the chunk size defines the number of elements in each chunk; all chunks are of equal size, with the exception of the last chunk, which can be smaller.
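The rule above can be sketched as follows (this is an illustration of the convention, not the actual bluesky consolidators code):

```python
def first_dim_chunks(n_points, chunk_size):
    """Return the size of each chunk along the first dimension."""
    if not chunk_size:  # None, 0, or missing: whole dataset in one chunk
        return (n_points,)
    full, rem = divmod(n_points, chunk_size)
    # Equal-sized chunks, with a smaller trailing chunk if needed
    return (chunk_size,) * full + ((rem,) if rem else ())

print(first_dim_chunks(10, 4))     # (4, 4, 2): last chunk is smaller
print(first_dim_chunks(10, None))  # (10,): one chunk for everything
```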
Marking as a WIP until I have a chance to test w/ some real devices.