Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PandA dataset naming #330

Closed
coretl opened this issue May 23, 2024 · 13 comments · Fixed by #369
Closed

PandA dataset naming #330

coretl opened this issue May 23, 2024 · 13 comments · Fixed by #369
Assignees

Comments

@coretl
Copy link
Collaborator

coretl commented May 23, 2024

At the moment the PandA HDF writer will always write datasets like "INENC1.VAL.Mean", "COUNTER1.OUT.Diff". These are exposed in the descriptor as "panda1-inenc-1-val-Mean" and "panda1-counter-1-out-Diff". These don't make any scientific sense, so we should allow scientific names to be given to the datasets and set the ophyd name accordingly.

The options:

  1. Pass in HDFPanda(prefix, dp, dataset_names = {"inenc-1-val": "sample-x", "counter-1-out": "izero"})
  2. Post init but pre-connect we get the blocks, but don't know how many there are, and things like "INENC" are difficult as they are target specific
  3. Post init and post connect we get connected blocks and can do panda1.set_datsets({panda1.inenc[1].val: "sample-x"}) in the plan
@coretl
Copy link
Collaborator Author

coretl commented May 23, 2024

Decision is:

  • Keep type hinting, and have target specific flavours of HDF PandA
  • Drop type hinting, and keep one HDFPandA

@coretl
Copy link
Collaborator Author

coretl commented May 23, 2024

@callumforrester has my favourite suggestion so far:

  • Make HDFPanda.dataset_names: SignalRW[dict[str, str]], e.g. {"inenc-1-val": "sample-x", "counter-1-out": "izero"}
  • Keep the ophyd names as they are
  • Load/save the dataset names with the other attributes
  • At DetectorWriter.open then for each signal_name in dataset_names.keys():
    • Grab the value of signal_name + "_capture"
    • If it is "No" then ignore it
    • If it has a single value then give it the scientific name
    • If it is "Min Max" then expose scientific_name + "-min" and scientific_name + "-max"
    • If it is "Min Max Mean" then expose scientific_name + "-min", scientific_name + "-max" and scientific_name
  • We don't ever change the name of the signal, so bps.rd will still have panda1-inenc-1-val, this is because we would still have to bps.rd(panda1.enc[1].val) so would be surprising if it had a different name, plus it's difficult to get a soft signal to change the names of things outside itself without making spaghetti

@evalott100 evalott100 self-assigned this May 29, 2024
@evalott100
Copy link
Contributor

evalott100 commented May 29, 2024

As part of this issue, I'm going to move save and load to the ophyd level. It doesn't really make sense to me that they're currently plan stubs - they currently assume Locatable devices too.

I think Saveable and Loadable make sense as protocols.

@callumforrester
Copy link
Contributor

The current save/load functions don't need any protocols, they just walk the device and set signals. It's only when we need some intelligence in a device (i.e. it knows the special way to save/load itself) that we need a protocol. Is there a use case for that here?

@evalott100
Copy link
Contributor

Hmmm true. I supposes I could do Device::load_device and Device::save_device

@coretl
Copy link
Collaborator Author

coretl commented Jun 3, 2024

As part of this issue, I'm going to move save and load to the ophyd level. It doesn't really make sense to me that they're currently plan stubs - they currently assume Locatable devices too.

I think Saveable and Loadable make sense as protocols.

Please keep these as plan stubs, we considered new protocols, but we decided to add as few new protocols as possible. Either saving or loading will require application specific logic which is better placed in a plan

@evalott100
Copy link
Contributor

I'll be moving the current plan stubs from core

@evalott100
Copy link
Contributor

evalott100 commented Jun 4, 2024

Yesterday @coretl and I agreed that the dataset names should be set through the IOC. This turns this into three Issues...

  • pandablocks-client: Enable passing in of alternative names for devices when writing.
  • pandablock-ioc: For each capture record, enable giving a scientifically relevant name which given to the client for writing. These names will be set through a new PV (e.g COUNTER1:VAL:DATASET).
  • ophyd-async: Rather than obtaining records marked for capture by looking for names with _capture on the end, just look for which capture records have a scientifically important name set.

Then down the line we can...

  • pandablocks-ioc: Create a new PVI table structure consisting the all positions capture table.
  • ophyd-async: Use the above instead of walking.

@evalott100
Copy link
Contributor

evalott100 commented Jun 6, 2024

@callumforrester Instead of searching through device children for things ending in _capture,

def get_capture_signals(
block: Device, path_prefix: Optional[str] = ""
) -> Dict[str, SignalR]:
"""Get dict mapping a capture signal's name to the signal itself"""
if not path_prefix:
path_prefix = ""
signals: Dict[str, SignalR[Any]] = {}
for attr_name, attr in block.children():
# Capture signals end in _capture, but num_capture is a red herring
if attr_name == "num_capture":
continue
dot_path = f"{path_prefix}{attr_name}"
if isinstance(attr, SignalR) and attr_name.endswith("_capture"):
signals[dot_path] = attr
attr_signals = get_capture_signals(attr, path_prefix=dot_path + ".")
signals.update(attr_signals)
return signals

we can just look for devices with a DATASET field in their children:

            string rw PANDA1:COUNTER1:OUT:DATASET # will have a `dataset: SignalRW(str) attr

Additionally, putting to this will give it a different scientific name in the file output

@callumforrester
Copy link
Contributor

@evalott100 so to clarify: There will be a new PV PANDA1:COUNTER1:OUT:DATASET, and ophyd-async will automatically make a new signal on the correct block called "dataset", so we are now just looking for signals called "dataset" and making a mapping of the dataset name (the value of the PV) to those signals' parents?

@evalott100
Copy link
Contributor

evalott100 commented Jun 6, 2024

ophyd-async will automatically make a new signal on the correct block called "dataset", so we are now just looking for signals called "dataset"

Yep, once using PandABlocks/PandABlocks-client#91 and PandABlocks/PandABlocks-ioc#118

@callumforrester
Copy link
Contributor

Based on discussion with @abbiemery @evalott100 and @coretl, we are going to slightly change the plan and push more logic down to the IOC. The previous plan has ophyd-async putting dataset names to the panda and then reconstructing a data structure for .describe() from the various signals. It is simpler to build that data structure once, in the IOC and export it via a PV. Especially because the relationships between datasets and their min/max datasets are not actually important to ophyd-async, it just needs to know what datasets the panda is writing to HDF5 (and their data types).

So the new plan:

  • pandablocks-client: Enable passing in of alternative names for devices when writing, no change from previous plan.
  • pandablock-ioc: For each capture record, still enable giving a scientifically relevant name which given to the client for writing. These names will be set through a new PV (e.g COUNTER1:VAL:DATASET). Also create a top-level PV called DATA:DATASETS that exports a table of dataset names and HDF5 datatypes.
  • ophyd-async: No longer needs much complex logic, just reads the table out of the DATASETS PV and converts to a dictionary in the describe document.

Most of the complex logic now goes in the IOC, it needs to produce the DATASETS table and keep it up-to-date. The table will probably look something like this:

dataset hdf5_type
sample_x uint
sample_y float
sample_z_min float
sample_z_max float
sample_x float

In this case, sample_x, sample_y and sample_z are labels set via the DATASET PV on particular signals. The sample_z signal's capture PV is Max Min Mean, so all three datasets are included. Other datasets are other capture signals.
If the capture signal for a dataset is No, it should not be included in the table at all. The table only shows datasets that will be captured if the panda is armed.

The table also has to be kept up-to-date live, i.e. if a new value is put to a CAPTURE or a DATASET PV, the DATASETS PV should update and produce a new table. This requires a bit of extra logic in the IOC. The panda can report datatypes that map to float and uint, exactly how and what the mapping is I'll leave to @coretl

@coretl
Copy link
Collaborator Author

coretl commented Jun 7, 2024

The panda can report datatypes that map to float and uint

I suggest we use numpy datatypes, so float64 and uint32.

exactly how and what the mapping is I'll leave to @coretl
Looking at the code, timestamps are all now floating point, so the logic is:

  • ext_out bits is uint32
  • Everything else is float64

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants