Processing Parameter and File Management #373

jadball · 2025-01-10T14:28:46Z

Following on from #318, this issue concerns two problems:

- How do we manage the files that are created as part of 3DXRD processing?
- How do we remember which segmentation/indexing/mapping parameters were used to process the data?

For file management, we think the best solution is to be as minimal as possible.
Currently, the dataset class keeps track of all the files that are created, with absolute paths embedded in the H5 file. This is bad for several reasons, mainly portability: the stored paths break as soon as the data are moved or copied elsewhere.
However, it's convenient because we don't need to define all the paths at the top of each analysis notebook.

For file management, we propose the following:

- The notebook/ewoks task is always in the same folder as the processed data that it generates.
- The notebook/ewoks task looks for relative file names.
- Default file names should make automation possible (e.g. the segmentation task will by default create peaks.h5, and the indexing task will by default look for a peaks.h5 file in the same folder).
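
As a rough illustration of the default-name idea, here is a minimal sketch. The table of defaults and the helper names `output_path` and `input_path` are hypothetical; only `peaks.h5` is named in the proposal, and treating `grains.h5` as the indexing output is an assumption.

```python
from pathlib import Path

# Hypothetical default names. Only peaks.h5 is named in the proposal;
# grains.h5 as the indexing output is an assumption.
DEFAULT_OUTPUTS = {"segmentation": "peaks.h5", "indexing": "grains.h5"}
DEFAULT_INPUTS = {"indexing": "peaks.h5"}


def output_path(task, workdir="."):
    """Return the default output file for a task, relative to workdir."""
    return Path(workdir) / DEFAULT_OUTPUTS[task]


def input_path(task, workdir="."):
    """Return the default input file for a task, failing early if it is missing."""
    path = Path(workdir) / DEFAULT_INPUTS[task]
    if not path.exists():
        raise FileNotFoundError(f"{task} expects {path} in the processing folder")
    return path
```

With the notebook running in the processing folder, `workdir` stays at its default and no paths need to be defined at the top of the notebook.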

For keeping track of the processing parameters, we propose the following:

In each output H5 file, we store the following in groups:
- The data itself (peaks/UBIs/etc)
- The path to the input file that created the data
- The path to the notebook/ewoks task that created the data
- The parameters used to create the data
  - e.g. segmentation options
  - indexing options
- For grains.h5, we should save the spot3d_id for each grain
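
The list above could be assembled along these lines before being written into an output file. This is only a sketch: the function name `provenance` and the keys `input_path`, `task_path` and `parameters` are hypothetical, and storing the parameters as a JSON string is one possible choice, not a decision. The actual H5 writing is left out.

```python
import json
import os


def provenance(output_file, input_file, task_file, parameters):
    """Build the metadata to store next to the data in an output file.

    Paths are recorded relative to the folder that holds the output file,
    so the processed-data folder can be moved as a whole.
    """
    base = os.path.dirname(os.path.abspath(output_file))
    return {
        "input_path": os.path.relpath(os.path.abspath(input_file), base),
        "task_path": os.path.relpath(os.path.abspath(task_file), base),
        # A JSON string fits in a single string attribute or dataset.
        "parameters": json.dumps(parameters, sort_keys=True),
    }
```

When everything lives in one folder, as proposed above, the stored paths reduce to bare file names.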

Filename templates seem to need the sample, dataset and version, for example:

PROCESSED_DATA/{sample}/{sample}{dataset}{version}peaks_table.h5
PROCESSED_DATA/{version}/{sample}/{sample}{dataset}_peaks_table.h5
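
A small sketch of how such templates could be filled in with `str.format`. The underscores between fields are an assumption, since the templates above do not show the separators explicitly, and the field values are made up for illustration.

```python
# Assumption: fields are joined by underscores; the templates above do not
# show the separators explicitly.
VERSION_IN_NAME = "PROCESSED_DATA/{sample}/{sample}_{dataset}_{version}_peaks_table.h5"
VERSION_IN_FOLDER = "PROCESSED_DATA/{version}/{sample}/{sample}_{dataset}_peaks_table.h5"

# Made-up values, for illustration only.
fields = {"sample": "sampleA", "dataset": "scan1", "version": "v1"}

print(VERSION_IN_NAME.format(**fields))
print(VERSION_IN_FOLDER.format(**fields))
```

The first form keeps every version of a sample in one folder; the second keeps each processing version in its own tree.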

Proposed structure of grains file:

grains.h5
- phase_1
  - Data
    - UBIs
    - Translations
    - spot3d_id
  - Parent
    - relative peaks_3d path
  - relative notebook path/ewoks task path
  - Indexing parameters
    - hkl_tol etc.
- phase_2
  - Data
    - UBIs
    - Translations
    - spot3d_id
  - Parent
    - relative peaks_3d path
  - relative notebook path/ewoks task path
  - Indexing parameters
    - hkl_tol etc.
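
To make the proposed layout concrete, here is a sketch that describes it as a nested dict and lists the resulting HDF5-style paths. The names `phase_layout`, `flatten`, `peaks_3d_path` and `task_path`, the file names and the `hkl_tol` values are all hypothetical. Writing the tree to an actual H5 file is not shown.

```python
def phase_layout(peaks_path, task_path, indexing_parameters):
    """Return one phase group as a nested dict (dict = group, other = value)."""
    return {
        "Data": {"UBIs": [], "Translations": [], "spot3d_id": []},
        "Parent": {"peaks_3d_path": peaks_path},
        "task_path": task_path,
        "Indexing parameters": dict(indexing_parameters),
    }


def flatten(tree, prefix=""):
    """Yield (HDF5-style path, value) pairs for every leaf of the nested dict."""
    for name, value in tree.items():
        path = f"{prefix}/{name}"
        if isinstance(value, dict):
            yield from flatten(value, path)
        else:
            yield path, value


# Hypothetical file names and tolerances, for illustration only.
grains = {
    "phase_1": phase_layout("peaks_3d.h5", "index.ipynb", {"hkl_tol": 0.05}),
    "phase_2": phase_layout("peaks_3d.h5", "index.ipynb", {"hkl_tol": 0.03}),
}
paths = [path for path, _ in flatten(grains)]
```

Each phase carries its own parent path and indexing parameters, so a single phase group can be read without looking at the others.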

jonwright · 2025-01-10T16:21:41Z

#333 is related to this - what goes in which file and in which format.

jadball · 2025-01-10T19:21:08Z

Should wait on #371

jadball mentioned this issue Jan 10, 2025

Saving processing parameters & Nexus output, etc #318

Closed

jadball mentioned this issue Jan 10, 2025

New release #371

Open
