Processing Parameter and File Management #373

Open
jadball opened this issue Jan 10, 2025 · 2 comments

jadball (Contributor) commented Jan 10, 2025

Following on from #318, this issue concerns two problems:

  • How do we manage the files that are created as part of 3DXRD processing?
  • How do we remember which segmentation/indexing/mapping parameters were used to process the data?

For file management, we think the best solution is to keep things as minimal as possible.
Currently, the dataset class keeps track of all the files that are created, embedding their absolute paths in the H5 file. This is bad for several reasons, mainly portability: once the processed data are moved or copied elsewhere, the stored paths no longer point to the files.
However, it is convenient, because we don't need to define all the paths at the top of each analysis notebook.

For file management, we propose the following:

  • The notebook/ewoks task always lives in the same folder as the processed data it generates
  • The notebook/ewoks task looks for its files by relative file names
  • Default file names should enable automation (e.g. the segmentation task will by default create peaks.h5, and the indexing task will by default look for a peaks.h5 file in the same folder).
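A minimal sketch of how a task could resolve its files under these conventions, assuming the task knows the folder it runs in. The helper names (`resolve`, `input_path`, `output_path`) and the task-to-filename mapping are hypothetical; only `peaks.h5` and `grains.h5` come from this issue, and the assumption that indexing writes `grains.h5` is ours.

```python
from pathlib import Path
from typing import Optional, Union

# Hypothetical defaults: one task's default output is the next task's default input.
DEFAULT_OUTPUT = {"segmentation": "peaks.h5", "indexing": "grains.h5"}
DEFAULT_INPUT = {"indexing": "peaks.h5"}


def resolve(task_dir: Union[str, Path], name: str) -> Path:
    """Resolve a relative file name against the folder holding the notebook/task."""
    if Path(name).is_absolute():
        raise ValueError(f"expected a relative file name, got {name!r}")
    return Path(task_dir) / name


def input_path(task_dir: Union[str, Path], task: str, name: Optional[str] = None) -> Path:
    """Input file for `task`, falling back to the default name if none is given."""
    return resolve(task_dir, name or DEFAULT_INPUT[task])


def output_path(task_dir: Union[str, Path], task: str, name: Optional[str] = None) -> Path:
    """Output file for `task`, falling back to the default name if none is given."""
    return resolve(task_dir, name or DEFAULT_OUTPUT[task])
```

With these defaults, running segmentation and then indexing in the same folder needs no paths at all, while an explicit relative name still overrides the default.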

For keeping track of the processing parameters, we propose the following:

  • In each output H5 file, we store the following in groups:
    • The data itself (peaks/UBIs/etc)
    • The path to the input file that created the data
    • The path to the notebook/ewoks task that created the data
    • The parameters used to create the data
      • e.g. segmentation options
      • indexing options
    • For grains.h5, we should save the spot3d_id for each grain
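A sketch of what one output record could carry, using plain Python dicts in place of H5 groups so it runs without extra packages (with h5py, each key would become a group, dataset or attribute). All key and function names here are hypothetical. Storing the parameters as a sorted JSON string is one possible choice: it fits in a single string attribute and makes it easy to check whether two outputs were produced with the same settings.

```python
import json
from typing import Any, Dict


def make_output_record(data: Dict[str, Any], input_file: str, task_file: str,
                       parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Bundle processed data with its provenance (hypothetical key names)."""
    return {
        "data": data,              # the data itself, e.g. peaks or UBIs
        "input_file": input_file,  # path to the file the data were created from
        "task_file": task_file,    # path to the notebook/ewoks task that ran
        # Sorted JSON (nested dicts included): identical settings give identical strings.
        "parameters": json.dumps(parameters, sort_keys=True),
    }


def same_parameters(a: Dict[str, Any], b: Dict[str, Any]) -> bool:
    """True if two records were produced with identical parameters."""
    return a["parameters"] == b["parameters"]
```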

Filename templates seem to need to support layouts such as:

PROCESSED_DATA/{sample}/{sample}_{dataset}_{version}_peaks_table.h5
PROCESSED_DATA/{version}/{sample}/{sample}_{dataset}_peaks_table.h5
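The two peaks-table layouts can be expressed as Python `str.format` templates. This sketch assumes `_` separates the fields in the file name; the layout keys (`by_sample`, `by_version`) and the helper functions are hypothetical.

```python
from string import Formatter

# The two layouts listed above; "_" as the field separator is an assumption.
PEAKS_TABLE_TEMPLATES = {
    "by_sample": "PROCESSED_DATA/{sample}/{sample}_{dataset}_{version}_peaks_table.h5",
    "by_version": "PROCESSED_DATA/{version}/{sample}/{sample}_{dataset}_peaks_table.h5",
}


def template_fields(template: str) -> set:
    """Names of the {placeholders} a template expects."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def peaks_table_path(layout: str, **fields: str) -> str:
    """Fill a template, failing loudly if a placeholder is missing."""
    template = PEAKS_TABLE_TEMPLATES[layout]
    missing = template_fields(template) - set(fields)
    if missing:
        raise KeyError(f"missing fields: {sorted(missing)}")
    return template.format(**fields)
```

Checking the placeholders up front gives one clear error listing every missing field, rather than failing on the first one `str.format` encounters.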

Proposed structure of grains file:

  • grains.h5
    • phase_1
      • Data
        • UBIs
        • Translations
        • spot3d_id
      • Parent
        • relative peaks_3d path
      • relative notebook path/ewoks task path
      • Indexing parameters
        • hkl_tol etc.
    • phase_2
      • Data
        • UBIs
        • Translations
        • spot3d_id
      • Parent
        • relative peaks_3d path
      • relative notebook path/ewoks task path
      • Indexing parameters
        • hkl_tol etc.
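A sketch of one `phase_N` group, again as nested Python dicts standing in for H5 groups and assuming POSIX-style paths. The `Data`, `Parent` and `Indexing parameters` keys follow the proposed structure; `task_path`, `phase_group` and `parent_path` are hypothetical. Storing the parent and notebook paths relative to the folder containing `grains.h5` means the whole processed folder can be moved and the parent file still resolves.

```python
import posixpath
from typing import Any, Dict, List


def _rel(path: str, grains_file: str) -> str:
    """Express `path` relative to the folder that contains grains.h5."""
    return posixpath.relpath(path, posixpath.dirname(grains_file) or ".")


def phase_group(grains_file: str, peaks_3d_file: str, task_file: str,
                ubis: List, translations: List, spot3d_id: List[List[int]],
                indexing_parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Contents of one phase_N group in grains.h5 (hypothetical helper)."""
    if not (len(ubis) == len(translations) == len(spot3d_id)):
        raise ValueError("need one UBI, translation and spot3d_id list per grain")
    return {
        "Data": {"UBIs": ubis, "Translations": translations, "spot3d_id": spot3d_id},
        "Parent": {"peaks_3d": _rel(peaks_3d_file, grains_file)},
        "task_path": _rel(task_file, grains_file),
        "Indexing parameters": dict(indexing_parameters),
    }


def parent_path(grains_file: str, group: Dict[str, Any]) -> str:
    """Recover the peaks_3d path from wherever grains.h5 currently lives."""
    folder = posixpath.dirname(grains_file)
    return posixpath.normpath(posixpath.join(folder, group["Parent"]["peaks_3d"]))
```

A grains file would then hold one such group per phase, e.g. `{"phase_1": ..., "phase_2": ...}`.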
jonwright (Member)

#333 is related to this: what goes in which file, and in which format.

jadball mentioned this issue Jan 10, 2025
jadball (Contributor, Author) commented Jan 10, 2025

This should wait on #371.
