Following on from #318, this issue concerns two problems:
1. How do we manage the files that are created as part of 3DXRD processing?
2. How do we remember which segmentation/indexing/mapping parameters were used to process the data?
For file management, we think the best solution is to be as minimal as possible.
Currently, the dataset class keeps track of all the files that are created, with absolute paths embedded in the H5 file. This is bad for several reasons, mainly portability.
However, it's convenient because we don't need to define all the paths at the top of each analysis notebook.
For file management, we propose the following:
- The notebook/ewoks task is always in the same folder as the processed data it generates.
- The notebook/ewoks task looks for relative file names.
- Default file names should enable automation (e.g. the segmentation task will by default create peaks.h5, and the indexing task will by default look for a peaks.h5 file in the same folder).
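As a rough illustration of the relative-name convention above, here is a minimal sketch. The helper names (`resolve_in_workdir`, `indexing_io`) are hypothetical and are not part of the existing code; only the `peaks.h5` and `grains.h5` file names come from this issue.

```python
from pathlib import Path

# Default names taken from this issue; everything else here is illustrative.
DEFAULT_PEAKS = "peaks.h5"
DEFAULT_GRAINS = "grains.h5"


def resolve_in_workdir(workdir, name):
    """Return workdir/name, rejecting absolute names so the folder stays portable."""
    name = Path(name)
    if name.is_absolute():
        raise ValueError(f"expected a relative file name, got {name}")
    return Path(workdir) / name


def indexing_io(workdir, peaks_name=DEFAULT_PEAKS, grains_name=DEFAULT_GRAINS):
    """Input and output paths for a hypothetical indexing step run in workdir."""
    return (
        resolve_in_workdir(workdir, peaks_name),
        resolve_in_workdir(workdir, grains_name),
    )
```

With defaults like these, a notebook only needs to know which folder it lives in; moving the whole folder to another machine does not invalidate any stored path.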
For keeping track of the processing parameters, we propose the following:
In each output H5 file, we store the following in groups:
- The data itself (peaks/UBIs/etc.)
- The path to the input file that created the data
- The path to the notebook/ewoks task that created the data
- The parameters used to create the data, e.g.
  - segmentation options
  - indexing options
- For grains.h5, we should save the spot3d_id for each grain
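One possible way to lay this out is sketched below. The group names (`provenance`, `parameters`), the function names and the example parameter are assumptions made for illustration, not an agreed format. `h5py` is imported inside the writer so the layout helper can be used without it.

```python
def describe_output(input_file, task, parameters):
    """Build the metadata to store next to the data (hypothetical layout)."""
    return {
        # Relative names, so the processed folder can be moved as a unit.
        "provenance": {"input_file": str(input_file), "task": str(task)},
        # Flat mapping of option name to value (numbers or strings).
        "parameters": dict(parameters),
    }


def write_output(path, data_name, data, input_file, task, parameters):
    """Write the data plus its provenance and parameters to one H5 file."""
    import h5py  # imported lazily; only needed when actually writing

    meta = describe_output(input_file, task, parameters)
    with h5py.File(path, "w") as h5:
        h5.create_dataset(data_name, data=data)
        for group_name, attrs in meta.items():
            group = h5.create_group(group_name)
            for key, value in attrs.items():
                group.attrs[key] = value
```

A call such as `write_output("grains.h5", "UBI", ubis, "peaks.h5", "index.ipynb", {"hkl_tol": 0.05})` would then record where the grains came from; the parameter name `hkl_tol` is only an example here.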
Filename templates seem to need:
PROCESSED_DATA/{sample}/{sample}{dataset}{version}peaks_table.h5
PROCESSED_DATA/{version}/{sample}/{sample}{dataset}_peaks_table.h5
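To compare the two candidate templates, a small sketch using `str.format` may help. The underscore separators between fields are an assumption (the templates as posted show no separator between some fields), and the function and constant names are hypothetical.

```python
from pathlib import Path

# Assumed separators: underscores between fields are a guess, not confirmed here.
VERSION_IN_NAME = "PROCESSED_DATA/{sample}/{sample}_{dataset}_{version}_peaks_table.h5"
VERSION_IN_DIR = "PROCESSED_DATA/{version}/{sample}/{sample}_{dataset}_peaks_table.h5"


def peaks_table_path(template, sample, dataset, version):
    """Fill a filename template and return it as a relative Path."""
    return Path(template.format(sample=sample, dataset=dataset, version=version))
```

The first template keeps all versions of a sample side by side in one folder; the second gives each processing version its own tree, which makes it easier to delete or archive one version as a whole.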
Proposed structure of grains file: