MOUSEDataPipeline

MOUSEDataPipeline provides tools for the (automatic) processing of new MOUSE datafiles, offering a structured approach to manage and analyze scientific data generated by the MOUSE instrument.

prerequisites and assumptions

Nomenclature

Measurement Date: A rough timestamp indicating when measurements on a specific set of samples began. Each set of samples belonging together is grouped under a unique measurement date in the format YYYYMMDD.
Batch: Represents a set of measurements for a single sample. A batch includes all measurements across various configurations for that particular sample.
Repetition: Refers to an individual measurement within a specific configuration. This includes the measurement alongside the preceding direct beam and direct-beam-through-sample measurements, which are essential for determining the primary beam flux, beam position, and transmission factor.
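The three terms above map onto the repetition directory name used later in this README (`20250101_[batch]_[repetition]`). The snippet below is a minimal sketch of splitting such a name into its parts. The `RepetitionId` type and `parse_repetition_dir` function are hypothetical names, not part of the pipeline, and the sketch assumes batch and repetition are plain integers, as in the example `20250101_21_22`.

```python
import re
from typing import NamedTuple


class RepetitionId(NamedTuple):
    """Hypothetical container for the three identifiers in a directory name."""
    ymd: str         # measurement date, YYYYMMDD
    batch: int       # assumed to be an integer
    repetition: int  # assumed to be an integer


# Matches names such as "20250101_21_22".
_NAME_PATTERN = re.compile(r"^(\d{8})_(\d+)_(\d+)$")


def parse_repetition_dir(name: str) -> RepetitionId:
    """Split a repetition directory name into date, batch, and repetition."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a repetition directory name: {name!r}")
    ymd, batch, repetition = match.groups()
    return RepetitionId(ymd, int(batch), int(repetition))
```

Directories that do not follow the pattern, such as `autoproc`, raise a `ValueError` and can be skipped by the caller.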

expected directory structure

The data is organized under a predefined directory structure to ensure consistency and facilitate automated processing:

├─── Proposals
│   └─── 2025
└─── Measurements
    ├─── SAXS002
    │   ├─── logbooks
    │   └─── data
    │       ├─── Masks
    │       └─── 2025
    │           └─── 20250101  # (measurement date)
    │               ├─── 20250101_[batch]_[repetition]  # directory with files
    │               │   ├─── eiger_[number]_master.h5
    │               │   ├─── eiger_[number]_data00001.h5
    │               │   ├─── im_craw.nxs
    │               │   ├─── beam_profile
    │               │   │   ├─── eiger_[number]_master.h5
    │               │   │   ├─── eiger_[number]_data00001.h5
    │               │   │   └─── im_craw.nxs
    │               │   └─── beam_profile_through_sample
    │               │       ├─── eiger_[number]_master.h5
    │               │       ├─── eiger_[number]_data00001.h5
    │               │       └─── im_craw.nxs
    │               ├─── 20250101_[batch]_[repetition]
    │               ├─── ...
    │               └─── autoproc  # (processed datafiles)

Some flexibility is possible: a MOUSE_settings.yaml file contains the paths to the relevant sections of the tree. These paths can be adapted to point at the corresponding locations in your own structure.
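As an illustration of the layout, the following sketch builds the path of one repetition directory from a data root, a measurement date, a batch, and a repetition. The function names are hypothetical, and the data root is passed in directly rather than read from MOUSE_settings.yaml, since the keys of that file are not described here.

```python
from pathlib import Path
from typing import Tuple


def repetition_dir(data_root: Path, ymd: str, batch: int, repetition: int) -> Path:
    """Return <data_root>/<year>/<ymd>/<ymd>_<batch>_<repetition>."""
    year = ymd[:4]  # the year directory is the first four digits of YYYYMMDD
    return data_root / year / ymd / f"{ymd}_{batch}_{repetition}"


def beam_dirs(rep_dir: Path) -> Tuple[Path, Path]:
    """Return the direct-beam and direct-beam-through-sample subdirectories."""
    return rep_dir / "beam_profile", rep_dir / "beam_profile_through_sample"
```

For example, `repetition_dir(Path("Measurements/SAXS002/data"), "20250101", 21, 22)` points at `Measurements/SAXS002/data/2025/20250101/20250101_21_22`.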

usage example:

To process a single directory with a given configuration and set of steps, run the following command in your terminal:

python src/directory_processor.py --config MOUSE_settings.yaml --single_dir ~/Documents/BAM/Measurements/newMouseTest/Measurements/SAXS002/data/2025/20250101/20250101_21_22 --steps processstep_translator_step_1 processstep_translator_step_2 processstep_beamanalysis

Alternatively, specify measurement details directly:

python src/directory_processor.py --config MOUSE_settings.yaml --ymd 20250101 --batch 21 --repetition 22 --steps processstep_translator_step_1 processstep_translator_step_2 processstep_beamanalysis
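The two invocations above select the same kind of target in two different ways. The sketch below shows one way such a command line could be parsed with `argparse`. Only the flag names are taken from the commands above. The types, defaults, and the `has_target` check are assumptions, and the actual `directory_processor.py` may handle its arguments differently.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used in the usage examples."""
    parser = argparse.ArgumentParser(description="Process one MOUSE repetition.")
    parser.add_argument("--config", type=Path, help="path to MOUSE_settings.yaml")
    parser.add_argument("--single_dir", type=Path, help="repetition directory")
    parser.add_argument("--ymd", help="measurement date, YYYYMMDD")
    parser.add_argument("--batch", type=int)        # assumed integer
    parser.add_argument("--repetition", type=int)   # assumed integer
    parser.add_argument("--steps", nargs="+", default=[], help="step names, in order")
    return parser


def has_target(args: argparse.Namespace) -> bool:
    """True if either a directory or a full (ymd, batch, repetition) is given."""
    by_id = (args.ymd, args.batch, args.repetition)
    return args.single_dir is not None or all(v is not None for v in by_id)
```

Using `nargs="+"` keeps the steps in the order they were typed, which matters if later steps depend on the output of earlier ones.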

top-level methods:

1. `directory_processor`

Processes all data for a specified measurement date (YYYYMMDD), batch, and repetition, or for a given directory path.
Executes the defined processing steps, which should ideally be wrappers around CLI-executable scripts, though this isn't strictly enforced.
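The idea of steps as thin wrappers around CLI-executable scripts can be sketched as follows. This is not the pipeline's implementation: `make_script_step`, `run_steps`, and the registry dictionary are hypothetical names, and the sketch assumes each script takes the repetition directory as its only argument.

```python
import subprocess
import sys
from pathlib import Path
from typing import Callable, Dict, List

Step = Callable[[Path], None]


def make_script_step(script: Path) -> Step:
    """Wrap a CLI-executable Python script as a step (assumed call signature)."""
    def step(rep_dir: Path) -> None:
        # check=True raises CalledProcessError if the script exits non-zero.
        subprocess.run([sys.executable, str(script), str(rep_dir)], check=True)
    return step


def run_steps(rep_dir: Path, step_names: List[str], registry: Dict[str, Step]) -> None:
    """Run the named steps in order, stopping at the first failure."""
    for name in step_names:
        if name not in registry:
            raise KeyError(f"unknown processing step: {name}")
        registry[name](rep_dir)
```

Because a step is just a callable, a plain Python function can be registered next to script wrappers, which matches the note that the CLI-wrapper form is preferred but not enforced.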

2. `watcher`

WIP, not functional yet! This component aims to continuously monitor a measurement date directory for newly completed repetitions, automatically processing them as they become available.
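Since the watcher is not implemented yet, the following is only a sketch of what one polling pass might look like. The completeness rule is an assumption made for illustration: a repetition is treated as complete once `im_craw.nxs` exists in the repetition directory and in both beam subdirectories. All names are hypothetical.

```python
import re
from pathlib import Path
from typing import List, Set

REPETITION_NAME = re.compile(r"^\d{8}_\d+_\d+$")

# ASSUMPTION: these three files mark a finished repetition.
REQUIRED_FILES = (
    "im_craw.nxs",
    "beam_profile/im_craw.nxs",
    "beam_profile_through_sample/im_craw.nxs",
)


def is_complete(rep_dir: Path) -> bool:
    """Check the assumed completeness rule for one repetition directory."""
    return all((rep_dir / rel).is_file() for rel in REQUIRED_FILES)


def find_new_repetitions(date_dir: Path, seen: Set[str]) -> List[Path]:
    """Return complete repetition directories not yet in `seen`, and record them."""
    new_dirs = []
    for child in sorted(date_dir.iterdir()):
        if not child.is_dir() or child.name in seen:
            continue
        if REPETITION_NAME.match(child.name) and is_complete(child):
            seen.add(child.name)
            new_dirs.append(child)
    return new_dirs
```

A watcher loop could call `find_new_repetitions` at a fixed interval and hand each returned directory to the processing steps. Incomplete directories are not added to `seen`, so they are picked up on a later pass.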

functionality methods:

TBC...
