Hextof lab loader #534
else:
    raise ValueError(f"Unsupported core beamline: {core_beamline}")

def _validate_h5_files(self, config, h5_paths: list[Path]) -> list[Path]:
This validation was previously in BufferFilePaths, and we discussed moving it out of there. I find this location better (the move was also necessary because of the restructure).
# TODO: move to config
MULTI_INDEX = ["trainId", "pulseId", "electronId"]
PULSE_ALIAS = MULTI_INDEX[1]
FORMATS = ["per_electron", "per_pulse", "per_train"]
These have now been moved to the config and the config model.
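A minimal sketch of what reading these values from the config could look like. The `index` entry matches the test config quoted in the Copilot review; the `formats` key, the `dataframe` section name, and the helper `index_settings` are assumptions for illustration, not the actual sed config model.

```python
# Hypothetical config layout; only the `index` entry is taken from the test config.
config = {
    "dataframe": {
        "index": ["trainId", "pulseId", "electronId"],
        "formats": ["per_electron", "per_pulse", "per_train"],
    },
}


def index_settings(config: dict) -> tuple[list[str], str, list[str]]:
    """Return (multi_index, pulse_alias, formats) read from the config."""
    df_config = config["dataframe"]
    multi_index = list(df_config["index"])
    # Same rule as the old module constant PULSE_ALIAS = MULTI_INDEX[1].
    pulse_alias = multi_index[1]
    formats = list(df_config["formats"])
    return multi_index, pulse_alias, formats


multi_index, pulse_alias, formats = index_settings(config)
print(pulse_alias)
```

Deriving the pulse alias from the index list keeps the two from drifting apart when a config renames the index levels.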
c206130 to 788d189
Pull Request Test Coverage Report for Build 13419398366
💛 - Coveralls
Copilot reviewed 10 out of 11 changed files in this pull request and generated no comments.
Files not reviewed (1)
- .cspell/custom-dictionary.txt: Language not supported
Comments suppressed due to low confidence (2)
src/sed/loader/flash/dataframe.py:423
- The docstring for the df_train property refers to channels of type [per pulse], but the implementation uses 'per_train'. Please update the comment to match the code and maintain clarity.
Returns a pandas DataFrame for given channel names of type [per pulse]
tests/data/loader/flash/config.yaml:57
- [nitpick] For consistency and to avoid potential YAML parsing issues, consider quoting the index values as strings (e.g. ['trainId', 'pulseId', 'electronId']).
index: [trainId, pulseId, electronId]
… as it is not available right now anyways
Just a few comments. Is this local metadata scheme also relevant for the flash loader? If so, the code needs to be updated there as well.
@@ -26,6 +26,7 @@ class PathsModel(BaseModel):

    raw: DirectoryPath
    processed: Optional[Union[DirectoryPath, NewPath]] = None
    meta: Optional[Union[DirectoryPath, NewPath]] = None
Instead of adding a new entry to the config model, I'd suggest we just allow directory paths in
sed/src/sed/core/config_model.py
Line 327 in 4a6ec53
archiver_url: Optional[HttpUrl] = None
What do you think?
Fine for me. I just thought it would be one of the main folders inside the beamtime folder anyway.
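A minimal sketch of the idea discussed here, assuming a single setting that may hold either an HTTP(S) URL or a local directory. It uses only the standard library instead of the pydantic types of the actual config model, and `classify_metadata_source` is a hypothetical helper, not part of sed.

```python
from pathlib import Path
from typing import Union
from urllib.parse import urlparse


def classify_metadata_source(value: str) -> tuple[str, Union[str, Path]]:
    """Classify a config value as a remote URL or an existing local directory."""
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url", value
    path = Path(value)
    if path.is_dir():
        return "directory", path
    raise ValueError(f"Neither an HTTP(S) URL nor an existing directory: {value}")


# The URL below is a placeholder, not a real archiver endpoint.
kind, source = classify_metadata_source("https://example.org/archiver")
print(kind)
```

With one setting instead of two, the loader only needs to branch on the returned kind to decide between a remote query and reading local files.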
processed_dir = Path(
    self._config["core"]["paths"].get("processed", raw_dir.joinpath("processed")),
)
meta_dir = Path(
    self._config["core"]["paths"].get("meta", raw_dir.joinpath("meta")),
The path logic is confusing right now because there are too many possibilities. I'd put archiver_url as the default in the lab default config, plus one automatic option.
To me it's not clear whether the meta path is 'meta/' or 'meta/fabtrack/' right now.
This part is also confusing for me, as I don't really see how you can get from raw_dir to e.g. processed_dir with raw_dir.joinpath("processed"), because this will give you beamtime_dir/raw_dir/processed instead of beamtime_dir/processed, right?
Currently, the meta path is 'meta/fabtrack/' as it comes from Fabiano's code, but it can probably be changed to just 'meta/' as soon as it is accepted/generalized by IT.
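The difference raised here can be checked with pathlib alone. A small sketch, assuming a beamtime folder that contains `raw/` and `processed/` as siblings; the concrete path is made up for illustration.

```python
from pathlib import PurePosixPath

# Hypothetical beamtime layout: <beamtime_dir>/raw and <beamtime_dir>/processed
raw_dir = PurePosixPath("/beamtime_dir/raw")

nested = raw_dir.joinpath("processed")          # child of raw_dir
sibling = raw_dir.parent.joinpath("processed")  # next to raw_dir

print(nested)   # /beamtime_dir/raw/processed
print(sibling)  # /beamtime_dir/processed
```

So a default built with `raw_dir.joinpath(...)` only points at the beamtime-level folder if `raw_dir` is itself the beamtime directory; otherwise the default would have to go through `raw_dir.parent`.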
    self.metadata.update(self.parse_local_metadata())
else:
    print("Metadata taken from SciCat")
    self.metadata.update(self.parse_scicat_metadata(token) if collect_metadata else {})
Not necessarily a big issue, but parse_scicat_metadata is called twice when SciCat metadata exists: once in the if and once in the else.
One way to avoid that could be:
scicat_metadata = self.parse_scicat_metadata(token) if collect_metadata else {}
self.metadata.update(scicat_metadata)
if len(scicat_metadata) == 0:
    print("No SciCat metadata available, checking local folder")
    self.metadata.update(self.parse_local_metadata())
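A self-contained sketch of this fallback, with stand-in callables in place of the loader methods. `fetch_scicat`, `read_local`, and `gather_metadata` are hypothetical placeholders (for `parse_scicat_metadata`, `parse_local_metadata`, and the surrounding loader code); the point is only that the SciCat lookup runs once.

```python
from typing import Callable


def gather_metadata(
    fetch_scicat: Callable[[], dict],
    read_local: Callable[[], dict],
) -> dict:
    """Prefer SciCat metadata; fall back to the local folder if it is empty."""
    metadata: dict = {}
    scicat_metadata = fetch_scicat()  # called exactly once
    metadata.update(scicat_metadata)
    if not scicat_metadata:
        print("No SciCat metadata available, checking local folder")
        metadata.update(read_local())
    return metadata


calls = []


def empty_scicat() -> dict:
    """Stand-in for a SciCat query that finds no entries."""
    calls.append("scicat")
    return {}


result = gather_metadata(empty_scicat, lambda: {"source": "local"})
print(result, calls)
```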
Fine for me. I just wanted to check whether SciCat entries are available and use them if so, and otherwise check the local folder, to stay compatible with older beamtimes.
  burl=self.url,
- url="Datasets",
+ url="datasets",#"Datasets",
Did the API change?
Yes, all metadata was migrated to the generalized scicat.desy.de with a new API, where 'Datasets' was changed to 'datasets' :)
Hopefully it will also be available from outside DESY within the next few days.
This PR adds the lab loader requested in #503. I tried to make minimal changes to the FlashLoader to make this work. The only major addition is the loader-specific dataframe class; everything else stays approximately the same. So the lab data works with the flash loader, but with the beamline config set to cfel. An example config is provided to make this work. Since I moved some hardcoded parameters (previously marked as TODO) into the config, I updated the config model slightly.
Test data for this loading configuration still needs to be set up. I ask @kutnyakhov to provide a public file for this. I'm not sure whether a tutorial is necessary.