Excluding runs from 10301 in 10302 #211
Conversation
- ^27042022_BrnoKrios_Arctis_grid9_hGIS_Position_13$
- ^27042022_BrnoKrios_Arctis_grid9_hGIS_Position_33$
- ^27042022_BrnoKrios_Arctis_grid9_hGIS_Position_44$
- ^27042022_BrnoKrios_Arctis_grid9_hGIS_Position_7$
If we exclude tomograms, rawtilts, etc. from being imported by filtering out invalid runs from each sub-type, we're still processing runs that are invalid for this dataset. That's (barely) acceptable for the moment since we don't write any per-run data as output right now, but if we started writing run_metadata.json files, this falls apart. It seems like adding support for multiple include/exclude regexes to SourceGlobFinder would let us skip these runs at the appropriate level instead of repeating this list for multiple child types.
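For illustration, here is a minimal sketch of what multiple include/exclude regexes at the run level could look like. RunNameFilter and find_runs are hypothetical names, not the actual SourceGlobFinder API, and the use of re.search is an assumption. The idea is only that filtering once at the run level keeps excluded runs away from every child type.

```python
import re
from typing import Iterable, List, Optional


class RunNameFilter:
    """Hypothetical helper, not part of the real finder code."""

    def __init__(
        self,
        include: Optional[Iterable[str]] = None,
        exclude: Optional[Iterable[str]] = None,
    ):
        # Compile once so the same patterns are reused for every candidate.
        self.include = [re.compile(p) for p in include or []]
        self.exclude = [re.compile(p) for p in exclude or []]

    def allows(self, name: str) -> bool:
        # With no include patterns, every name is a candidate.
        if self.include and not any(r.search(name) for r in self.include):
            return False
        # Any matching exclude pattern rejects the name.
        return not any(r.search(name) for r in self.exclude)


def find_runs(candidates: Iterable[str], name_filter: RunNameFilter) -> List[str]:
    """Return the run names that pass the filter, preserving input order."""
    return [name for name in candidates if name_filter.allows(name)]
```

Because the exclude patterns in this PR are anchored with ^ and $, a pattern ending in Position_7$ rejects only that exact run name and not longer names that share the prefix.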
So would an alternative (but likely hacky/clunky and non-sustainable) solution be adapting our current runs.sources.source_glob fields / regexes to use a negative-lookahead regex that filters out all of these?
Yeah, in theory we could hack this right now with a super gnarly value in runs.sources.source_glob.match_regex. It's probably not a ton of work to support multiple in/exclude regexes in that finder though 🤷♀️
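To make the "gnarly" option concrete, here is a rough sketch of a single negative-lookahead pattern built from the four excluded run names in this PR. build_match_regex is a hypothetical helper, and how match_regex is actually applied by the finder (match vs. search) is an assumption here.

```python
import re
from typing import Iterable, Pattern

# Run names taken from the exclude list in this PR.
EXCLUDED_RUNS = [
    "27042022_BrnoKrios_Arctis_grid9_hGIS_Position_13",
    "27042022_BrnoKrios_Arctis_grid9_hGIS_Position_33",
    "27042022_BrnoKrios_Arctis_grid9_hGIS_Position_44",
    "27042022_BrnoKrios_Arctis_grid9_hGIS_Position_7",
]


def build_match_regex(excluded: Iterable[str]) -> Pattern[str]:
    """Hypothetical helper: match any name except the exact excluded ones."""
    # Escape each name so it is treated as a literal string.
    alternatives = "|".join(re.escape(name) for name in excluded)
    # (?!...) fails the match when the whole string equals an excluded name;
    # the inner $ keeps longer names with the same prefix from being rejected.
    return re.compile(rf"^(?!(?:{alternatives})$).*$")


MATCH_REGEX = build_match_regex(EXCLUDED_RUNS)
```

The resulting pattern grows with every excluded run and is hard to review by eye, which is the main argument for a separate exclude list.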
I would lean towards the exclude over the regex, as an exclude is cleaner.
8b55e5c to 42c6bf4
ac32ec5 to a9e986a
a9e986a to 81f7f50
Looks good to me!
@@ -54,6 +56,8 @@ def __init__(
name_regex = "(.*)"
self.name_regex = re.compile(name_regex)

self.exclude_regexes = [re.compile(regex) for regex in exclude_regexes or []]
It looks like we're supporting an exclude_regexes parameter for SourceGlobFinder as well as an exclude filter for any source type? It seems like the exclude filter would be enough and we don't need both?
Good catch! I initially added exclude_regexes to SourceGlobFinder, but then decided to generalize it to an exclude filter for all the source types. Looks like the exclude_regexes initialization got missed in cleanup.
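As a rough sketch of the generalized approach, a shared base class could own the exclude filter so that individual finders need no exclude_regexes of their own. BaseFinder, ListFinder, and PrefixFinder are hypothetical names for illustration and do not mirror the real class hierarchy.

```python
import re
from typing import Iterable, List, Optional


class BaseFinder:
    """Hypothetical base class that applies one exclude filter for all finders."""

    def __init__(self, exclude: Optional[Iterable[str]] = None):
        self.exclude = [re.compile(p) for p in exclude or []]

    def candidates(self) -> List[str]:
        # Each concrete finder decides how names are discovered.
        raise NotImplementedError

    def find(self) -> List[str]:
        # The exclude step lives here, so subclasses never reimplement it.
        return [
            name
            for name in self.candidates()
            if not any(r.search(name) for r in self.exclude)
        ]


class ListFinder(BaseFinder):
    """Returns names from a fixed list."""

    def __init__(self, names: Iterable[str], exclude: Optional[Iterable[str]] = None):
        super().__init__(exclude)
        self.names = list(names)

    def candidates(self) -> List[str]:
        return self.names


class PrefixFinder(BaseFinder):
    """Returns names from a list that start with a given prefix."""

    def __init__(
        self,
        names: Iterable[str],
        prefix: str,
        exclude: Optional[Iterable[str]] = None,
    ):
        super().__init__(exclude)
        self.names = list(names)
        self.prefix = prefix

    def candidates(self) -> List[str]:
        return [n for n in self.names if n.startswith(self.prefix)]
```

With this shape, a leftover per-finder attribute like exclude_regexes would be dead code, which matches the cleanup suggested in the review.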
Description
Fixes runs from dataset 10301 being included in dataset 10302 by explicitly excluding them.