Excluding runs from 10301 in 10302 #211

Merged
merged 6 commits into from
Aug 22, 2024

manasaV3

Description

Fixes runs from dataset 10301 being included in dataset 10302 by explicitly excluding them.

- ^27042022_BrnoKrios_Arctis_grid9_hGIS_Position_13$
- ^27042022_BrnoKrios_Arctis_grid9_hGIS_Position_33$
- ^27042022_BrnoKrios_Arctis_grid9_hGIS_Position_44$
- ^27042022_BrnoKrios_Arctis_grid9_hGIS_Position_7$

If we exclude tomograms, rawtilts, etc from being imported by filtering out invalid runs from each sub-type, we're still processing runs that are invalid for this dataset. That's (barely) acceptable for the moment since we don't write any per-run data as output right now, but if we started writing run_metadata.json files, this falls apart. It seems like adding support for multiple include/exclude regexes to SourceGlobFinder would let us skip these runs at the appropriate level instead of repeating this list for multiple child types.
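
A minimal sketch of the run-level filtering suggested above, assuming run names are plain strings. `filter_names` and the `Position_8` run name are hypothetical and used only for illustration; this is not the repository's `SourceGlobFinder` API. The point is that once invalid runs are dropped here, child types (tomograms, rawtilts, and so on) never see them and the exclude list is stated once.

```python
import re


def filter_names(names, include=None, exclude=None):
    """Keep names that match any include pattern and no exclude pattern."""
    include_res = [re.compile(p) for p in include or [".*"]]
    exclude_res = [re.compile(p) for p in exclude or []]
    kept = []
    for name in names:
        if not any(r.search(name) for r in include_res):
            continue  # not selected by any include pattern
        if any(r.search(name) for r in exclude_res):
            continue  # explicitly excluded
        kept.append(name)
    return kept


runs = [
    "27042022_BrnoKrios_Arctis_grid9_hGIS_Position_7",
    "27042022_BrnoKrios_Arctis_grid9_hGIS_Position_8",  # hypothetical valid run
    "27042022_BrnoKrios_Arctis_grid9_hGIS_Position_13",
]
exclude = [
    r"^27042022_BrnoKrios_Arctis_grid9_hGIS_Position_13$",
    r"^27042022_BrnoKrios_Arctis_grid9_hGIS_Position_7$",
]
kept = filter_names(runs, exclude=exclude)
```

Because each pattern is anchored with `^` and `$`, excluding `Position_7` does not also drop a name such as `Position_70`.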

So would an alternative (but likely hacky/clunky and non-sustainable) solution be adapting our current runs.sources.source_glob fields / regexes to use negative lookahead regex and filter out all of these?

@jgadling jgadling Aug 19, 2024

Yeah, in theory we could hack this right now with a super gnarly value in runs.sources.source_glob.match_regex. It's probably not a ton of work to support multiple in/exclude regexes in that finder though 🤷‍♀️
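
For reference, a rough sketch of what such a negative-lookahead value could look like, built from the four run names in this PR. It assumes the regex is applied to the whole run name and that the capture group plays the same role as the default `(.*)`; how `match_regex` is actually evaluated by the finder is not shown in this thread, so treat this as an illustration only.

```python
import re

# The four positions excluded in this PR.
excluded_positions = ["13", "33", "44", "7"]
prefix = "27042022_BrnoKrios_Arctis_grid9_hGIS_Position_"

# One alternation of full run names, wrapped in a negative lookahead.
alternatives = "|".join(re.escape(prefix + pos) for pos in excluded_positions)
match_regex = re.compile(rf"^(?!(?:{alternatives})$)(.*)$")

# Excluded names produce no match; any other name is captured whole.
rejected = match_regex.match(prefix + "13")
accepted = match_regex.match(prefix + "70")
```

The `$` inside the lookahead matters: without it, `Position_7` would also reject `Position_70`. The pattern grows with every excluded run, which is the readability cost discussed here.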

I would lean towards the exclude over the regex, as the exclude is cleaner.

@manasaV3 manasaV3 force-pushed the mvenkatakrishnan/ds_10302_fix branch from 8b55e5c to 42c6bf4 Compare August 20, 2024 22:03
@daniel-ji daniel-ji force-pushed the mvenkatakrishnan/ds_10302_fix branch from ac32ec5 to a9e986a Compare August 20, 2024 23:10
@daniel-ji daniel-ji force-pushed the mvenkatakrishnan/ds_10302_fix branch from a9e986a to 81f7f50 Compare August 20, 2024 23:22
@daniel-ji daniel-ji left a comment

Looks good to me!

@@ -54,6 +56,8 @@ def __init__(
name_regex = "(.*)"
self.name_regex = re.compile(name_regex)

self.exclude_regexes = [re.compile(regex) for regex in exclude_regexes or []]

It looks like we're supporting an exclude_regexes parameter for SourceGlobFinder as well as an exclude filter for any source type? It seems like the exclude filter would be enough and we don't need both?

@manasaV3 manasaV3 Aug 22, 2024

Good catch! I initially added exclude_regexes to SourceGlobFinder, but then decided to generalize it into an exclude filter for all source types. It looks like the exclude_regexes initialization was missed during cleanup.
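
A hedged sketch of the generalized design described in this reply: the exclude filter lives in a shared base class, so no individual finder needs its own `exclude_regexes`. `BaseFinder`, `LiteralFinder`, `_candidates`, and the paths are hypothetical names for illustration, not the classes in this repository.

```python
import re
from abc import ABC, abstractmethod


class BaseFinder(ABC):
    """Hypothetical base class: every source type inherits one exclude filter."""

    def __init__(self, exclude=None):
        self.exclude = [re.compile(pattern) for pattern in exclude or []]

    @abstractmethod
    def _candidates(self):
        """Return a dict mapping item name to path, before exclusion."""

    def find(self):
        # Apply the shared exclude filter to whatever the subclass found.
        return {
            name: path
            for name, path in self._candidates().items()
            if not any(regex.search(name) for regex in self.exclude)
        }


class LiteralFinder(BaseFinder):
    """Hypothetical source type that returns a fixed mapping."""

    def __init__(self, items, exclude=None):
        super().__init__(exclude)
        self._items = dict(items)

    def _candidates(self):
        return self._items


finder = LiteralFinder(
    {
        "27042022_BrnoKrios_Arctis_grid9_hGIS_Position_44": "/data/p44",
        "27042022_BrnoKrios_Arctis_grid9_hGIS_Position_45": "/data/p45",
    },
    exclude=[r"^27042022_BrnoKrios_Arctis_grid9_hGIS_Position_44$"],
)
found = finder.find()
```

With this layout, a glob-based finder would only implement `_candidates`, and the leftover per-finder `exclude_regexes` attribute becomes redundant.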

@manasaV3 manasaV3 requested a review from jgadling August 21, 2024 21:52
@jgadling jgadling left a comment

It looks like we're supporting an exclude_regexes parameter for SourceGlobFinder as well as an exclude filter for any source type? It seems like the exclude filter would be enough and we don't need both?

@manasaV3 manasaV3 merged commit 4417285 into main Aug 22, 2024
5 checks passed
@manasaV3 manasaV3 deleted the mvenkatakrishnan/ds_10302_fix branch August 22, 2024 23:00
@daniel-ji daniel-ji mentioned this pull request Aug 23, 2024