Excluding runs from 10301 in 10302 #211

manasaV3 · 2024-08-19T19:14:35Z

Description

Fixes runs from dataset 10301 being included in dataset 10302 by explicitly excluding them.

jgadling · 2024-08-19T19:25:30Z

ingestion_tools/dataset_configs/10302.yaml

+        - ^27042022_BrnoKrios_Arctis_grid9_hGIS_Position_13$
+        - ^27042022_BrnoKrios_Arctis_grid9_hGIS_Position_33$
+        - ^27042022_BrnoKrios_Arctis_grid9_hGIS_Position_44$
+        - ^27042022_BrnoKrios_Arctis_grid9_hGIS_Position_7$


If we exclude tomograms, rawtilts, etc from being imported by filtering out invalid runs from each sub-type, we're still processing runs that are invalid for this dataset. That's (barely) acceptable for the moment since we don't write any per-run data as output right now, but if we started writing run_metadata.json files, this falls apart. It seems like adding support for multiple include/exclude regexes to SourceGlobFinder would let us skip these runs at the appropriate level instead of repeating this list for multiple child types.

So would an alternative (but likely hacky/clunky and non-sustainable) solution be adapting our current runs.sources.source_glob fields / regexes to use negative lookahead regex and filter out all of these?

Yeah, in theory we could hack this right now with a super gnarly value in runs.sources.source_glob.match_regex. It's probably not a ton of work to support multiple in/exclude regexes in that finder though 🤷‍♀️

I would lean towards the exclude over the regex, as an exclude is cleaner.
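For reference, the two options discussed in this thread can be compared in a minimal sketch. It assumes run names are matched as plain strings; the `EXCLUDED` list is copied from the diff above, while `keep_with_lookahead`, `keep_with_excludes`, and the extra `Position_70` run name are hypothetical and exist only for illustration. This is not the code in finders.py.

```python
import re

# Run names copied from the exclude list in this PR's diff.
EXCLUDED = [
    "27042022_BrnoKrios_Arctis_grid9_hGIS_Position_13",
    "27042022_BrnoKrios_Arctis_grid9_hGIS_Position_33",
    "27042022_BrnoKrios_Arctis_grid9_hGIS_Position_44",
    "27042022_BrnoKrios_Arctis_grid9_hGIS_Position_7",
]

# Option 1: a single match regex that uses a negative lookahead to reject
# the excluded names. It works, but grows harder to read with every new run.
lookahead = re.compile(
    r"^(?!(?:" + "|".join(re.escape(name) for name in EXCLUDED) + r")$).+$"
)

# Option 2: a separate list of anchored exclude patterns, one per run.
exclude_regexes = [re.compile(f"^{re.escape(name)}$") for name in EXCLUDED]


def keep_with_lookahead(name: str) -> bool:
    """Keep the run only if the lookahead regex accepts it."""
    return lookahead.match(name) is not None


def keep_with_excludes(name: str) -> bool:
    """Keep the run only if no exclude pattern matches it."""
    return not any(rx.match(name) for rx in exclude_regexes)


# Hypothetical run name that shares a prefix with an excluded one.
candidates = EXCLUDED + ["27042022_BrnoKrios_Arctis_grid9_hGIS_Position_70"]

kept_lookahead = [name for name in candidates if keep_with_lookahead(name)]
kept_excludes = [name for name in candidates if keep_with_excludes(name)]
```

Both approaches keep the same runs here. The anchors (`^` and `$`) matter in either case: without the trailing `$`, the pattern for `Position_7` would also reject `Position_70`.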

daniel-ji

Looks good to me!

jgadling · 2024-08-21T16:30:01Z

ingestion_tools/scripts/common/finders.py

@@ -54,6 +56,8 @@ def __init__(
            name_regex = "(.*)"
        self.name_regex = re.compile(name_regex)

+        self.exclude_regexes = [re.compile(regex) for regex in exclude_regexes or []]


It looks like we're supporting an exclude_regexes parameter for SourceGlobFinder as well as an exclude filter for any source type? It seems like the exclude filter would be enough and we don't need both?

Good catch! I initially added exclude_regexes to SourceGlobFinder, but then decided to generalize it into an exclude filter for all the source types. It looks like the exclude_regexes initialization got missed in cleanup.
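The generalization described here, one exclude filter shared by every source type instead of a finder-specific parameter, might look roughly like the sketch below. The class names `BaseFinder` and `ListFinder` and the methods `candidates`, `is_excluded`, and `find` are assumptions made for illustration; they are not taken from finders.py.

```python
import re
from typing import Iterable, Iterator, List, Optional


class BaseFinder:
    """Hypothetical base class that applies one shared exclude filter."""

    def __init__(self, exclude: Optional[Iterable[str]] = None):
        # Compile once; a missing or empty list excludes nothing.
        self.exclude_regexes = [re.compile(rx) for rx in exclude or []]

    def candidates(self) -> Iterator[str]:
        """Subclasses yield every name they can discover."""
        raise NotImplementedError

    def is_excluded(self, name: str) -> bool:
        """True if any exclude pattern matches the name."""
        return any(rx.search(name) for rx in self.exclude_regexes)

    def find(self) -> List[str]:
        """Return the discovered names with excluded ones filtered out."""
        return [name for name in self.candidates() if not self.is_excluded(name)]


class ListFinder(BaseFinder):
    """Hypothetical finder over an in-memory list, standing in for a glob."""

    def __init__(self, names: Iterable[str], exclude: Optional[Iterable[str]] = None):
        super().__init__(exclude)
        self.names = list(names)

    def candidates(self) -> Iterator[str]:
        return iter(self.names)


finder = ListFinder(
    ["run_1", "run_13", "run_7"],
    exclude=["^run_13$", "^run_7$"],
)
kept = finder.find()
```

Because the filtering lives in the base class, a finder such as SourceGlobFinder would not need its own `exclude_regexes` argument, which is the redundancy raised in the review comment above.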


manasaV3 requested review from jgadling and daniel-ji August 19, 2024 19:14

jgadling reviewed Aug 19, 2024

manasaV3 added 4 commits August 20, 2024 13:34

Excluding runs from 10301 in 10302

3086b68

Adding excludes to config

07d9197

Updating schema

14081dd

Updating schema

42c6bf4

manasaV3 force-pushed the mvenkatakrishnan/ds_10302_fix branch from 8b55e5c to 42c6bf4 Compare August 20, 2024 22:03

daniel-ji force-pushed the mvenkatakrishnan/ds_10302_fix branch from ac32ec5 to a9e986a Compare August 20, 2024 23:10

cleanup ingestion config

81f7f50

daniel-ji force-pushed the mvenkatakrishnan/ds_10302_fix branch from a9e986a to 81f7f50 Compare August 20, 2024 23:22

daniel-ji approved these changes Aug 20, 2024

jgadling reviewed Aug 21, 2024

manasaV3 requested a review from jgadling August 21, 2024 21:52

jgadling reviewed Aug 21, 2024

Cleaning up exclude_regexes from SourceGlobFinder

41edd4f

jgadling approved these changes Aug 22, 2024

manasaV3 merged commit 4417285 into main Aug 22, 2024
5 checks passed

manasaV3 deleted the mvenkatakrishnan/ds_10302_fix branch August 22, 2024 23:00

daniel-ji mentioned this pull request Aug 23, 2024

ingestion config fixes #220

Merged
