add an ExcludeFilter #106

Open
@mathause

Description

In my IPCC analyses I had to remove many simulations (specific model-variable, model-scenario-ensemble, or other combinations). Even for the cleaned 'new-generation' repos, mesmer has to remove some of the simulations. For IPCC I did that in the data processing loop, but I think it would be better done in the 'find the simulations' part (i.e. in filefinder). So we should add an ExcludeFilter (better names always welcome). We'd need to think about the way metadata for the excluded simulations is passed.

For IPCC I have a function which identifies matching metadata:

https://github.com/IPCC-WG1/Chapter-11/blob/d1a3a99f242a568fb4cefc36a038c888a90b9d37/code/fixes/_fixes_common.py#L47

But maybe we could also use pandas machinery, e.g. isin():

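# NOTE: untested

import numpy as np
import pandas as pd

conditions = [
    # remove AWI ocean data: has an unstructured grid
    {
        "table": ["Oday", "Ofx", "Omon", "SIday", "SImon"],
        "model": ["AWI-CM-1-1-MR", "AWI-ESM-1-1-LR"],
    },
    # tasmax and tasmin are wrong for CESM
    {
        "table": "day",
        "varn": ["tasmax", "tasmin"],
        "model": ["CESM2", "CESM2-WACCM"],
    },
    ...
]

# start by keeping every row, then drop the rows matching any condition
to_keep = pd.Series(True, index=df.index)

for condition in conditions:
    # a row matches only if ALL keys match; np.atleast_1d wraps scalars
    # such as "day", because isin() requires a list-like argument
    matches = [df[key].isin(np.atleast_1d(cond)) for key, cond in condition.items()]
    to_keep &= ~np.logical_and.reduce(matches)

# to_keep is a boolean mask, so select with .loc (not .iloc)
df = df.loc[to_keep]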
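To make the isin() idea concrete, here is a minimal, self-contained sketch that also returns the excluded rows, which may be one way to keep their metadata around. The function name `split_excluded` and the example table are made up for illustration; none of this is existing filefinder API, and it assumes the found files are available as a DataFrame with one column per metadata key.

```python
import numpy as np
import pandas as pd


def split_excluded(df, conditions):
    """Split df into (kept, excluded) rows.

    A row is excluded if it matches ALL keys of ANY condition.
    NOTE: hypothetical helper for illustration, not part of filefinder.
    """
    excluded = pd.Series(False, index=df.index)

    for condition in conditions:
        match = pd.Series(True, index=df.index)
        for key, values in condition.items():
            # np.atleast_1d allows a scalar ("day") or a list of values
            match &= df[key].isin(np.atleast_1d(values))
        excluded |= match

    return df.loc[~excluded], df.loc[excluded]


# made-up metadata table, only for this example
df = pd.DataFrame(
    {
        "model": ["CESM2", "CESM2", "AWI-CM-1-1-MR", "MIROC6"],
        "table": ["day", "day", "Omon", "day"],
        "varn": ["tasmax", "pr", "tos", "tasmax"],
    }
)

conditions = [
    # remove AWI ocean data: has an unstructured grid
    {
        "table": ["Oday", "Ofx", "Omon", "SIday", "SImon"],
        "model": ["AWI-CM-1-1-MR", "AWI-ESM-1-1-LR"],
    },
    # tasmax and tasmin are wrong for CESM
    {
        "table": "day",
        "varn": ["tasmax", "tasmin"],
        "model": ["CESM2", "CESM2-WACCM"],
    },
]

kept, excluded = split_excluded(df, conditions)
```

With this table, the CESM2 `tasmax` row and the AWI `Omon` row end up in `excluded`, while CESM2 `pr` and MIROC6 `tasmax` stay in `kept`. Returning both frames is just one option; whether an ExcludeFilter should expose the excluded entries at all is still open.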

FYI @veni-vidi-vici-dormivi
