
Check time steps against expectations as part of the QC tests #99

Open
1 of 3 tasks
jbusecke opened this issue Feb 12, 2024 · 1 comment


jbusecke commented Feb 12, 2024

I just checked on the iids requested in #72 and noticed that there might have been a bug in the ingestion of previous LEAP ingested stores.

I ran the following:

```python
import intake

iids = [
    'CMIP6.HighResMIP.MOHC.HadGEM3-GC31-HH.highres-future.r1i1p1f1.Omon.so.gn.v20200514',
    'CMIP6.HighResMIP.NERC.HadGEM3-GC31-HH.hist-1950.r1i1p1f1.Omon.so.gn.v20200514',
    'CMIP6.HighResMIP.NERC.HadGEM3-GC31-HH.hist-1950.r1i1p1f1.Omon.thetao.gn.v20200514',
    'CMIP6.HighResMIP.MOHC.HadGEM3-GC31-HH.highres-future.r1i1p1f1.Omon.thetao.gn.v20200514'
]

# Only stores that pass the current tests are listed in this catalog
url = "https://storage.googleapis.com/cmip6/cmip6-pgf-ingestion-test/catalog/catalog.json"
col = intake.open_esm_datastore(url)

CMIP6_naming_schema = "mip_era.activity_id.institution_id.source_id.experiment_id.member_id.table_id.variable_id.grid_label.version"

for iid in iids:
    print("==============================================================================")
    print(f'Checking for {iid=}')
    facet_dict = dict(zip(CMIP6_naming_schema.split('.'), iid.split('.')))
    # We do not catalog the mip_era... which we probably should? TODO: raise an issue
    del facet_dict['mip_era']
    cat = col.search(**facet_dict)
    if len(cat) == 0:
        print('Not found in catalog')
    elif len(cat) == 1:
        path = cat.df['zstore'].tolist()[0]
        ddict = cat.to_dataset_dict()
        name, ds = ddict.popitem()
        print(f'{name=} Found in catalog at \n{path=}')
        display(ds)  # `display` is available in IPython/Jupyter sessions
    else:
        print("Found more than one entry. ⛔️ This should not happen.")
```

This showed that all iids are already in the catalog (under the 'LEAP legacy' prefix 'gs://cmip6/CMIP6_LEAP_legacy/'). Each of these datasets has only 2 time steps, which leads me to conclude that we have ingested pruned datasets into the main catalog. The way this is currently set up, that should not happen, but we need to overwrite these stores now (related to #76) and build better checks in the future.
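The check the issue title asks for could be sketched roughly like this. This is a minimal sketch, not the pipeline's actual QC code: the 1950–2014 span used for hist-1950 is an illustrative assumption, and in practice the expected range would come from the experiment metadata rather than being hard-coded.

```python
import pandas as pd

def expected_monthly_steps(start: str, end: str) -> int:
    """Number of monthly time steps between two dates, inclusive."""
    return len(pd.date_range(start=start, end=end, freq="MS"))

def check_time_steps(n_actual: int, start: str, end: str) -> bool:
    """QC check: does the store's time axis length match expectations?

    A pruned dataset (e.g. only 2 time steps) fails this check.
    """
    return n_actual == expected_monthly_steps(start, end)

# hist-1950 nominally spans 1950-2014 (assumption for illustration),
# i.e. 65 years * 12 months = 780 monthly steps for an Omon table.
print(expected_monthly_steps("1950-01-01", "2014-12-31"))  # 780
print(check_time_steps(2, "1950-01-01", "2014-12-31"))     # False: pruned store
```

Applied to the stores above, `len(ds.time)` would be compared against the expected count and a 2-step pruned store would be flagged immediately.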

Here are the steps I am taking next:

@jbusecke (Collaborator, Author) commented:

This is now possible due to the full API request injection (see #133 (comment)).
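One way the stored API response could feed such a check is by reading the dataset's time bounds from the ESGF Search API. This is a hedged sketch only: the `datetime_start`/`datetime_stop` fields and the `instance_id` facet come from the ESGF Search API, but the endpoint choice and the helper itself are assumptions, not the pipeline's actual code.

```python
import urllib.parse

# Assumed index node; any ESGF search endpoint with the same API would do.
ESGF_SEARCH = "https://esgf-node.llnl.gov/esg-search/search"

def esgf_query_url(iid: str) -> str:
    """Build an ESGF Search API request for one instance id (hypothetical helper).

    The response's datetime_start/datetime_stop fields could supply the
    expected time range for a QC comparison against the ingested store.
    """
    params = {
        "type": "Dataset",
        "format": "application/solr+json",
        "instance_id": iid,
        "fields": "instance_id,datetime_start,datetime_stop",
    }
    return ESGF_SEARCH + "?" + urllib.parse.urlencode(params)

print(esgf_query_url(
    "CMIP6.HighResMIP.NERC.HadGEM3-GC31-HH.hist-1950.r1i1p1f1.Omon.so.gn.v20200514"
))
```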
