remove "collection" from individual CVs #57

Open
taylor13 opened this issue Feb 15, 2024 · 0 comments

Comments

@taylor13
Collaborator

I think we can avoid lots of headaches if we adopt the following terms: a dataset collection (DScollection) is a collection of datasets that all rely on a common collection of CVs (CVcollection). I introduced the concept of dataset collections a few years ago, and I have become convinced it is essential in thinking about the various WCRP-related datasets.

And why not define a CVcollection in a JSON file (rather than including the information in every CV included in a collection)? That is, remove the collection information from each of the CVs and instead list all the CVs that belong to each CV collection in a separate JSON file. A contributing CV could be included in multiple CV collections. Each time any of the contributing CVs was modified, a new CV collection would be defined, identical to the previous collection except for the presumably few contributing CVs that have changed.

Schematic of CVcollection file:

"CVcollection": {
    "CMIP6plus_CVcollection": {
        "collection_version": "6.5.1.0",
        "CVcollection_modified": "2023-11-20T16:32:10Z",
        "CVcollection_release": "??",
        "contents": {
            "CMIP6plus_DRS": "v6.5.0.8",   (could name "MIPs_DRS" and indicate 6plus by the version: 6.5.?.?)
            "MIPs_product": "v1.1.1.1",    ("MIPs" prefix indicates that this uses a CV for product that is likely useful across MIP phases)
            .
            .
    "CMIP6plus_CVcollection": {
        "collection_version": "6.5.1.1",
        "CVcollection_modified": "2024-02-20T10:32:00Z",
        "CVcollection_release": "??",
        "contents": {
            "CMIP6plus_DRS": "v6.5.0.8",   (version indicates CMIP6plus)
            "MIPs_product": "v1.1.1.2",    (version indicates product is same as version 1.1.1.1, but with additional options for product)
            .
            .
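
The update rule described above (a new CV collection is the previous one except for the CVs that changed) could be sketched roughly as follows. This is only an illustration: the function name `derive_collection` and its parameters are hypothetical, and the dictionary simply mirrors the keys used in the schematic.

```python
from copy import deepcopy


def derive_collection(previous, new_version, modified, updated_cvs):
    """Return a new collection entry equal to `previous` except for updated CVs.

    `previous` is left untouched so both versions remain available.
    """
    new = deepcopy(previous)
    new["collection_version"] = new_version
    new["CVcollection_modified"] = modified
    # Only the CVs that changed get a new version; the rest carry over.
    new["contents"].update(updated_cvs)
    return new


# Values copied from the schematic; "??" is left unspecified there.
previous = {
    "collection_version": "6.5.1.0",
    "CVcollection_modified": "2023-11-20T16:32:10Z",
    "CVcollection_release": "??",
    "contents": {"CMIP6plus_DRS": "v6.5.0.8", "MIPs_product": "v1.1.1.1"},
}

current = derive_collection(
    previous,
    new_version="6.5.1.1",
    modified="2024-02-20T10:32:00Z",
    updated_cvs={"MIPs_product": "v1.1.1.2"},
)
```

Copying the previous entry rather than mutating it keeps every published collection version reproducible.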

This way of doing things makes it clear what has changed from one CV collection version to the next. Also, different CV collections can draw on a common set of CVs (e.g., obs4MIPs and input4MIPs might rely on the same "frequency" CV as CMIP).
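
To illustrate how the changes between two collection versions could be read off directly, here is a minimal sketch that compares the "contents" of the two schematic entries. The helper name `diff_contents` is hypothetical, not part of any existing tool.

```python
def diff_contents(old, new):
    """Map each CV whose version differs to an (old_version, new_version) pair.

    None marks a CV that is absent from one of the two collections.
    """
    names = sorted(set(old) | set(new))
    return {n: (old.get(n), new.get(n)) for n in names if old.get(n) != new.get(n)}


# "contents" of collection versions 6.5.1.0 and 6.5.1.1 from the schematic.
contents_6510 = {"CMIP6plus_DRS": "v6.5.0.8", "MIPs_product": "v1.1.1.1"}
contents_6511 = {"CMIP6plus_DRS": "v6.5.0.8", "MIPs_product": "v1.1.1.2"}

changes = diff_contents(contents_6510, contents_6511)
print(changes)  # {'MIPs_product': ('v1.1.1.1', 'v1.1.1.2')}
```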

This also makes it easy for us to clearly indicate which CVs apply to each proposed "dataset collection" (DScollection). Each dataset in a particular DScollection would have to conform to the specifications found in a single CVcollection. We could allow the least significant digits of the CVcollection version to differ, i.e., datasets conforming with CVcollection versions 6.0.2.8 and 6.0.2.9 could be included in a single DScollection. For example, adding a new source_id to a CV wouldn't disrupt datasets already published, because the new CVcollection would be backward compatible with the old one and would only include additional options for source_id.
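
A check along these lines could decide whether two datasets may share a DScollection. This sketch assumes that "least significant digits" means only the last dot-separated component of the version; that interpretation, and the function names, are assumptions rather than anything settled here.

```python
def parse_version(version):
    """Split a version such as '6.0.2.8' or 'v6.0.2.8' into a tuple of ints."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def same_dscollection(version_a, version_b):
    """True when the two CVcollection versions differ at most in the last component.

    Assumption: only the final component is allowed to vary.
    """
    a, b = parse_version(version_a), parse_version(version_b)
    return len(a) == len(b) and a[:-1] == b[:-1]


print(same_dscollection("6.0.2.8", "6.0.2.9"))  # True
print(same_dscollection("6.0.2.8", "6.0.3.0"))  # False
```

If more than one trailing component should be allowed to vary, the slice `[:-1]` would need to change accordingly.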

The individual CVs included in a CVcollection would not record what collections they belong to. So the "collection" portion of the "header" currently found in each CV would be omitted. Each individual CV then would be independently versioned.

Of course, we could copy from the master CV repository all the CVs comprising a CVcollection and bundle them together to make the collection easy to obtain (by CMOR or ESGF, for example).
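
Bundling might look something like the sketch below. The repository layout (`<repo>/<CV name>/<version>.json`), the output file naming, and the function name `bundle_collection` are all hypothetical; the demo builds a throwaway repository with empty placeholder files just to show the copy step.

```python
import shutil
import tempfile
from pathlib import Path


def bundle_collection(repo_root, contents, dest):
    """Copy every CV file named in `contents` into `dest`; return the copied paths."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for name, version in sorted(contents.items()):
        # Hypothetical layout: one directory per CV, one file per version.
        source = Path(repo_root) / name / f"{version}.json"
        target = dest / f"{name}_{version}.json"
        shutil.copyfile(source, target)
        copied.append(target)
    return copied


# "contents" of collection version 6.5.1.1 from the schematic.
contents = {"CMIP6plus_DRS": "v6.5.0.8", "MIPs_product": "v1.1.1.2"}

with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "repo"
    for name, version in contents.items():
        (repo / name).mkdir(parents=True)
        (repo / name / f"{version}.json").write_text("{}")  # placeholder CV file
    bundled = [p.name for p in bundle_collection(repo, contents, Path(tmp) / "bundle")]

print(bundled)  # ['CMIP6plus_DRS_v6.5.0.8.json', 'MIPs_product_v1.1.1.2.json']
```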
