Embed experiment identifiers in model outputs #510
Comments
Would the commit hash for the Lines 653 to 654 in 68d8482
If so, the model method |
Yes.
Good point. And yes. I kinda thought I'd get it wrong and need you to say where we should put it.
Yes. It doesn't really make sense otherwise.
I like it.
I think I might have changed my mind about concatenating the IDs together. The motivation was to make it simpler, just embed a single metadata item. But it makes everything else more complicated. Also, the experiment ID will be used widely, in intake catalogues etc., so I think it makes sense to have that as a separate, unambiguous, easy-to-access metadata attribute.
Ok, so are you saying there should be two fields added to outputs? An |
Yep.
Embedding the experiment ID and run commit hashes in model output diagnostics is essential for experiment provenance: it links the outputs of an experiment to all of that experiment's provenance data. Consumers of the data, regardless of where they find it, can then trace it back to this essential information.
These identifying hashes can become persistent identifiers (PIDs) once there is a service to resolve them and expose the related metadata to users. Such a service doesn't exist ... yet. But embedding this information is a necessary precursor.
Proposal
- `git` commit hash as a unique identifier (`exptrunID`?) for each run of a model, where an experiment constitutes a number of such consecutive runs.
- `exptrunID` as a metadata field in all model output diagnostics, e.g. global netCDF attribute.
- `exptrunID` as a configuration input to the model so the metadata is added when the diagnostic is written. If this isn't possible, add metadata after the run has completed.
Implementation
Where possible the `exptrunID` should be added as a model configuration input option and written directly into the model outputs. This has two benefits:
This may require code changes in the models themselves. This doesn't have to happen immediately: in the first instance post-processing could be utilised until the code supports direct metadata injection. This would be tricky to manage, as it would be model version dependent.
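As a rough illustration of the post-processing fallback, the sketch below builds the attribute to embed and stamps it onto an existing netCDF file. It is not taken from any model or from payu: the helper names `run_metadata` and `stamp_netcdf` are hypothetical, the attribute name `exptrunID` is the tentative one from the proposal, and it assumes a full 40-character SHA-1 commit hash and the third-party `netCDF4` package.

```python
import re

# Assumption: a full 40-character SHA-1 git commit hash identifies the run.
_FULL_SHA1 = re.compile(r"[0-9a-f]{40}")


def run_metadata(commit_hash):
    """Return the global attributes to embed for one run (hypothetical helper)."""
    normalised = commit_hash.strip().lower()
    if not _FULL_SHA1.fullmatch(normalised):
        raise ValueError(f"not a full git commit hash: {commit_hash!r}")
    # "exptrunID" is the tentative attribute name from the proposal.
    return {"exptrunID": normalised}


def stamp_netcdf(path, attrs):
    """Post-processing fallback: add global attributes to an existing file."""
    # netCDF4 is third-party; importing it here keeps run_metadata dependency-free.
    from netCDF4 import Dataset

    with Dataset(path, "a") as ds:  # "a" opens the existing file for appending
        ds.setncatts(attrs)  # sets every key/value pair as a global attribute
```

Something like `stamp_netcdf("ocean.nc", run_metadata(commit_hash))` would then be run once per output file after the run has completed; the file name here is only a placeholder.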
Each model should take care of adding this metadata to the model diagnostic outputs. This means the `model` class should have a stub method `add_output_metadata` that is either not implemented, or has some useful default like adding a global attribute to netCDF files.
`add_output_metadata` should be called at the `setup` and `archive` stages so that `exptrunID` can be added either before a run, or after it has completed. The method needs to have logic to decide if it runs at `setup` or `archive`. If there isn't a better way, like some call-graph inspection, then the stage should be passed to the method.
Notes
- `mpp_write_meta` routine for MOM5
- For MOM6, global attributes can be written by calling `register_global_attribute`. Scalar and 1D real and integer (32- and 64-bit) values and scalar string values are supported. This interface can be used with any FMS2_io fileobj, but `open_file` needs to be called before using it.
- netCDF: https://github.com/COSIMA/cice5/blob/edcfa6f9c76ed05b63196ce4b5355fa5a8f4fe3a/io_netcdf/ice_history_write.F90#L922-L978
- pio: https://github.com/COSIMA/cice5/blob/edcfa6f9c76ed05b63196ce4b5355fa5a8f4fe3a/io_pio/ice_history_write.F90#L877-L934
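The `add_output_metadata` stub described under Implementation could look something like the sketch below, which takes the simpler option of passing the stage to the method instead of inspecting the call graph. Everything apart from the method name and the two stage names is an assumption made for illustration: the `Stage` enum, the `metadata_stage` attribute, the `_write_metadata` hook and both subclasses are hypothetical, and plain dictionaries stand in for a model configuration and for netCDF global attributes.

```python
from enum import Enum


class Stage(Enum):
    SETUP = "setup"
    ARCHIVE = "archive"


class Model:
    """Base class: by default no metadata is embedded."""

    # Stage at which a model embeds metadata; None means "not implemented".
    metadata_stage = None

    def add_output_metadata(self, stage, metadata):
        """Called at both stages; acts only at the stage this model uses."""
        if self.metadata_stage is None or stage is not self.metadata_stage:
            return False
        self._write_metadata(metadata)
        return True

    def _write_metadata(self, metadata):
        raise NotImplementedError


class ConfigInjectingModel(Model):
    """Hypothetical model that accepts the ID as a configuration input."""

    metadata_stage = Stage.SETUP

    def __init__(self):
        self.config = {}  # stand-in for the model's configuration file

    def _write_metadata(self, metadata):
        self.config.update(metadata)


class PostProcessedModel(Model):
    """Hypothetical model whose outputs are stamped after the run."""

    metadata_stage = Stage.ARCHIVE

    def __init__(self, output_files):
        # Stand-in for the global attributes of each output file.
        self.output_attrs = {name: {} for name in output_files}

    def _write_metadata(self, metadata):
        for attrs in self.output_attrs.values():
            attrs.update(metadata)
```

With this shape the driver can call `add_output_metadata` unconditionally at both `setup` and `archive`, and each model decides for itself whether that call does anything.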