Deprecating & Normalizing Specification Values & IDs
There are many reasons why a term may be rendered “obsolete” and consequently deprecated in favour of a different term. It may be that it was rehomed in a more appropriate ontology or that it was deemed redundant with an existing term. During specification development, we usually need to generate new terms quickly - which is relatively easy to do in ontologies we manage, but more difficult when requesting terms in ontologies within the greater OBO Foundry community. Long term, the formal deprecation of terms enables downstream ontologies that import them to pass this information on to database systems so they are in a position to update their own contents and therefore support federated querying without the need to map terms.
For our specification reference guide work we sometimes mint GenEpiO or FoodOn ontology IDs. These IDs are IRIs (Internationalized Resource Identifiers) that, when combined with "http://purl.obolibrary.org/obo/", form a permanent URL (PURL), ensuring future access for users. While many are expected to be permanently housed in GenEpiO or FoodOn, some are temporary: they act as placeholders while we wait for the terms we have requested in other ontologies to be adopted, and will then be made obsolete in favour of those terms.
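As a minimal sketch of how an ID and the PURL base fit together, the Python below converts between a compact ID such as UBERON:0001728 and its PURL. It assumes the usual OBO convention of writing the prefix and local ID joined by an underscore in the URL; the function names are hypothetical and are not part of any GenEpiO or FoodOn tooling.

```python
OBO_PURL_BASE = "http://purl.obolibrary.org/obo/"


def curie_to_purl(curie: str) -> str:
    """Turn a compact ID like 'UBERON:0001728' into its OBO PURL."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"Not a compact ontology ID: {curie!r}")
    # Assumed OBO convention: the colon becomes an underscore in the PURL.
    return f"{OBO_PURL_BASE}{prefix}_{local_id}"


def purl_to_curie(purl: str) -> str:
    """Turn an OBO PURL back into a compact ID."""
    if not purl.startswith(OBO_PURL_BASE):
        raise ValueError(f"Not an OBO PURL: {purl!r}")
    fragment = purl[len(OBO_PURL_BASE):]
    prefix, sep, local_id = fragment.partition("_")
    if not sep or not prefix or not local_id:
        raise ValueError(f"Unexpected PURL fragment: {fragment!r}")
    return f"{prefix}:{local_id}"


if __name__ == "__main__":
    print(curie_to_purl("UBERON:0001728"))
```

Because the PURL is derived mechanically from the ID, a database can store either form and recover the other.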
Many of these terms are outside the domain of GenEpiO or FoodOn, making other domain-specific ontologies a more appropriate home. For example, GenEpiO created "cafeteria" to meet the demands of another epidemiology term, but once an equivalent term was identified in the Environmental Ontology (ENVO), where much more consideration was put into its axioms and placement in the environmental hierarchy, we made the original obsolete in favour of it (Figure 1).
In the case of the field guide, we have identified terms that may be better suited in other ontologies. If the target ontologies agree and would like to house the terms we require, we will make said GenEpiO or FoodOn terms obsolete once the other ontology has completed our requests (which may take days to years). In the meantime, we are housing them in GenEpiO / FoodOn to ensure the specification terms are ontologized with IRIs now.
When we make a minted term "obsolete" we don't remove it from the ontology. The IRI/PURL remains, but "obsolete" is appended to the label and a new annotation is added that redirects to the term deemed a more appropriate replacement. Consequently, obsolete terms never leave a user stuck with an outdated, dead-end resource. Database systems that reuse ontologies have enough information, from the “replaced by” annotation on the deprecated term in an updated ontology file, to switch to the replacement terms. Designing an automated term replacement system is a good strategy, although some manual final approval of such updates is usually required, since third-party consumers of database content must be taken into consideration too.
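The replacement step described above can be sketched as follows. This is an illustration under stated assumptions rather than an existing tool: the term records are a hypothetical in-memory stand-in for what would be read from an updated ontology file, the "EX:" IDs are invented, and the "obsolete" and "replaced_by" field names are placeholders for the corresponding ontology annotations. Updates are only proposed, not applied, to leave room for manual approval.

```python
from typing import Dict, Iterable, Optional

# Hypothetical term records; a real system would load these from an ontology file.
TERMS: Dict[str, dict] = {
    "EX:0000001": {"obsolete": True, "replaced_by": "EX:0000002"},
    "EX:0000002": {"obsolete": True, "replaced_by": "EX:0000003"},
    "EX:0000003": {"obsolete": False, "replaced_by": None},
    "EX:0000004": {"obsolete": True, "replaced_by": None},
}


def resolve_current_term(term_id: str, terms: Dict[str, dict]) -> Optional[str]:
    """Follow 'replaced by' links until a non-obsolete term is reached.

    Returns None when an obsolete term has no replacement, which signals
    that a person needs to decide what to do.
    """
    seen = set()
    current = term_id
    while True:
        if current in seen:
            raise ValueError(f"Cycle in replacement chain at {current}")
        seen.add(current)
        record = terms.get(current)
        if record is None:
            raise KeyError(f"Unknown term: {current}")
        if not record["obsolete"]:
            return current
        if record["replaced_by"] is None:
            return None
        current = record["replaced_by"]


def propose_updates(stored_ids: Iterable[str], terms: Dict[str, dict]) -> Dict[str, Optional[str]]:
    """Map each stored ID that needs attention to its proposed replacement."""
    proposals = {}
    for term_id in stored_ids:
        resolved = resolve_current_term(term_id, terms)
        if resolved != term_id:
            proposals[term_id] = resolved
    return proposals


if __name__ == "__main__":
    for old, new in propose_updates(TERMS, TERMS).items():
        print(old, "->", new if new else "needs manual review")
```

Following the whole chain matters when a replacement term is itself later made obsolete; the cycle check guards against malformed annotations.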
In this context, normalization refers to consolidating and mapping term labels from various sources such that there is a single “primary” label used across datasets - increasing interoperability and findability. This occurs during the deprecation processes of ontology terms or when an ontology term is given “alternative label”, “synonym”, or “database cross reference” annotations. Since the specification may not be used in the context of a database system that can do this automatically, we are working on encoding this as a feature in the DataHarmonizer application, as well as adding these alternative annotations to GenEpiO.
For example, if one institution uses “Nasopharynx”, another uses “NP”, and another uses concept code “71836000”, we can normalize all these labels to the specification label (e.g., “Nasopharynx (NP)” or “Nasopharynx (NP) [UBERON:0001728]”). Within the GenEpiO application ontology we annotate these terms with “alternative label” annotations, so that a user searching for “71836000” within GenEpiO will find the appropriate term. In the DataHarmonizer, we intend to encode these annotations in a “deprecation table” so that DataHarmonizer users can input the labels normally used within their institution, validate, and then choose whether they want to “normalize” their labels to the specification.
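The Nasopharynx example can be sketched as a simple lookup. This is not the DataHarmonizer implementation, and the layout of the table is an assumption; it only shows the idea of mapping institution-specific labels and codes to a single specification label, using the values from the example above. Matching is case-insensitive here, which is also an assumption.

```python
from typing import Dict, List, Optional

# Hypothetical normalization table built from the example in the text.
SPEC_TERMS: List[dict] = [
    {
        "label": "Nasopharynx (NP)",
        "id": "UBERON:0001728",
        "alternatives": ["Nasopharynx", "NP", "71836000"],
    },
]


def build_lookup(spec_terms: List[dict]) -> Dict[str, dict]:
    """Index every primary and alternative label by its case-folded form."""
    lookup: Dict[str, dict] = {}
    for term in spec_terms:
        for name in [term["label"], *term["alternatives"]]:
            key = name.strip().casefold()
            if key in lookup and lookup[key] is not term:
                raise ValueError(f"Ambiguous label: {name!r}")
            lookup[key] = term
    return lookup


def normalize(value: str, lookup: Dict[str, dict], with_id: bool = False) -> Optional[str]:
    """Return the specification label for a value, or None if it is unknown."""
    term = lookup.get(value.strip().casefold())
    if term is None:
        return None
    if with_id:
        return f"{term['label']} [{term['id']}]"
    return term["label"]


if __name__ == "__main__":
    lookup = build_lookup(SPEC_TERMS)
    for raw in ["Nasopharynx", "np", "71836000", "Throat"]:
        print(repr(raw), "->", normalize(raw, lookup, with_id=True))
```

Returning None for unrecognized values, rather than guessing, leaves the decision to the user, in keeping with normalization being something users choose to apply after validation.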
To help users identify a term match and update their conceptual data when a specification term has changed, we have included “Deprecated Label” and “Deprecated ID” columns in all reference guides. This is to make it easier for users to “match” old terms with the new version without having to query an ontology. Please note that this is only done when strictly necessary, as we avoid deprecating terms whenever possible. If a term changes in its fundamental meaning (e.g., the core concept being described), a new picklist term will be created.