- About
- What are ontologies and how do they improve data quality?
- The MPox Contextual Data Specification Package
- Contacts
- License
- Acknowledgements
Labs collect, encode and store information in different ways. They use different fields, terms and formats, categorize variables differently, and attach different meanings to the same word depending on the focus of the organization. Think of the word “plant”: to someone in agriculture, “plant” could mean an organism that carries out photosynthesis, while a food regulator might understand it to mean a factory where food products are made. This variability makes comparing, integrating and analyzing data generated by different organizations difficult, like trying to compare apples, oranges and bananas.
Ontologies are collections of controlled vocabulary terms arranged in a hierarchy, in which all terms are linked by logical relationships. Ontologies are open source and meant to represent “universal truth” as much as possible (that is, not tied to one organization’s vocabulary or use case). Ontologies encode synonyms, which enables mapping between the specific language used by different organizations, and every term in an ontology is assigned a globally unique and persistent identifier. Using ontology terms to standardize MPox contextual data not only makes data more interoperable by providing a common language, it also helps make contextual data FAIR (Findable, Accessible, Interoperable, Reusable).
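To illustrate how synonyms and persistent identifiers let different vocabularies meet, here is a minimal Python sketch. It assumes a simplified term model (ID, preferred label, synonyms, parent term); the `HYPO:` identifiers, labels and helper names are hypothetical and are not taken from any real ontology or from this specification.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class OntologyTerm:
    """A controlled-vocabulary term: persistent ID, preferred label, synonyms, parent."""
    term_id: str                                      # globally unique, persistent identifier
    label: str                                        # preferred label
    synonyms: set[str] = field(default_factory=set)   # alternative wordings used by different groups
    parent_id: str | None = None                      # broader term in the hierarchy (None = root)


def build_synonym_index(terms: list[OntologyTerm]) -> dict[str, str]:
    """Map every label and synonym (case-insensitively) to its term ID."""
    index: dict[str, str] = {}
    for term in terms:
        for name in {term.label, *term.synonyms}:
            index[name.casefold()] = term.term_id
    return index


# HYPOTHETICAL terms and IDs for illustration only; these are not real ontology entries.
terms = [
    OntologyTerm("HYPO:0000001", "plant organism", {"green plant"}),
    OntologyTerm("HYPO:0000002", "food processing plant", {"food factory", "processing facility"},
                 parent_id="HYPO:0000003"),
    OntologyTerm("HYPO:0000003", "facility"),
]
index = build_synonym_index(terms)

# Two labs using different local wording resolve to the same identifier.
print(index["food factory"])         # HYPO:0000002
print(index["processing facility"])  # HYPO:0000002
```

Because records store the identifier rather than the local wording, data from both labs can be compared directly, and the `parent_id` link lets a query for the broader term also find its narrower ones.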
This specification is implemented via the DataHarmonizer, a CIDGOH tool for applying the specification. Accompanying Field and Term reference guides (which provide definitions and additional specific guidance) and a curation Standard Operating Procedure (SOP) are also available, and can be used to support other tools tailored to implement the specification. New terms and/or term changes can be requested using issue request forms, with additional guidance outlined in the New Term Request (NTR) SOP. These resources are available in the files of this repository and are listed below under Package Contents.
The MPox contextual data specification currently supports two primary use cases, which are built on the same schema:
- Canadian MPox contextual data: Contextual data for submitting labs in Canada. This specification has picklists specific to Canadian territories and organisations.
- International MPox contextual data: Contextual data for international users. This specification has simplified picklists and allows for more flexibility to support international use.
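The idea of two use cases sharing one schema but differing in their picklists can be sketched in Python as follows. This is only an illustration under stated assumptions: the field name `sample_region`, its values and the `validate` helper are hypothetical and do not reproduce the specification's actual fields, picklists or the DataHarmonizer's validation logic.

```python
from __future__ import annotations

# HYPOTHETICAL field name and values for illustration; see the Field and Term
# reference guides for the specification's actual fields and picklists.
PICKLISTS: dict[str, dict[str, set[str] | None]] = {
    # Canadian use case: a restricted picklist (a subset of provinces/territories shown).
    "canadian": {"sample_region": {"Alberta", "British Columbia", "Ontario", "Quebec", "Yukon"}},
    # International use case: same field, but any free-text value is accepted.
    "international": {"sample_region": None},
}


def validate(record: dict[str, str], use_case: str) -> list[str]:
    """Return error messages for missing values or values outside the use case's picklists."""
    errors = []
    for field_name, allowed in PICKLISTS[use_case].items():
        value = record.get(field_name)
        if value is None:
            errors.append(f"{field_name}: missing value")
        elif allowed is not None and value not in allowed:
            errors.append(f"{field_name}: '{value}' is not in the {use_case} picklist")
    return errors


record = {"sample_region": "Bavaria"}
print(validate(record, "canadian"))       # ["sample_region: 'Bavaria' is not in the canadian picklist"]
print(validate(record, "international"))  # []
```

Keeping one set of fields and swapping only the allowed values keeps data from both use cases structurally compatible.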
Please note that development of the specification is dynamic and it will be updated periodically to address user needs. Versioning follows the format x.y.z, where:
- x = Field-level changes
- y = Term value / ID level changes
- z = Definition, guidance, example, formatting, or other uncategorized changes
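As a small Python sketch of reading this scheme, the function below reports which kind of change separates two version strings by finding the most significant component that differs. The function names and the version numbers used are hypothetical examples, not actual releases of the specification.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split an 'x.y.z' string into three integers (raises ValueError otherwise)."""
    x, y, z = (int(part) for part in version.split("."))
    return x, y, z


# Meaning of each component, in order x, y, z.
CHANGE_LEVELS = (
    "field-level change",
    "term value / ID level change",
    "definition, guidance, example, formatting, or other change",
)


def change_level(old: str, new: str) -> str:
    """Name the most significant component that differs between two versions."""
    for level, (a, b) in enumerate(zip(parse_version(old), parse_version(new))):
        if a != b:
            return CHANGE_LEVELS[level]
    return "no change"


# Hypothetical version numbers for illustration.
print(change_level("5.1.2", "5.2.0"))  # term value / ID level change
```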
Descriptions of changes are provided in [release notes](https://github.com/cidgoh//releases) for every new version.
- Pathogen Genomics Package (MPox)
- Template schema files can be found as .yaml/.json/.tsv under pathogen-genomics-package/templates/MPox
- DataHarmonizer App
- The DataHarmonizer is a standardized browser-based spreadsheet editor and validator.
- Instructions for "Getting Started" (downloading and using the application) can be found under DataHarmonizer Instructions and SOP below.
- Further information about application functionality can be found on the DataHarmonizer Wiki.
- XLSX version. Please note that this format contains all fields and terms for both the Canadian and the International reference guides, so you will need to filter appropriately.
- PDF version
For more information and/or assistance, contact Emma Griffiths at [email protected] or submit a repository issue request.
Pending / To Be Determined
Brought to you by the Centre for Infectious Disease Genomics and One Health (CIDGOH)