17 rd cdm paper revision (#23)
* updates

* updates

* updates

* updates

* update
aslgraefe authored Oct 7, 2024
1 parent e958089 commit 55deddf
Showing 14 changed files with 62 additions and 38 deletions.
38 changes: 24 additions & 14 deletions README.md
@@ -1,7 +1,7 @@
# Ontology-Based Rare Disease Common Data Model

An ontology-based Rare Disease Common Data Model (RD CDM) to enable
international registry use, HL7 FHIR, and GA4GH Phenopackets.
An ontology-based Rare Disease Common Data Model harmonising international
registry use, FHIR, and the Phenopacket Schema

<!-- Python CI and Documentation Status Badges -->
[![Python CI](https://github.com/BIH-CEI/rd-cdm/actions/workflows/python_ci.yml/badge.svg)](https://github.com/BIH-CEI/rd-cdm/actions/workflows/python_ci.yml)
@@ -14,6 +14,8 @@ international registry use, HL7 FHIR, and GA4GH Phenopackets.
![CSV Created](https://img.shields.io/badge/CSV%20Created%20Successfully-6A5ACD)
![Validation Successful](https://img.shields.io/badge/Validation%20Successful-brightgreen)

![Latest Documentation](https://rd-cdm.readthedocs.io/en/latest/)

> **Attention:**
> The RD CDM paper is currently under review. As soon as it is published, we
> will update the version to 2.0.0 and provide a link to the paper here.
@@ -37,7 +39,7 @@ The Rare Disease Common Data Model (RD CDM) is designed to harmonize rare
disease data capture across international registries. It integrates standards
such as the ERDRI-CDS, HL7 FHIR, and GA4GH Phenopacket Schema, creating a
scalable, ontology-driven framework that supports advanced interoperability for
research and care. The RD CDM Version 2.0 consists of 66 data elements,
research and care. The RD CDM Version 2.0.0 consists of 78 data elements,
extending the ERDRI-CDS and allowing deeper insights into genetic findings,
phenotypic features, and family history of individuals.

@@ -53,10 +55,10 @@ phenotypic features, and family history of individuals.
- Cross-registry Compatibility: Enables data reuse across multiple registries
with consistent encoding and semantic alignment.


## Getting Started

This section provides instructions for getting started with the RD CDM.
This section provides instructions for getting started with the RD CDM. For more
detail, please read our [Documentation](https://rd-cdm.readthedocs.io/en/latest/).

### Prerequisites

@@ -90,15 +92,21 @@ consider reaching out to discuss collaboration opportunities.
## Resources

### Ontologies
- Human Phenotype Ontology (HP, Version 2024-08-13) [🔗](http://www.human-phenotype-ontology.org)
- Monarch Initiative Disease Ontology (MONDO, Version Version 2024-09-03) [🔗](https://mondo.monarchinitiative.org/)
- Online Mendelian Inheritance in Man (OMIM, Version 2024-09-12) [🔗](https://www.omim.org/)
- Orphanet Rare Disease Ontology (OPRHA, Version 2024-09-12) [🔗](https://www.orpha.net/)
- National Center for Biotechnology Information Taxonomy (NCBITaxon, Version 2024-07-03) [🔗](https://www.ncbi.nlm.nih.gov/taxonomy)
- Logical Observation Identifiers Names and Codes (LOINC, Version 2.78) [🔗](https://loinc.org/)
- HUGO Gene Nomenclature Committee (HGNC, Version 2024-08-23) [🔗](https://www.genenames.org/)
- Gene Ontology (GENO, Version 2023-10-08) [🔗](https://geneontology.org/)
- NCI Thesaurus OBO Edition (NCIT, Version Version 24.04e ) [🔗](https://obofoundry.org/ontology/ncit.html)
- Human Phenotype Ontology [🔗](http://www.human-phenotype-ontology.org)
- Monarch Initiative Disease Ontology [🔗](https://mondo.monarchinitiative.org/)
- Online Mendelian Inheritance in Man [🔗](https://www.omim.org/)
- Orphanet Rare Disease Ontology [🔗](https://www.orpha.net/)
- SNOMED CT [🔗](https://www.snomed.org/snomed-ct)
- ICD 11 [🔗](https://icd.who.int/en)
- ICD10CM [🔗](https://www.cdc.gov/nchs/icd/icd10cm.htm)
- National Center for Biotechnology Information Taxonomy [🔗](https://www.ncbi.nlm.nih.gov/taxonomy)
- Logical Observation Identifiers Names and Codes [🔗](https://loinc.org/)
- HUGO Gene Nomenclature Committee [🔗](https://www.genenames.org/)
- Gene Ontology [🔗](https://geneontology.org/)
- NCI Thesaurus OBO Edition [🔗](https://obofoundry.org/ontology/ncit.html)

For the versions used in a specific RD CDM version, please see the [resources
in our documentation](https://rd-cdm.readthedocs.io/en/latest/resources/resources_file.html).

### Submodules
- [RareLink](https://github.com/BIH-CEI/RareLink)
@@ -117,6 +125,8 @@ development of this RD CDM model.
- Authors:
- [Adam SL Graefe](https://github.com/aslgraefe)
- [Filip Rehburg](https://github.com/frehburg)
- Miriam Hübner
- Steffen Sander
- Prof. Peter N. Robinson
- Prof. Sylvia Thun
- Prof. Oya Beyan
Binary file added docs/_static/v2_0_0_dev0/RD CDM v2.0.0.xlsx
Binary file not shown.
Binary file not shown.
Binary file removed docs/_static/v2_0_0_dev0/rd_cdm_v2_0_0_dev0.xlsx
Binary file not shown.
22 changes: 12 additions & 10 deletions docs/background/background_file.rst
@@ -1,6 +1,6 @@
.. _background_file:

RD CDM Background
Background
=================

.. attention::
@@ -30,8 +30,8 @@ and reusable medical records cannot be overstated. Interoperable data formats
allow for more efficient research, better care coordination, and a clearer
understanding of complex clinical cases. However, existing medical systems often
fail to support the depth of phenotypic and genotypic data required for rare
disease research and treatment, making interoperability a crucial enabler for
improving outcomes in RD care.
disease research and treatment, making interoperability key for improving
outcomes in RD care.

To address these needs, we introduce our RD CDM v2.0.0— a common data model
specifically designed for rare diseases. This RD CDM simplifies the capture,
@@ -57,7 +57,7 @@ Steps in the development of the ontology-based Rare Disease Common Data Model
steps were performed concurrently and overlapped across multiple sites, this
methodology should be considered a non-hierarchical approach. First, we included
and assessed previous RD data models, followed by mapping elements to FHIR
basic resources v4.0.175 and Phenopacket Schema v2.0 elements17. A clinical
basic resources v4.0.1 and Phenopacket Schema v2.0 elements. A clinical
evaluation was performed to assess the relevance of these elements while
balancing the data model’s scope and spectrum of data granularity. We then
performed ontology-based encoding to establish a common denominator between the
@@ -102,6 +102,9 @@ when reading the tables for each section of our RD CDM.
The table can be found in Figshare at the following link:
`RD CDM v2.0.0 Excel Table <https://figshare.com/articles/dataset/_b_Common_Data_Model_for_Rare_Diseases_b_based_on_the_ERDRI-CDS_HL7_FHIR_and_the_GA4GH_Phenopackets_Schema_v2_0_/26509150>`_.

or can be downloaded here:
:download:`RD CDM v2.0.0 Excel Table <../_static/v2_0_0_dev0/RD CDM v2.0.0.xlsx>`.


RD CDM Layers of harmonisation
------------------------------
@@ -119,13 +122,12 @@ Type Layer, (5) the Value Set Layer, and (6) the Value Set Choice Layer. All
layers and their selection criteria are depicted in the figure below.

While over 95% of all data elements are directly aligned with HL7 FHIR or GA4GH
Phenopackets, only one-third of terminology bindings and 80% of value types
match the specifications outlined by these standards. Our ontology-based
approach results in less than 41% of value sets being directly derived from HL7
FHIR and GA4GH Phenopacket Schema, with slightly more than 45% of value set
choices were encoded accordingly.
Phenopackets, only one-third of terminology bindings and 85% of value types
match the specifications outlined by these standards. More than 87% of value
sets are directly aligned with the specifications defined by HL7 FHIR and the
GA4GH Phenopacket Schema.

.. note::
.. attention::
The RD CDM paper is currently under review. As soon as it is published, we
will provide a link to the paper here and all tables and figures will be
available in the paper.
8 changes: 7 additions & 1 deletion docs/changelog.rst
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
.. _changelog:

RD CDM Changelog
Changelog
================

.. attention::
@@ -23,3 +23,9 @@ Version 2.0.0.dev0 (2024-09-30)

- Initial release of the RD CDM.


Version 2.0.0 (tba)
-------------------

to be announced as soon as the paper is published.

10 changes: 8 additions & 2 deletions docs/contributing.rst
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
.. _contributing:

RD CDM Contributing
Contributing
===================

As we are actively developing the RD CDM, we welcome contributions in the form
@@ -13,6 +13,12 @@ the GA4GH Phenopacket Schema, HL7 FHIR, and the International Patient Summary.
We encourage contributions to the RD CDM. These contributions can be in the
form of new resources, new concepts, relationships, or implementations.

.. attention::
The RD CDM paper is currently under review. As soon as it is published, we
will update the version to 2.0.0 and provide a link to the paper here.
The version 2.0.0.dev0 is the initial release of the RD CDM under review.


If you would like to contribute, please consider the following:

1. GitHub Issues
@@ -44,7 +50,7 @@ and any feedback you may have.
----------------

If you would like to contribute to the documentation, please feel free to create
an issue in our `GitHub repository <https://github.com/BIH-CEI/rd-cdm/issues>_`
an issue in our `GitHub repository <https://github.com/BIH-CEI/rd-cdm/issues>`_
or reach out to us directly. We are always looking for ways to improve our
documentation and welcome any suggestions.

2 changes: 1 addition & 1 deletion docs/license.rst
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
.. _license_file:

RD CDM License
License
==============

The RD CDM is licensed under the MIT License. The full text of the license can
10 changes: 8 additions & 2 deletions docs/resources/resources_file.rst
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
.. _resources_file:

RD CDM Resources
Resources
=================

.. attention::
@@ -19,7 +19,7 @@ The table provides an overview of the table columns used to depict our Rare
Disease Common Data Model (RD CDM). You can download the RD CDM v2.0.0.dev0
in an Excel here:

- :download:`RD CDM v2.0.0 Excel Table <../_static/v2_0_0_dev0/rd_cdm_v2_0_0_dev0.xlsx>`
- :download:`RD CDM v2.0.0 Excel Table <../_static/v2_0_0_dev0/RD CDM v2.0.0.xlsx>`

or access it on: `Figshare <https://figshare.com/articles/dataset/_b_Common_Data_Model_for_Rare_Diseases_b_based_on_the_ERDRI-CDS_HL7_FHIR_and_the_GA4GH_Phenopackets_Schema_v2_0_/26509150>`_.

@@ -45,6 +45,12 @@ CSV Files Download
For additional details, see :ref:`background_file`.


RD CDM v2.0.0
-------------

to be updated as soon as the paper is published.





2 changes: 1 addition & 1 deletion res/v2_0_0_dev0/rd_cdm_v2_0_0_dev0.json
@@ -5,7 +5,7 @@
"description": "The ontology-based Rare Disease Common Data Model (RD CDM) to enable international registry use, HL7 FHIR, and GA4GH Phenopackets.",
"metadata": {
"author": "Author Name",
"creationDate": "2024-10-03"
"creationDate": "2024-10-07"
},
"codeSystems": [
{
2 changes: 0 additions & 2 deletions src/parsing/create_data_elements_json.py
@@ -7,8 +7,6 @@
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '../..')))

from data_model.utils import json_serializer
from src.data_model.data_elements import DataElement, DataElementModel
from src.data_model.base_types import CodeSystem, Coding

def load_data_element_definitions(version):
"""Dynamically load the data elements for a given version."""
1 change: 0 additions & 1 deletion src/parsing/create_value_sets_json.py
@@ -3,7 +3,6 @@
import os
import importlib
from data_model.utils import json_serializer
from src.data_model.value_set import ValueSet, ValueSetChoice
from src.data_model.base_types import Coding, CodeSystem

# Add the src directory to the system path
4 changes: 1 addition & 3 deletions src/v2_0_0_dev0/rd_cdm_v2_0_0_dev0_codesystems_versions.py
@@ -2,10 +2,8 @@ class CODESYSTEMS_VERSIONS_V2_0_0_dev0:
"""Code system versions for v2_0_0_dev0."""
versions = {
"NCBITaxon": "2024-07-03",
"GENO": "2023-10-08",
"SO": "2.6",
"ICD10CM": "2024-09-01",
"SNOMED": "2024-09-01",
"SNOMED": "2024-10-01",
"ICD11": "2024-09-01",
"HL7FHIR": "v4.0.1",
"GA4GH": "v2.0",
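
As a rough illustration of how a version table like the one in this hunk might be consumed, the sketch below looks up a pinned version by code system name. The names `CODE_SYSTEM_VERSIONS` and `get_code_system_version` are hypothetical and not part of the repository. The entries are copied from the hunk, and because this view carries no +/- markers, only the updated SNOMED value appears certain to reflect the post-commit state.

```python
# Hypothetical sketch: a subset of the entries shown in the hunk above.
# Which entries remain after the commit is not fully visible in this view.
CODE_SYSTEM_VERSIONS = {
    "NCBITaxon": "2024-07-03",
    "SNOMED": "2024-10-01",
    "ICD11": "2024-09-01",
    "HL7FHIR": "v4.0.1",
    "GA4GH": "v2.0",
}


def get_code_system_version(name, versions=CODE_SYSTEM_VERSIONS):
    """Return the pinned version for a code system, or raise a clear error."""
    try:
        return versions[name]
    except KeyError:
        known = ", ".join(sorted(versions))
        raise KeyError(
            f"Unknown code system {name!r}; known systems: {known}"
        ) from None


if __name__ == "__main__":
    print(get_code_system_version("SNOMED"))
```

Raising an explicit error for unknown names, rather than returning `None`, is one possible design choice; it surfaces a missing version pin early instead of letting it propagate into generated files.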
1 change: 0 additions & 1 deletion tests/v2/__init__.py
@@ -1,6 +1,5 @@
import data_model
import json
import os

def validate_schemas(version):
base_path = f'../res/v{version}/' # Adjusted for subfolder
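
The body of `validate_schemas` is not shown in this hunk. The following is a minimal sketch of one way such a check could work, assuming each JSON export under the base path shares the top-level keys visible in the `rd_cdm_v2_0_0_dev0.json` hunk of this commit (`description`, `metadata`, `codeSystems`). The function names `find_missing_keys` and `validate_json_files` are hypothetical and do not come from the repository.

```python
import json
from pathlib import Path

# Top-level keys visible in the rd_cdm_v2_0_0_dev0.json hunk of this commit.
# Treating them as required for every export is an assumption.
REQUIRED_KEYS = ("description", "metadata", "codeSystems")


def find_missing_keys(document, required=REQUIRED_KEYS):
    """Return the required top-level keys that are absent from a parsed JSON object."""
    return [key for key in required if key not in document]


def validate_json_files(base_path):
    """Map each *.json file name under base_path to its list of missing keys."""
    results = {}
    for path in sorted(Path(base_path).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            results[path.name] = find_missing_keys(json.load(handle))
    return results


if __name__ == "__main__":
    # Same relative layout as the base_path in the hunk above.
    print(validate_json_files("../res/v2_0_0_dev0/"))
```

An empty list for a file means all assumed keys are present; the real test may of course check a stricter schema.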