Ingest OMOP2OBO mappings #7

Open
matentzn opened this issue Feb 24, 2022 · 3 comments

Comments

@matentzn
Contributor

Let's focus on the MONDO/ICD10-related ones for now.

cc @callahantiff

@callahantiff

Sounds great!

Just so it's recorded here, since it may affect how we import them: the mappings I sent this morning are the most "confident" ones, i.e., those that are an exact match to a string in a label, definition, or synonym, or those that were obtained from an existing dbxref in one of the ontologies or a supporting resource. There are other ways to get mappings (e.g., a hierarchical search that traverses parents or children, and a fancy new recursive search that we can also leverage), and we can explore those in the future if you think they would be useful.
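A minimal sketch of the two "confident" strategies just described: a dbxref lookup and an exact string match. The dictionary layouts, the `normalize` rule (lower-casing and collapsing whitespace), and the `EXACT_STRING` label are assumptions for illustration only; this is not OMOP2OBO's code or output format.

```python
from collections import defaultdict


def normalize(text):
    # Assumption: matching ignores case and repeated whitespace.
    return " ".join(text.lower().split())


def confident_mappings(mondo, icd):
    """Return sorted (mondo_id, icd_id, map_type) tuples.

    mondo: {mondo_id: {"strings": [...], "xrefs": [...]}}  (hypothetical layout)
    icd:   {icd_id: [label, synonym, definition, ...]}    (hypothetical layout)
    """
    # Index every ICD string by its normalized form for fast lookups.
    by_string = defaultdict(set)
    for icd_id, strings in icd.items():
        for s in strings:
            by_string[normalize(s)].add(icd_id)

    found = set()
    for mondo_id, record in mondo.items():
        # Strategy 1: an existing dbxref that points at a known ICD code.
        for xref in record.get("xrefs", []):
            if xref in icd:
                found.add((mondo_id, xref, "DBXREF"))
        # Strategy 2: an exact (normalized) string match.
        for s in record.get("strings", []):
            for icd_id in by_string.get(normalize(s), ()):
                found.add((mondo_id, icd_id, "EXACT_STRING"))
    return sorted(found)


mondo = {
    "MONDO_A": {"strings": ["Toy Disease"], "xrefs": ["ICD_1"]},
    "MONDO_B": {"strings": ["other  disorder"], "xrefs": []},
}
icd = {"ICD_1": ["toy disease"], "ICD_2": ["Other disorder"]}
for row in confident_mappings(mondo, icd):
    print(row)
```

With this toy data, `MONDO_A` gets two rows for the same ICD code, one per strategy, which mirrors the intentional duplicate rows described later in the thread.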

I also want to get your feedback on what I have included in the file, since I opted to include a lot of information that makes the files larger and might not actually be helpful.

@callahantiff

callahantiff commented Feb 24, 2022

Last thing. In case it is helpful, here are all of the sources from which the first version includes mappings to Mondo. The number is the count of unique Mondo concepts mapped to each source. There are duplicates here because I am reporting each vocabulary under the name its source originally used (when I process these, they are normalized on the backend).

Summary table (source name and version, followed by the count of unique Mondo concepts)
2021AA - UMLS Metathesaurus 7442
AI/RHEUM, 1993 81
Alcohol and Other Drug Thesaurus, 2000 662
Alternative Billing Concepts, 2009 1
American College of Cardiology/American Heart Association Clinical Data Terminology, 2009D 55
Anatomical Therapeutic Chemical Classification System, ATC_2021 1
Authorized Osteopathic Thesaurus, 2003 4
Beth Israel Vocabulary, 1.0 430
BioCarta online maps of molecular pathways, adapted for NCI use, 2009D 1
Biomedical Research Integrated Domain Group Model, 3.0.3, 2009D 5
CDISC Glossary Terminology, 2009D 1
COSTAR, 1989-1995 588
COSTART, 1995 662
CRISP Thesaurus, 2006 1031
Cancer Data Standards Registry and Repository, 2009D 393
Cancer Research Center of Hawaii Nutrition Terminology, 2009D 5
Cancer Therapy Evaluation Program - Simple Disease Classification, 2009D 150
Canonical Clinical Problem Statement System, 1999 800
Cellosaurus, 2009D 760
Clinical Care Classification, 2_5_2018 10
Clinical Classifications Software Refined for ICD-10-CM, 2021 66
Clinical Classifications Software, 2005 150
Clinical Data Interchange Standards Consortium, 2009D 463
Clinical Terms Version 3 (CTV3) (Read Codes), 1999 1734
Clinical Trial Data Commons, 2009D 5
Clinical Trials Reporting Program, 2009D 675
Common Terminology Criteria for Adverse Events 3.0, 2009D 71
Common Terminology Criteria for Adverse Events 5.0, 2009D 249
Common Terminology Criteria for Adverse Events, 2009D 236
Consumer Health Vocabulary, 2011_02 1905
Content Archive Resource Exchange Lexicon, 2009D 8
Current Procedural Terminology, 2021 1
DXplain, 1994 706
Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5), 2015 49
Diseases Database, 2000 18
DrugBank, 5.0_2016_06_22, 5.0_2021_01_29 74
European Directorate for the Quality of Medicines & Healthcare, 2009D 4
FDB MedKnowledge (formerly NDDF Plus), 2021_02_10 5
Foundational Model of Anatomy Ontology, 4_15 128
Gene Ontology, 2020_05_02 85
Geopolitical Entities, Names, and Codes (GENC) Standard Edition 1, 2009D 89
Global Alignment of Immunization Safety Assessment in pregnancy, 2009D 6
HCPCS Version of Current Dental Terminology (CDT), 2021 1
HL7 Vocabulary Version 2.5, 2003_08_30 66
HL7 Vocabulary Version 3.0, 2020_11 93
HUGO Gene Nomenclature Committee, 2020_05 895
Healthcare Common Procedure Coding System, 2021 1
Human Phenotype Ontology, 2020_10_12 1159
ICD10, 1998 789
ICD10, 2016 7991
ICD10, American English Equivalents, 1998 93
ICPC-2 PLUS 842
ICPC2 - ICD10 Thesaurus, 200412 797
ICPC2 - ICD10 Thesaurus, American English Equivalents, 0412 1
International Classification for Nursing Practice, 2019 29
International Classification of Diseases, 10th Edition, Clinical Modification, 2021 2057
International Classification of Diseases, Ninth Revision, Clinical Modification, 2014 1028
International Classification of Diseases, Ninth Revision, Clinical Modification, Metathesaurus additional entry terms, 2014 752
International Classification of Functioning, Disability and Health for Children and Youth, 2008 6
International Classification of Functioning, Disability and Health, 2008_12_19 6
International Classification of Primary Care 2nd Edition, Electronic, 2E, 200203 105
International Classification of Primary Care 2nd Edition, Electronic, 2E, American English Equivalents, 200203 11
International Classification of Primary Care, 1993 87
International Conference on Harmonization, 2009D 11
International Neonatal Consortium, 2009D 7
International Statistical Classification of Diseases and Related Health Problems, 10th Revision, Australian Modification, January 2000 Release 866
International Statistical Classification of Diseases and Related Health Problems, Australian Modification, Americanized English Equivalents, 2000 148
Jackson Laboratories Mouse Terminology, adapted for NCI use, 2009D 3
KEGG Pathway Database, 2009D 33
LOINC, 269 523
Library of Congress Subject Headings, 1990 508
Library of Congress Subject Headings, Northwestern University subset, 2013 692
MEDCIN, 3_2020_12_15 1698
Medical Dictionary for Regulatory Activities Terminology (MedDRA), 23.1 1925
Medical Entities Dictionary, 2003 6
Medical Subject Headings, 2021_2021_01_25 2217
Medication Reference Terminology, 2021_03_01 3
MedlinePlus Health Topics, 20201125 628
Metathesaurus FDA Structured Product Labels, 2021_02_19 10
Metathesaurus Source Terminology Names 12
Metathesaurus Version of Minimal Standard Terminology Digestive Endoscopy, 2001 56
Multum MediSource Lexicon, 2021_02_01 31
NANDA-I Taxonomy II, 2018-2020 164
NCBI Taxonomy, 2020_05_21 90
NCI Dictionary of Cancer Terms, 2009D 588
NCI Genomic Data Commons Terms, 2009D 685
NCI HUGO Gene Nomenclature, 2009D 93
NCI Health Level 7, 2009D 6
NCI Integrated Canine Data Commons Terms, 2009D 3
NCI Thesaurus, 2020_09D 2334
National Council for Prescription Drug Programs, 2009D 2
National Institute of Child Health and Human Development, 2009D 1106
Neuronames Brain Hierarchy, 2020_05_28 143
Nursing Outcomes Classification (NOC), 6 82
Omaha System, 2005 13
Online Congenital Multiple Anomaly/Mental Retardation Syndromes, 1999 391
Online Mendelian Inheritance in Man, 2021_02_08 2119
Patient Care Data Set, 1997 12
Pediatric Cancer Data Commons, 2009D 33
Perioperative Nursing Data Set, 4_2018 2
Physician Data Query, 2018_10_27 739
QMR clinically related terms from Randolph A. Miller, 1999 10
Quick Medical Reference (QMR), 1996 236
Read thesaurus Americanized Synthesized Terms, 1999 20
Read thesaurus, American English Equivalents, 1999 628
Read thesaurus, Synthesized Terms, 1999 24
RxNorm Vocabulary, 20AA_210301F 6
SNOMED International, 1998 1471
SNOMED-2, 2 1246
Source of Payment Typology, 9.2 2
Thesaurus of Psychological Index Terms, 2004 294
U.S. Centers for Disease Control and Prevention, 2009D 1
U.S. Food and Drug Administration, 2009D 166
UMDNS: product category thesaurus, 2021 6
UMLS Metathesaurus 270
US Edition of SNOMED CT, 2021_03_01 8363
USP Compendial Nomenclature, 2021_02_15 1
USP Medicare Model Guidelines, 2020 1
UltraSTAR, 1993 7
Unified Code for Units of Measure, 2009D 23
University of Washington Digital Anatomist, 1.7.3 27
Vaccines Administered, 2017_02_08, 2021_01_29 21
Veterans Health Administration National Drug File, 2021_01_29 11
WHO Adverse Reaction Terminology, 1997 567
csp 35
dermo 1
doid 9115
efo 2639
gard 5326
gtr 33
hgnc 43
hp 517
icd-10 1
icd10 8849
icd10cm 9
icd11 1
icd9 4110
icd9cm 1
icdo 655
ido 1
kegg 33
loinc 1
meddra 1316
medgen 26
mesh 7555
mfomd 3
mondo 107
mp 3
mpath 1
mth 1
ncit 6647
ndfrt 1
nifstd 18
obi 1
ogms 1
omim 9619
omimps 493
omop 5
oncotree 517
orphanet 10292
pato 1
pmid 26
reactome 1
scdo 2
scitd 1
sctid 8413
sctid_2010_1_31 4
snomedct 1
umls 14440
umls_cui 3
wikidata 2
wikipedia 82
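Because the same vocabulary appears under several names in the table above (for example `icd10`, `ICD10, 1998`, and `ICD10, 2016`), per-source totals have to be computed after the names are normalized. A sketch, assuming a hypothetical alias table (`SOURCE_ALIASES`) and input rows of (Mondo ID, raw source name); the actual normalization rules used on the backend are not described in this thread.

```python
from collections import defaultdict

# Hypothetical alias table: several raw names for the same vocabulary
# (all of which appear in the summary table) collapse to one key.
SOURCE_ALIASES = {
    "icd10": "ICD10",
    "icd-10": "ICD10",
    "ICD10, 1998": "ICD10",
    "ICD10, 2016": "ICD10",
}


def normalize_source(raw_name):
    # Unknown names pass through unchanged.
    return SOURCE_ALIASES.get(raw_name, raw_name)


def unique_mondo_per_source(rows):
    """rows: iterable of (mondo_id, raw_source_name) pairs."""
    concepts = defaultdict(set)
    for mondo_id, raw_name in rows:
        # A set counts each Mondo concept once per normalized source.
        concepts[normalize_source(raw_name)].add(mondo_id)
    return {source: len(ids) for source, ids in concepts.items()}


rows = [
    ("MONDO_1", "icd10"),
    ("MONDO_1", "ICD10, 2016"),  # same concept under a second alias
    ("MONDO_2", "icd-10"),
    ("MONDO_3", "doid"),
]
print(unique_mondo_per_source(rows))  # {'ICD10': 2, 'doid': 1}
```

One consequence: the normalized count is not the sum of the raw counts, since a single Mondo concept can be listed under several aliases of the same vocabulary.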

matentzn changed the title from “Ingest OMOP2OWL mappings” to “Ingest OMOP2OBO mappings” on Apr 10, 2022
@joeflack4

joeflack4 commented Apr 12, 2022

Just documenting here per Nico's request.

Tiffany recently produced and explained these ICD10::Mondo mappings:

My basic understanding is that OMOP2OBO was used to generate ICD10/ICD10CM::Mondo mappings. I think an input file (perhaps Mondo itself) was used, because there are some DBXREFs in there, which I imagine were obtained from Mondo. In the absence of direct cross references, exact string matches were used.

In addition to direct mappings (first tab in the file), there were also mappings between Mondo terms and ICD term ancestors (first tab) and children (second tab). Sometimes ICD terms were mapped to Mondo children (second tab). I assume that mapping to ancestors or children needed a starting place, so I imagine that came from the original set of mappings (from Mondo?) used as input to this process.

@callahantiff If you can correct any of my misunderstanding, that would be great.


Here's the raw text from Tiffany's explanation:

The file has two tabs. Note that the first tab (i.e., “OMOP2OBO_ICD10_ICD10CM_ExactMap”) contains the primary mappings (19,139 Mondo concepts; 6,588 ICD10/CM concepts). These mappings were created using the tested and most confident parts of the new functionality that will become available with the next release. Note that I have only included the exact string matches (to labels, synonyms, and definitions) and dbXRefs. Whenever possible, mappings were created at the concept level, but if a mapping could not be established at this level, then a mapping was attempted at the ancestor level. Currently, this works by traversing the hierarchy, where all parent concepts are searched until a match is achieved.

As an improvement over the initial release, when a concept is mapped at the ancestor level, the mapping includes an integer that tells you how many levels (i.e., parent, grandparent, etc.) above the concept the mapping was made. For example, the Mondo concept alopecia, isolated (MONDO_0000005) was mapped to the ICD10 concept nonscarring hair loss (L65.9) via its grandparent concept alopecia (MONDO_0004907). The evidence string provided for this mapping is: “OBO Ancestor: MONDO_0004907 - 2 level(s) above MONDO_0000005 on icd10:L65.9”. I’d love to know if you find this helpful. In the future, I think it could provide useful context for generating a confidence score for the mapping (not something I have yet, but I would love to implement it).
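The ancestor-level fallback can be sketched as a breadth-first walk up the Mondo hierarchy that stops at the nearest ancestor with a concept-level mapping and records how many levels it climbed. This is an illustrative reconstruction under assumed inputs, not OMOP2OBO's implementation: `parents` and `direct_map` are hypothetical dictionaries, breadth-first order is an assumption, and `MONDO_PARENT` is a made-up stand-in for the unnamed parent of MONDO_0000005.

```python
from collections import deque


def map_via_ancestors(concept, parents, direct_map):
    """Return (icd_code, matched_concept, levels_above) or None.

    parents:    {concept_id: [parent_id, ...]}        (hypothetical layout)
    direct_map: {concept_id: icd_code} for concept-level matches
    """
    # Level 0: the concept itself has a concept-level mapping.
    if concept in direct_map:
        return direct_map[concept], concept, 0
    # Breadth-first search, so the nearest mapped ancestor wins.
    seen = {concept}
    queue = deque([(concept, 0)])
    while queue:
        node, level = queue.popleft()
        for parent in parents.get(node, []):
            if parent in seen:
                continue
            seen.add(parent)
            if parent in direct_map:
                return direct_map[parent], parent, level + 1
            queue.append((parent, level + 1))
    return None  # no ancestor has a mapping


def evidence(concept, ancestor, level, icd_code, prefix="icd10"):
    # Mirrors the evidence string format quoted in the thread.
    return f"OBO Ancestor: {ancestor} - {level} level(s) above {concept} on {prefix}:{icd_code}"


# MONDO_PARENT is a made-up placeholder for the unnamed parent concept.
parents = {"MONDO_0000005": ["MONDO_PARENT"], "MONDO_PARENT": ["MONDO_0004907"]}
direct_map = {"MONDO_0004907": "L65.9"}
code, ancestor, level = map_via_ancestors("MONDO_0000005", parents, direct_map)
print(evidence("MONDO_0000005", ancestor, level, code))
# OBO Ancestor: MONDO_0004907 - 2 level(s) above MONDO_0000005 on icd10:L65.9
```

Run on the alopecia example, the sketch reproduces the quoted evidence string, with the level count coming directly from the depth of the search.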

A few important things to note:

  • I am still working on the best phrasing for the mapping evidence. Hopefully it makes sense; I tried to make the mappings as transparent as possible.
  • The file contains duplicate rows. This is intentional: it keeps the evidence for the different ways a mapping can be created between an ICD and a Mondo concept separate. You can collapse the rows by combining the mappings; I just thought you might prefer to keep them separate for now, since you might prefer certain types of mappings over others (although this should not affect the resulting mapping), and this ensures the file can be easily filtered. If you need help aggregating the file this way, just let me know.
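If the duplicate rows need to be collapsed, optionally after filtering by mapping type, a plain-Python sketch could look like this. Only `map_type`, `map_evidence`, and the `DBXREF` value are named in the thread; the column names `mondo_id` and `icd_code`, the other `map_type` values, and the evidence strings below are hypothetical toy data.

```python
def collapse(rows, keep_types=None):
    """Merge rows sharing (mondo_id, icd_code); optionally keep only some map types.

    rows: list of dicts with keys "mondo_id", "icd_code" (hypothetical names),
          "map_type", and "map_evidence".
    """
    merged = {}  # dicts preserve insertion order (Python 3.7+)
    for row in rows:
        if keep_types is not None and row["map_type"] not in keep_types:
            continue  # filter before merging
        key = (row["mondo_id"], row["icd_code"])
        entry = merged.setdefault(key, {"map_type": [], "map_evidence": []})
        entry["map_type"].append(row["map_type"])
        entry["map_evidence"].append(row["map_evidence"])
    # Join the pieces; dict.fromkeys drops repeated map types but keeps order.
    return {
        key: {
            "map_type": " | ".join(dict.fromkeys(v["map_type"])),
            "map_evidence": " | ".join(v["map_evidence"]),
        }
        for key, v in merged.items()
    }


rows = [
    {"mondo_id": "MONDO_A", "icd_code": "X1", "map_type": "DBXREF", "map_evidence": "toy xref"},
    {"mondo_id": "MONDO_A", "icd_code": "X1", "map_type": "EXACT_LABEL", "map_evidence": "toy label"},
    {"mondo_id": "MONDO_B", "icd_code": "X2", "map_type": "EXACT_SYNONYM", "map_evidence": "toy synonym"},
]
print(collapse(rows)[("MONDO_A", "X1")])
# {'map_type': 'DBXREF | EXACT_LABEL', 'map_evidence': 'toy xref | toy label'}
print(list(collapse(rows, keep_types={"DBXREF"})))
# [('MONDO_A', 'X1')]
```

Filtering before merging keeps the choice of trusted mapping types separate from the aggregation step, which matches the reason given for leaving the rows separate in the first place.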

The second tab (i.e., “OMOP2OBO_ICD10_ICD10CM_ChildMap”) contains mappings from a beta feature that I have been working on; I included it in case it is helpful to you. These mappings are meant to help address the issue that ICD10 tends to be more granular than Mondo. Thus, these mappings take advantage of the ontology's descendant hierarchy. See the examples in the screenshot below.
[Screenshot: example rows from the ChildMap tab (inflammatory diarrhea, Piedra)]
In contrast to the approach used when mapping a concept at the ancestor level, here we search for more specific mappings in an effort to capture the loss of granularity between ICD and Mondo. As the screenshot shows, we are able to extend the Mondo concept inflammatory diarrhea by mapping it to several more specific but related ICD10 concepts. The string in the map_evidence column provides an explanation. Take the first row: the mapping evidence states that ICD10 A03 was mapped to MONDO_0000252 via its descendant concept MONDO_0019345, which is two levels below MONDO_0000252. I also included one additional example in this figure: Piedra. Please note that I have not manually verified all of these mappings. I did perform a spot-check to remove many of the obviously incorrect mappings. Some errors may still exist, but many of the mappings also look pretty good. You can be most confident in the mappings with map_type “DBXREF”. Let me know if you have any questions about these, and please don’t feel like you have to use them; I included them because I thought they might be useful to you.
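The ChildMap idea can be sketched as the ancestor traversal run in the opposite direction: instead of stopping at the first match, it walks all descendants and collects every ICD code found, together with how many levels below the starting concept each match sits. This is a sketch under assumed inputs, not the beta feature itself: `children` and `direct_map` are hypothetical dictionaries, and `MONDO_MID` is a made-up stand-in for the unnamed intermediate concept in the A03 example.

```python
from collections import deque


def map_via_descendants(concept, children, direct_map):
    """Return every (icd_code, descendant_id, levels_below), nearest first.

    children:   {concept_id: [child_id, ...]}        (hypothetical layout)
    direct_map: {concept_id: [icd_code, ...]} for concept-level matches
    """
    results = []
    seen = {concept}
    queue = deque([(concept, 0)])
    while queue:
        node, level = queue.popleft()
        for child in children.get(node, []):
            if child in seen:
                continue  # a descendant can be reachable via several paths
            seen.add(child)
            # Unlike the ancestor search, keep going after a hit:
            # the goal is to collect all more-specific codes.
            for icd_code in direct_map.get(child, []):
                results.append((icd_code, child, level + 1))
            queue.append((child, level + 1))
    return results


# MONDO_MID is a made-up placeholder for the unnamed intermediate concept.
children = {"MONDO_0000252": ["MONDO_MID"], "MONDO_MID": ["MONDO_0019345"]}
direct_map = {"MONDO_0019345": ["A03"]}
print(map_via_descendants("MONDO_0000252", children, direct_map))
# [('A03', 'MONDO_0019345', 2)]
```

Collecting every hit rather than the first one is what lets a single broad Mondo concept pick up several more specific ICD10 codes, as in the inflammatory diarrhea example.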
