Commit 189c0ed

De-duplicate ontology descriptions (OBOFoundry#1966)
Closes OBOFoundry#1965
1 parent 75b9dd8 commit 189c0ed

17 files changed, with 49 additions and 36 deletions

ontology/aism.md

-2
@@ -33,5 +33,3 @@ activity_status: active
 repository: https://github.com/insect-morphology/aism
 preferredPrefix: AISM
 ---
-
-The AISM contains terms used in insect biodiversity research for describing structures of the exoskeleton and the skeletomuscular system. It aims to serve as the basic backbone of generalized terms to be expanded with order-specific terminology.

ontology/apo.md

-2
@@ -30,5 +30,3 @@ publications:
   - id: https://www.ncbi.nlm.nih.gov/pubmed/20157474
     title: "New mutant phenotype data curation system in the Saccharomyces Genome Database"
 ---
-
-A structured controlled vocabulary for the phenotypes of Ascomycete fungi

ontology/cheminf.md

-2
@@ -27,5 +27,3 @@ activity_status: active
 repository: https://github.com/semanticchemistry/semanticchemistry
 preferredPrefix: CHEMINF
 ---
-
-Includes terms for the descriptors commonly used in cheminformatics software applications and the algorithms which generate them.

ontology/cto.md

-2
@@ -20,5 +20,3 @@ repository: https://github.com/ClinicalTrialOntology/CTO
 preferredPrefix: CTO
 domain: health
 ---
-
-The core Ontology of Clinical Trials (CTO) will serve as a structured resource integrating basic terms and concepts in the context of clinical trials. Thereby covering clinicaltrails.gov. CoreCTO will serve as a basic ontology to generate extended versions for specific applications such as annotation of variables in study documents from clinical trials.

ontology/fbbi.md

-2
@@ -25,5 +25,3 @@ build:
 activity_status: active
 repository: https://github.com/CRBS/Biological_Imaging_Methods_Ontology
 ---
-
-A structured controlled vocabulary of sample preparation, visualization and imaging methods used in biomedical research.

ontology/hao.md

-2
@@ -31,5 +31,3 @@ publications:
   - id: https://www.ncbi.nlm.nih.gov/pubmed/21209921
     title: "A gross anatomy ontology for hymenoptera"
 ---
-
-A structured controlled vocabulary of the anatomy of the Hymenoptera (bees, wasps, and ants)

ontology/mamo.md

-2
@@ -23,5 +23,3 @@ activity_status: active
 preferredPrefix: MAMO
 domain: simulation
 ---
-
-The Mathematical Modelling Ontology (MAMO) is a classification of the types of mathematical models used mostly in the life sciences, their variables, relationships and other relevant features.

ontology/mpath.md

-2
@@ -27,5 +27,3 @@ activity_status: active
 repository: https://github.com/PaulNSchofield/mpath
 preferredPrefix: MPATH
 ---
-
-A structured controlled vocabulary of mutant and transgenic mouse pathology phenotypes

ontology/oarcs.md

-2
@@ -20,5 +20,3 @@ products:
 activity_status: active
 preferredPrefix: OARCS
 ---
-
-OArCS is an ontology describing the Arthropod ciruclatory system.

ontology/ohmi.md

-2
@@ -21,5 +21,3 @@ repository: https://github.com/ohmi-ontology/ohmi
 preferredPrefix: OHMI
 domain: organisms
 ---
-
-The Ontology of Host-Microbiome Interactions aims to ontologically represent and standardize various entities and relations related to microbiomes, microbiome host organisms (e.g., human and mouse), and the interactions between the hosts and microbiomes at different conditions.

ontology/opmi.md

-2
@@ -21,5 +21,3 @@ repository: https://github.com/OPMI/opmi
 preferredPrefix: OPMI
 domain: investigations
 ---
-
-The Ontology of Precision Medicine and Investigation (OPMI) aims to ontologically represent and standardize various entities and relations associated with precision medicine and related investigations at different conditions.

ontology/rs.md

-2
@@ -36,5 +36,3 @@ preferredPrefix: RS
 depicted_by: http://rgd.mcw.edu/common/images/rgd_LOGO_blue_rgd.gif
 domain: organisms
 ---
-
-Ontology of rat strains

ontology/spd.md

-2
@@ -30,5 +30,3 @@ publications:
   - id: https://doi.org/10.3390/d11100202
     title: "The Spider Anatomy Ontology (SPD)—A Versatile Tool to Link Anatomy with Cross-Disciplinary Data"
 ---
-
-An ontology for spider comparative biology including anatomical parts (e.g. leg, claw), behavior (e.g. courtship, combing) and products (i.g. silk, web, borrow).

ontology/wbbt.md

-2
@@ -41,5 +41,3 @@ usages:
 activity_status: active
 repository: https://github.com/obophenotype/c-elegans-gross-anatomy-ontology
 ---
-
-A structured controlled vocabulary of the anatomy of <i>Caenorhabditis elegans</i>.

ontology/wbls.md

-2
@@ -43,5 +43,3 @@ usages:
 activity_status: active
 repository: https://github.com/obophenotype/c-elegans-development-ontology
 ---
-
-A structured controlled vocabulary of the development of <i>Caenorhabditis elegans</i>.

ontology/wbphenotype.md

-2
@@ -51,5 +51,3 @@ usages:
 activity_status: active
 repository: https://github.com/obophenotype/c-elegans-phenotype-ontology
 ---
-
-A structured controlled vocabulary of <i>Caenorhabditis elegans</i> phenotypes

tests/test_integrity.py

+49 -4
@@ -20,6 +20,7 @@
 DOI_PREFIX = "https://doi.org/"
 CHEMRXIV_DOI_PREFIX = "https://doi.org/10.26434/chemrxiv"

+
 def get_data():
     """Get ontology data."""
     ontologies = {}
@@ -32,6 +33,7 @@ def get_data():

         # Load the data like it is YAML
         data = yaml.safe_load("\n".join(lines[1:idx]))
+        data["long_description"] = "".join(lines[idx:])
         ontologies[data["id"]] = data
     return ontologies

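The hunk above keeps everything after the YAML front matter as `long_description`. The following is a minimal sketch of that split, not the repository's code: the `split_front_matter` helper and the sample document are hypothetical, and it assumes `idx` in the diff is the index of the closing `---` line. Under that assumption the commit's `lines[idx:]` slice would include the delimiter itself, whereas this sketch starts at `idx + 1` to leave it out.

```python
from typing import Tuple


def split_front_matter(text: str) -> Tuple[str, str]:
    """Split a Markdown document into (front matter, body)."""
    lines = text.splitlines()
    if not lines or lines[0] != "---":
        raise ValueError("expected the document to start with '---'")
    # Find the closing delimiter; None means the front matter never ends.
    idx = next(
        (i for i, line in enumerate(lines[1:], start=1) if line == "---"), None
    )
    if idx is None:
        raise ValueError("front matter is missing its closing '---'")
    front_matter = "\n".join(lines[1:idx])
    # Start at idx + 1 so the closing '---' is not part of the body.
    body = "\n".join(lines[idx + 1 :]).strip()
    return front_matter, body


# Made-up document for illustration only.
example = "---\nid: demo\ndescription: A demo ontology\n---\n\nA demo ontology.\n"
front_matter, body = split_front_matter(example)
print(repr(front_matter))
print(repr(body))
```

The front matter string could then be passed to a YAML loader such as `yaml.safe_load`, as the diff does.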
@@ -83,8 +85,14 @@ def test_publications(self):

             for i, usage in enumerate(data.get("usages", [])):
                 for j, publication in enumerate(usage.get("publications", [])):
-                    self.assertIn("user", usage, msg=f"Malformed usage missing a user in {ontology}")
-                    with self.subTest(ontology=ontology, user=usage["user"], id=publication["id"]):
+                    self.assertIn(
+                        "user",
+                        usage,
+                        msg=f"Malformed usage missing a user in {ontology}",
+                    )
+                    with self.subTest(
+                        ontology=ontology, user=usage["user"], id=publication["id"]
+                    ):
                         self.assert_valid_publication_id(
                             publication,
                             msg=f"{ontology} usage {i} publication {j} has unexpected identifier: {publication['id']}",
@@ -127,7 +135,12 @@ def assert_valid_publication_id(self, publication, msg=None):
         )

         # Make sure that the unversioned DOI is used
-        if is_arxiv or is_biorxiv or is_medrxiv or identifier.startswith(CHEMRXIV_DOI_PREFIX):
+        if (
+            is_arxiv
+            or is_biorxiv
+            or is_medrxiv
+            or identifier.startswith(CHEMRXIV_DOI_PREFIX)
+        ):
             for v in range(1, 100):
                 self.assertFalse(
                     identifier.endswith(f".v{v}"), msg="Please use an unversioned DOI"
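The check in this hunk rejects preprint DOIs that end in a version suffix such as `.v1`. As an illustration only, the same idea can be sketched with a regular expression. The helper names and the sample identifier below are hypothetical, and the pattern is a swapped-in technique: it matches any numeric version, including `.v0` and `.v100`, whereas the loop in the diff covers only `.v1` through `.v99`.

```python
import re

# Matches a trailing version suffix such as ".v1" or ".v12".
_VERSION_SUFFIX = re.compile(r"\.v\d+\Z")


def has_version_suffix(identifier: str) -> bool:
    """Return True if the identifier ends in a version suffix."""
    return _VERSION_SUFFIX.search(identifier) is not None


def strip_version_suffix(identifier: str) -> str:
    """Return the identifier without a trailing version suffix."""
    return _VERSION_SUFFIX.sub("", identifier)


# Hypothetical identifier, used for illustration only.
versioned = "https://doi.org/10.26434/chemrxiv.0000000.v1"
print(has_version_suffix(versioned))
print(strip_version_suffix(versioned))
```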
@@ -147,13 +160,45 @@ def test_schema_mandatory(self):
         }
         self.assertEqual(required - skip_keys, high_level - skip_keys)

+    @staticmethod
+    def skip_inactive(record) -> bool:
+        """Check if should skip for inactive records."""
+        return record.get("activity_status") != "active"
+
     def test_preferred_prefix(self):
         """Test all preferred prefixes."""
         for prefix, record in self.ontologies.items():
             with self.subTest(prefix=prefix):
-                if record.get("activity_status") != "active":
+                if self.skip_inactive(record):
                     continue
                 preferred_prefix = record.get("preferredPrefix")
                 self.assertIsNotNone(preferred_prefix)
                 self.assertLessEqual(2, len(preferred_prefix))
                 self.assertNotIn(" ", preferred_prefix)
+
+    def test_redundant_descriptions(self):
+        """Test that the description field is not redundant of the long form description."""
+        for prefix, record in self.ontologies.items():
+            if self.skip_inactive(record):
+                continue
+            description = record.get("description")
+            long_description = record["long_description"]
+            if description is None:
+                continue
+            with self.subTest(prefix=prefix):
+                self.assertNotEqual(
+                    _string_norm(description),
+                    _string_norm(long_description),
+                    msg=f"Effectively the same description was reused in the short and long-form field for {prefix}",
+                )
+
+
+def _string_norm(s: str) -> str:
+    return (
+        s.strip()
+        .lower()
+        .replace("\n", "")
+        .replace(" ", "")
+        .replace(".", "")
+        .replace("-", "")
+    )
