Bug in match-mondo-sources-all-lexical.py #394

Closed · wants to merge 1 commit
2 changes: 1 addition & 1 deletion python-requirements-apple-silicon.txt
@@ -15,7 +15,7 @@ charset-normalizer==3.3.2
class-resolver==0.4.2
click==8.1.7
colorama==0.4.6
-curies==0.7.4
+curies==0.7.6
Deprecated==1.2.14
deprecation==2.1.0
distlib==0.3.7
2 changes: 1 addition & 1 deletion python-requirements.txt
@@ -22,7 +22,7 @@ class-resolver==0.4.2
click==8.1.7
colorama==0.4.6
commonmark==0.9.1
-curies==0.6.4
+curies==0.7.6
decorator==5.1.1
Deprecated==1.2.13
deprecation==2.1.0
2 changes: 1 addition & 1 deletion src/ontology/config/prefixes.csv
@@ -3,7 +3,7 @@ rdf,http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs,http://www.w3.org/2000/01/rdf-schema#
xsd,http://www.w3.org/2001/XMLSchema#
owl,http://www.w3.org/2002/07/owl#
oio,http://www.geneontology.org/formats/oboInOwl#
joeflack4 (Contributor, Author) · Jan 18, 2024
Bugfix~: oio removed from prefixes.csv

@matentzn It looks like this has unintended consequences; we may want to revert.

Details

Overview

The advantage of removing this is that lexmatch succeeds without needing the changes that Charlie and I introduced to clean_prefix_map(), namely converter.get_prefixes(include_synonyms=True).

The negative is that now OAK is not checking for lexical matches on synonyms identified by the oio:hasExactSynonym field.

You can see a sample of the differences in the outputs by looking at the "b4" and "after" spreadsheets in the outputs GSheet. Comparing the before/after for the first of the outputs, called "oak output prefiltered", the 'after' version has 8 fewer entries than the 198 entries in the 'before' set, a reduction of 4%.

Choices

a. Leave oio out of prefixes.csv. Forfeit matching on oio:hasExactSynonym.
b. Keep oio in prefixes.csv in tandem with Charlie and Joe's bugfix (the changes that Charlie and I introduced to clean_prefix_map()), which fixes the perceived missing oio prefix (now a prefix synonym of oboInOwl).
c. Change OAK so that even if we remove oio from prefixes.csv, there is no lossiness (more below).

I think we should choose 'b', and possibly do 'c' separately, unless matching on oio:hasExactSynonym is considered unimportant, in which case we can just go with 'a'.
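Option 'b' rests on prefix synonyms: both oio and oboInOwl expand to the same URI prefix, with oboInOwl as the canonical spelling. A minimal stdlib sketch of the idea, mirroring the extended-prefix-map record shape (the helper names `build_lookup` and `expand` are illustrative, not the curies API):

```python
# Extended-prefix-map-style record: one canonical prefix plus synonyms,
# mirroring how curies can treat oio as a synonym of oboInOwl.
EXTENDED_PREFIX_MAP = [
    {
        "prefix": "oboInOwl",
        "uri_prefix": "http://www.geneontology.org/formats/oboInOwl#",
        "prefix_synonyms": ["oio"],
    },
]

def build_lookup(records):
    """Map every prefix (canonical or synonym) to its URI prefix."""
    lookup = {}
    for rec in records:
        lookup[rec["prefix"]] = rec["uri_prefix"]
        for syn in rec.get("prefix_synonyms", []):
            lookup[syn] = rec["uri_prefix"]
    return lookup

def expand(curie, lookup):
    """Expand a CURIE to a full IRI using the prefix lookup."""
    prefix, _, local = curie.partition(":")
    return lookup[prefix] + local

lookup = build_lookup(EXTENDED_PREFIX_MAP)
# Both spellings expand to the same IRI:
print(expand("oio:hasExactSynonym", lookup))
print(expand("oboInOwl:hasExactSynonym", lookup))
```

Under this scheme, dropping oio from the map (option 'a') makes `expand("oio:hasExactSynonym", ...)` fail, which is consistent with the observed loss of synonym matches.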

Additional discussion

I want to expand on 'c', possible OAK changes. I think we want OAK to somehow treat oio and oboInOwl as the same, but this is tricky, and I can only begin to think about what should be done here. I can only guess at, and can't fully explain, why lexmatches on oio:hasExactSynonym go away when oio is left out of prefixes.csv.

I wonder if it might have something to do with this, at the top of OAK's lexical_indexer.py:

QUALIFIER_DICT = {
    "exact": "oio:hasExactSynonym",
    "broad": "oio:hasBroadSynonym",
    "narrow": "oio:hasNarrowSynonym",
    "related": "oio:hasRelatedSynonym",
}

If you guys think OAK or SemanticSQL needs any sort of update, maybe you should tag Chris here.
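One way to picture the suspected lossiness: if the qualifier properties are hard-coded with the oio prefix (as in QUALIFIER_DICT above) and the supplied prefix map lacks oio, any match keyed on those CURIEs would be silently dropped. A stdlib sketch under that assumption (`usable_qualifiers` is a hypothetical helper, not OAK code):

```python
# Hard-coded qualifier CURIEs, as at the top of OAK's lexical_indexer.py.
QUALIFIER_DICT = {
    "exact": "oio:hasExactSynonym",
    "broad": "oio:hasBroadSynonym",
    "narrow": "oio:hasNarrowSynonym",
    "related": "oio:hasRelatedSynonym",
}

def usable_qualifiers(prefix_map):
    """Keep only qualifiers whose prefix resolves in the given prefix map."""
    return {
        qualifier: curie
        for qualifier, curie in QUALIFIER_DICT.items()
        if curie.split(":", 1)[0] in prefix_map
    }

with_oio = {"oio": "http://www.geneontology.org/formats/oboInOwl#"}
without_oio = {"oboInOwl": "http://www.geneontology.org/formats/oboInOwl#"}

print(len(usable_qualifiers(with_oio)))     # all four qualifiers survive
print(len(usable_qualifiers(without_oio)))  # none survive
```

If something like this filtering happens anywhere in the pipeline, renaming oio to oboInOwl in prefixes.csv alone would not be enough; the qualifier CURIEs themselves would also need to resolve.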

matentzn (Member) · Jan 19, 2024
> The negative is that now OAK is not checking for lexical matches on synonyms identified by the oio:hasExactSynonym field.

This makes no sense. The oio:hasExactSynonym should not exist if the input prefix map does not contain it.

I will look into it...

joeflack4 (Contributor, Author) · Jan 19, 2024
@matentzn If I understand you correctly, that's what I'm saying. If prefixes.csv contains oio, then when I run the pipeline, it matches not just on rdfs:label but also on oio:hasExactSynonym, and you can see that reflected in my subsample outputs. And if I remove oio from prefixes.csv, I no longer see those in the outputs.

So if we want to keep matching on oio:hasExactSynonym, it seems that we should keep oio in prefixes.csv, even though that makes the prefix map non-bijective. It's only passed to SemanticSQL anyway. The alternative is to ask OAK to rename oio to oboInOwl.
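On the non-bijectivity point: with both oio and oboInOwl mapping to one URI prefix, expansion stays well-defined, but compression (IRI to CURIE) must pick one canonical spelling. A stdlib sketch of the trade-off (not the curies implementation):

```python
URI = "http://www.geneontology.org/formats/oboInOwl#"

# Non-bijective prefix map: two prefixes share one URI prefix.
prefix_map = {"oio": URI, "oboInOwl": URI}

def compress(iri, prefix_map, canonical_order):
    """Compress an IRI to a CURIE, trying prefixes in canonical order first."""
    for prefix in canonical_order:
        uri_prefix = prefix_map[prefix]
        if iri.startswith(uri_prefix):
            return f"{prefix}:{iri[len(uri_prefix):]}"
    return iri  # no known prefix; return the IRI unchanged

# Declaring oboInOwl canonical makes compression deterministic again.
print(compress(URI + "hasExactSynonym", prefix_map, ["oboInOwl", "oio"]))
```

This is why a plain bijective prefix map rejects the duplicate, while the extended prefix map's canonical-prefix-plus-synonyms design accepts it.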

Also, if you change your mind and want to keep oio in prefixes.csv, then I think we need to merge mapping-commons/sssom-py#485 first; otherwise an error will be thrown. So maybe it's best to merge this PR now, and then if we want to put oio back, we can do it in a subsequent PR.

joeflack4 (Contributor, Author)

As of today, here's where we are:
Nico has looked into it further and opened: INCATools/ontology-access-kit#698

Joe looked for other places where merged.db is used, to identify possible side effects of changing oio to oboInOwl in prefixes.csv when constructing merged.db. These usages were found, but it is not known whether unintended side effects could occur:

  • lexmatch-sssom-compare.py
  • mappings.ipynb
  • oak-lexmatch-sssom-compare.ipynb
  • oak-lexmatch-sssom-compare-checkpoint.ipynb
  • remove-mappings-test.ipynb
  • synchronizeMappingsWithMondo-checkpoint.ipynb

Nico said he would put in some time to solve this, so Joe is pausing activity on this.

oboInOwl,http://www.geneontology.org/formats/oboInOwl#
dce,http://purl.org/dc/elements/1.1/
dct,http://purl.org/dc/terms/
foaf,http://xmlns.com/foaf/0.1/
2 changes: 1 addition & 1 deletion src/ontology/metadata/mondo.sssom.config.yml
@@ -34,7 +34,7 @@ curie_map:
semapv: https://w3id.org/semapv/vocab/
rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
sssom: https://w3id.org/sssom/
oio: http://www.geneontology.org/formats/oboInOwl#
# oio: http://www.geneontology.org/formats/oboInOwl#
GTR: "http://purl.obolibrary.org/obo/mondo/mappings/unknown_prefix/GTR/"
NCI: "http://purl.obolibrary.org/obo/mondo/mappings/unknown_prefix/NCI/"
NIFSTD: "http://purl.obolibrary.org/obo/mondo/mappings/unknown_prefix/NIFSTD/"
40 changes: 21 additions & 19 deletions src/scripts/match-mondo-sources-all-lexical.py
@@ -1,17 +1,19 @@
# Basic matching pipeline that takes in
"""Match mondo sources, all lexical
Basic matching pipeline that takes in

# Input:
# 1. MERGED_ONTOLOGY = tmp/merged.owl
# 2. SSSOM_CONFIG = metadata/mondo.sssom.config.yml
# 3. OUTPUT_SSSOM = mapping/mondo-sources-all-lexical.sssom.tsv
Input:
1. MERGED_ONTOLOGY = tmp/merged.owl
2. SSSOM_CONFIG = metadata/mondo.sssom.config.yml
3. OUTPUT_SSSOM = mapping/mondo-sources-all-lexical.sssom.tsv

# I would try some basic things first:

# Use synonymiser
# Use oak.mapping() pipeline
I would try some basic things first:

Use synonymiser
Use oak.mapping() pipeline
"""
import logging
from pathlib import Path
from curies import Converter
from oaklib.resource import OntologyResource
from oaklib.implementations.sqldb.sql_implementation import SqlImplementation
from oaklib.utilities.lexical.lexical_indexer import (
@@ -25,11 +27,11 @@
import yaml
import pandas as pd

from sssom.constants import SUBJECT_ID, OBJECT_ID, PREDICATE_MODIFIER
from sssom.constants import SUBJECT_ID, OBJECT_ID
from sssom.util import filter_prefixes, is_curie, is_iri
from sssom.parsers import parse_sssom_table
from sssom.writers import write_table
from sssom.io import get_metadata_and_prefix_map, filter_file
from sssom.io import filter_file
from bioregistry import curie_from_iri

SRC = Path(__file__).resolve().parents[1]
@@ -49,6 +51,7 @@
)


# todo: duplicated code fragment w/ lexmatch-sssom-compare: solution, move to a lexmatch_utils.py and import to both
@click.group()
@click.option("-v", "--verbose", count=True)
@click.option("-q", "--quiet")
@@ -83,11 +86,11 @@ def main(verbose: int, quiet: bool):
)
@output_option
def run(input: str, config: str, rules: str, rejects: str, output: str):
# Implemented `meta` param in `lexical_index_to_sssom`

meta = get_metadata_and_prefix_map(config)
"""Run the script"""
joeflack4 (Contributor, Author)
AttributeError: 'Converter' object has no attribute 'items' (fixed)

You can mark this resolved. It's fixed.

Details

I thought this was ready to merge because it ran successfully through the debugger. However, I then got an error in OAK: AttributeError: 'Converter' object has no attribute 'items'.

Traceback (most recent call last):
  File "/work/src/ontology/../scripts/match-mondo-sources-all-lexical.py", line 176, in <module>
    main()
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/dist-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/work/src/ontology/../scripts/match-mondo-sources-all-lexical.py", line 122, in run
    msdf = lexical_index_to_sssom(oi, lexical_index, ruleset=ruleset, prefix_map=converter)
  File "/usr/local/lib/python3.10/dist-packages/oaklib/utilities/lexical/lexical_indexer.py", line 317, in lexical_index_to_sssom
    {k: v for k, v in prefix_map.items() if k not in meta.prefix_map.keys()}
AttributeError: 'Converter' object has no attribute 'items'

I upgraded OAK within ODK but that didn't work. However, upgrading my ODK image did.
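The traceback boils down to OAK calling .items() on whatever it receives as prefix_map, which works for a plain dict but not for a curies Converter. A defensive sketch of how a caller could normalize either shape (`as_prefix_dict` and `FakeConverter` are hypothetical, not OAK's actual fix; the `.bimap` attribute is an assumption based on recent curies versions):

```python
class FakeConverter:
    """Stand-in for curies.Converter, assumed to expose a .bimap
    {prefix: uri_prefix} mapping (an assumption for this sketch)."""
    def __init__(self, bimap):
        self.bimap = bimap

def as_prefix_dict(prefix_map):
    """Normalize a plain dict or a Converter-like object to {prefix: uri_prefix}."""
    if isinstance(prefix_map, dict):
        return dict(prefix_map)
    if hasattr(prefix_map, "bimap"):  # Converter-like object
        return dict(prefix_map.bimap)
    raise TypeError(f"unsupported prefix map type: {type(prefix_map)!r}")

uri = "http://www.geneontology.org/formats/oboInOwl#"
print(as_prefix_dict({"oio": uri}))
print(as_prefix_dict(FakeConverter({"oboInOwl": uri})))
```

Upgrading the ODK image presumably resolved it because the bundled OAK accepts a Converter for prefix_map directly.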

# Get metadata config
with open(config, "r") as f:
yml = yaml.safe_load(f)
converter = Converter.from_extended_prefix_map(yml.pop('extended_prefix_map', {}))

# Get mondo.sssom.tsv
mapping_msdf = parse_sssom_table(SSSOM_MAP_FILE)
@@ -108,8 +111,6 @@ def run(input: str, config: str, rules: str, rejects: str, output: str):
# .reset_index(drop=True)
# )

prefix_of_interest = yml["subject_prefixes"]

resource = OntologyResource(slug=f"sqlite:///{Path(input).absolute()}")
oi = SqlImplementation(resource=resource)
ruleset = load_mapping_rules(rules)
@@ -118,9 +119,9 @@
save_lexical_index(lexical_index, OUT_INDEX_DB)

if rules:
msdf = lexical_index_to_sssom(oi, lexical_index, ruleset=ruleset, meta=meta)
msdf = lexical_index_to_sssom(oi, lexical_index, ruleset=ruleset, prefix_map=converter)
joeflack4 (Contributor, Author) · Jan 20, 2024

Bugfix: meta -> converter

Along with removing oio from config/prefixes.csv, this is the essential bugfix. This allows us to use the new extended prefix map, and solves the .pop() error in the original issue #390.

joeflack4 (Contributor, Author)

Redundancy with #397

The main difference is that #397 passes meta=None to lexical_index_to_sssom(), whereas #394 passes a Converter. Supposedly the #397 approach is fine.

Everything else in #394 is now cosmetic, so I can merge just those changes afterwards, or ignore and close.

else:
msdf = lexical_index_to_sssom(oi, lexical_index, meta=meta)
msdf = lexical_index_to_sssom(oi, lexical_index, prefix_map=converter)

# msdf.prefix_map = sssom_yaml['curie_map']
# msdf.metadata = sssom_yaml['global_metadata']
@@ -131,8 +132,9 @@ def run(input: str, config: str, rules: str, rejects: str, output: str):
# msdf.df[OBJECT_ID] = msdf.df[OBJECT_ID].apply(
# lambda x: iri_to_curie(x) if x.startswith("<http") else x
# )
prefixes_of_interest = yml["subject_prefixes"]
msdf.df = filter_prefixes(
df=msdf.df, filter_prefixes=prefix_of_interest, features=[SUBJECT_ID, OBJECT_ID]
df=msdf.df, filter_prefixes=prefixes_of_interest, features=[SUBJECT_ID, OBJECT_ID]
)
msdf.remove_mappings(mapping_msdf)
