EntityLinker knowledge base returns CUIs not MeSH IDs when 'mesh' is selected #355

Open
xegulon opened this issue May 17, 2021 · 6 comments

Comments

@xegulon

xegulon commented May 17, 2021

I'm using the scispaCy entity linker with this snippet:

from scispacy.linking import EntityLinker
import spacy, scispacy

config = {
    "resolve_abbreviations": True,  
    "name": "mesh", 
    "max_entities_per_mention":1
}

nlp = spacy.load("en_core_sci_sm")
nlp.add_pipe("scispacy_linker", config=config) 

linker = nlp.get_pipe("scispacy_linker")

def mesh_extractor(text):
    doc = nlp(text)
    for e in doc.ents:
        if e._.kb_ents:
            cui = e._.kb_ents[0][0]
            print(e, cui)

text = "Give him three injection of paracetamol"

Then when I use it:

>> mesh_extractor(text)
Give C1947971
injection C0021485

But according to the scispaCy README, the MeSH linker should return MeSH IDs (for example, D003435), not UMLS CUIs. How can I fix this? Did I misunderstand something?

@dakinggg
Collaborator

ahh, the config parameter is called linker_name, not name. If you set linker_name instead, it should work.
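For reference, here is a minimal sketch of the original snippet with the key renamed as described above. The helper names `make_linker_config` and `build_pipeline` are hypothetical and only there for illustration; the component name `scispacy_linker`, the model name, and the config keys come from the snippets in this thread. Whether the linker then returns MeSH IDs depends on the scispaCy release, as later comments in this thread point out.

```python
def make_linker_config(linker_name="mesh", max_entities=1):
    """Build the linker config using the key named in the reply above."""
    return {
        "resolve_abbreviations": True,
        "linker_name": linker_name,  # the key is "linker_name", not "name"
        "max_entities_per_mention": max_entities,
    }


def build_pipeline(model="en_core_sci_sm"):
    """Load a scispaCy model and attach the linker (needs scispacy and the model installed)."""
    # Imports are local so the config helper above works without the models installed.
    import spacy
    from scispacy.linking import EntityLinker  # noqa: F401  (same import as the original snippet)

    nlp = spacy.load(model)
    nlp.add_pipe("scispacy_linker", config=make_linker_config())
    return nlp
```

Calling `build_pipeline()` and then running the `mesh_extractor` loop from the question on `nlp(text)` should exercise the renamed key.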

@xegulon
Author

xegulon commented May 19, 2021

Thanks a lot!

@xegulon xegulon closed this as completed May 19, 2021
@Braianpp

Braianpp commented Jul 4, 2023

I am getting the same error even when using linker_name in the config:

config = {
    "resolve_abbreviations": True,
    "linker_name": "mesh",
    "max_entities_per_mention": 5
}

nlp = spacy.load("en_core_sci_md")

nlp.add_pipe("scispacy_linker", config=config)

linker = nlp.get_pipe("scispacy_linker")

doc = nlp("Pre-diabetes Obesity Type-2 Diabetes Mellitus Obesity Overweight")

for e in doc.ents:
    if e._.kb_ents:
        cui = e._.kb_ents[0][0]
        print(e, cui)

and I get:
Pre-diabetes C0362046
Obesity C0028754
Diabetes Mellitus C0011849
Obesity C0028754
Overweight C0497406

I also loaded another scispaCy model in the same script, nlp = spacy.load("en_ner_bionlp13cg_md"); I don't know whether that matters.

@dakinggg dakinggg reopened this Jul 5, 2023
@dakinggg
Collaborator

dakinggg commented Jul 5, 2023

Hi, it looks like the original mesh linker was created with a separate kb, rather than just a subset of UMLS. The process for creating the linker may have been lost. When I recreated the linkers for the latest UMLS release, I just used a subset of UMLS to produce the mesh linker. I'll have to look into this and decide whether to just stick to the current UMLS ids, or try to recreate the old version of the linker. Sorry about that. For now you will need to map between UMLS id and mesh id yourself.

@Braianpp

Braianpp commented Jul 5, 2023

I see. Maybe I will try the previous scispaCy version (0.5.1), which should work. Thank you very much for answering my question!

@JohnGiorgi
Contributor

Also facing this problem, but I am able to map to MeSH from UMLS CUIs using the MRCONSO.RRF file
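A minimal sketch of that kind of mapping, assuming the standard pipe-delimited MRCONSO.RRF layout from a UMLS release (CUI in the first field, source abbreviation SAB in the 12th, source code CODE in the 14th, with MeSH rows marked `MSH`). The function name `load_cui_to_mesh` is hypothetical, and this is not necessarily how the mapping was done in the comment above.

```python
from collections import defaultdict

# Zero-based column positions in MRCONSO.RRF (pipe-delimited, no header row).
CUI_COL, SAB_COL, CODE_COL = 0, 11, 13


def load_cui_to_mesh(lines):
    """Map each UMLS CUI to the set of MeSH codes found in MRCONSO rows."""
    cui_to_mesh = defaultdict(set)
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) <= CODE_COL:
            continue  # skip blank or truncated rows
        if fields[SAB_COL] == "MSH":  # keep only rows sourced from MeSH
            cui_to_mesh[fields[CUI_COL]].add(fields[CODE_COL])
    return dict(cui_to_mesh)


# Usage (the path is an assumption; point it at your own UMLS download):
# with open("MRCONSO.RRF", encoding="utf-8") as f:
#     cui_to_mesh = load_cui_to_mesh(f)
# mesh_ids = cui_to_mesh.get(cui, set())  # cui taken from e._.kb_ents[0][0]
```

A CUI can map to several MeSH codes, or to none, so the sketch returns a set per CUI rather than a single ID.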
