SI spaCy Spousal NER Model

From William's description in Slack:

The smithsonian_domain model is pretrained to recognize and extract SPOUSAL entities only. It's not what I would call SOTA, but it's enough to use the ner.correct recipe in Prodigy to speed up the annotation of a gold training set.

I also included all the extra functions I've been writing over the past few days in that folder, in functions.py. The demo.py script shows you how to use it. The patterns are regex rules that I passed to spaCy's EntityRuler, which then produced a good training set from the known patterns in 250 or so paragraphs. A spaCy ML NER model was then trained on that training set. You can easily include PERSON tags as well if you get a list of personal names. I believe that when the spaCy EntityRuler finds two overlapping pattern matches, it defaults to the longer match's label, so that should keep SPOUSAL tags separate from PERSON tags.
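To make the rule-based labeling step concrete, here is a minimal sketch using only Python's standard `re` module rather than spaCy itself. The regex, the name list, and the function names are all hypothetical and are not taken from functions.py; the tie-break by rule order is likewise an assumption of this sketch. It only illustrates the idea of turning pattern matches into character-offset entity spans, with the longer match winning when SPOUSAL and PERSON candidates overlap.

```python
import re

# Hypothetical rule: a capitalized name directly following "married ".
# The real patterns live in functions.py and are not reproduced here.
SPOUSAL_RE = re.compile(r"(?<=married )[A-Z][a-z]+(?: [A-Z][a-z]+)*")

# Hypothetical list of personal names used for PERSON tags.
KNOWN_NAMES = ["Charles Walcott", "Mary"]
PERSON_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(name) for name in KNOWN_NAMES) + r")\b"
)

# Rule order doubles as the tie-break priority (an assumption of this sketch).
RULES = [("SPOUSAL", SPOUSAL_RE), ("PERSON", PERSON_RE)]


def label_spans(text):
    """Return non-overlapping (start, end, label) spans, longest match first."""
    candidates = []
    for priority, (label, pattern) in enumerate(RULES):
        for match in pattern.finditer(text):
            candidates.append((match.start(), match.end(), label, priority))

    # Prefer longer spans; on equal length, prefer the earlier rule.
    candidates.sort(key=lambda c: (-(c[1] - c[0]), c[3], c[0]))

    kept = []
    for start, end, label, _ in candidates:
        # Keep a candidate only if it does not overlap anything already kept.
        if all(end <= s or start >= e for s, e, _ in kept):
            kept.append((start, end, label))
    return sorted(kept)


def to_training_example(text):
    """Pair a text with its spans in a character-offset entity format."""
    return (text, {"entities": label_spans(text)})
```

For an illustrative sentence such as `"In 1914 Charles Walcott married Mary Vaux."`, the PERSON candidate `Mary` overlaps the longer SPOUSAL candidate `Mary Vaux`, so only the SPOUSAL span is kept for that stretch of text, while `Charles Walcott` is kept as PERSON.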

Repository contents: data, texts, tmp, word_vectors, Demo.ipynb, README.md, demo.py, environment.yml, functions.py

sidatasciencelab/si_spacy_spousal

Folders and files

Latest commit

History

Repository files navigation

SI spaCy Spousal NER Model

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages