SI spaCy Spousal NER Model

From William's description in Slack:

The smithsonian_domain model is pretrained to recognize and extract SPOUSAL entities only. It's not what I would call SOTA, but it's good enough to use with the ner.correct recipe in Prodigy to speed up the annotation of a gold training set.

I also included all the extra functions I've written over the past few days in that folder, under functions.py; demo.py shows how to use them. The patterns are regex rules that I passed to spaCy's EntityRuler, which then produced a good training set from known patterns in 250 or so paragraphs. A spaCy ML NER model was then trained on that training set. You can easily add PERSON tags as well if you have a list of personal names. I believe that when two EntityRuler patterns match overlapping text, spaCy keeps the longer match and its label, so SPOUSAL tags should stay separate from PERSON tags.
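
The rule-based labeling and the "longer match wins" behavior described above can be sketched with the standard library alone. This is only an illustration, not the code in functions.py: the two regex patterns, the function names, and the sample sentence are all hypothetical, and the overlap rule is written by hand to mimic what the EntityRuler is believed to do (spaCy itself compares matches by token count, while this sketch compares character lengths).

```python
import re

# Hypothetical regex rules for illustration; the real patterns live in
# functions.py and are not reproduced in this README.
PATTERNS = [
    ("SPOUSAL", re.compile(r"\b(?:widower|widow|wife|husband) of (?:[A-Z][a-z]+ )*[A-Z][a-z]+")),
    ("PERSON", re.compile(r"\b(?:[A-Z][a-z]+ )+[A-Z][a-z]+")),
]


def find_candidates(text):
    """Return every (start, end, label) span matched by any pattern."""
    spans = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), label))
    return spans


def resolve_overlaps(spans):
    """Keep the longest span when spans overlap; ties go to the earliest start."""
    ordered = sorted(spans, key=lambda s: (-(s[1] - s[0]), s[0]))
    kept = []
    for start, end, label in ordered:
        # A span is kept only if it does not overlap any span already kept.
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append((start, end, label))
    return sorted(kept)


def annotate(text):
    """Build one weakly labeled example as (text, {"entities": [...]})."""
    return (text, {"entities": resolve_overlaps(find_candidates(text))})


if __name__ == "__main__":
    sample = "Mary Smith, wife of John Henry Smith, donated the collection."
    print(annotate(sample))
```

In the sample sentence, the PERSON pattern also matches "John Henry Smith", but that span sits inside the longer SPOUSAL match and is dropped, while "Mary Smith" does not overlap anything and keeps its PERSON label.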