Hmm, thanks @iacopy! Most of these look like tokenization errors that lead to misclassification. Some of them also look like reasonable entities to me. If you can consistently recognise an issue with the tokenization, you can add exceptions to the spaCy tokenizer, or re-tokenize after the fact to fix them.
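For anyone who wants to try the two options mentioned here, this is a rough sketch rather than a tested fix for this issue. It assumes a blank English spaCy pipeline instead of `en_core_sci_sm` (whose tokenizer may behave differently), and the strings `IL-2(R)` and `alpha ( beta ) gamma` are made-up examples, not entities from the report.

```python
import spacy
from spacy.symbols import ORTH

# Assumption: a blank English pipeline (rule-based tokenizer only, no model download).
nlp = spacy.blank("en")

# Option 1: add a tokenizer exception so a known problem string stays one token.
# "IL-2(R)" is a hypothetical example string.
nlp.tokenizer.add_special_case("IL-2(R)", [{ORTH: "IL-2(R)"}])
doc = nlp("Binding of IL-2(R) was measured.")
tokens = [t.text for t in doc]

# Option 2: re-tokenize after the fact by merging a span of existing tokens.
doc2 = nlp("alpha ( beta ) gamma")
with doc2.retokenize() as retokenizer:
    retokenizer.merge(doc2[1:4])  # merges the tokens "(", "beta", ")"
merged = [t.text for t in doc2]
```

Special cases only help when the offending strings are known in advance; merging after the fact is more flexible, but you need your own rule for choosing which spans to merge.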
Yeah, I remember I had some code in the tokenizer to deal with parentheses a bit better, but at some point spaCy changed from the regex package to the re package, and that code required variable-width lookbehinds, which re does not support, so it was commented out. Not sure that's the entirety of the problem, but given how many of these have unbalanced parens, I think it is part of it.
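To illustrate the limitation described here, a small sketch using only the standard library. The patterns are illustrative assumptions, not the ones that were commented out of the tokenizer.

```python
import re

# Fixed-width lookbehind (exactly one character): accepted by `re`.
fixed = re.compile(r"(?<=\()\w+")
match_fixed = fixed.search("(abc)").group()

# Variable-width lookbehind (`\w+` can match any length): `re` rejects it
# when the pattern is compiled.
try:
    re.compile(r"(?<=\(\w+)\)")
    variable_ok = True
except re.error as exc:
    variable_ok = False
    error_message = str(exc)
```

So any rule that needs to look back over an arbitrary-length run (for example, to check whether a closing paren has a matching opener earlier in the token) has to be rewritten with a fixed width or handled outside the regex.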
Hi, I'm reporting some problematic named entities I found using `en_core_sci_sm`, in the hope of improving the model. Most of them contain unbalanced brackets.