Newer versions of Spacy transformer model backends failing #246
Comments
Thank you for sharing this issue! If I'm not mistaken, this seems to be the result of an updated version of spaCy. I believe there should be an additional check here to see which version of spaCy is being used, so that newer versions use DocTransformerOutput. If you are interested, a PR would be great. If you do not have the time, I can start working on it.
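A minimal sketch of what such a check could look like. It inspects the object stored in `doc._.trf_data` instead of comparing spaCy version strings, which is a swapped-in approach and not necessarily what KeyBERT ends up doing. The function name `pick_transformer_output` is hypothetical. The attribute names are assumptions: `last_hidden_layer_state` is taken from the linked `DocTransformerOutput` documentation, and `tensors` is assumed to be what the older spacy-transformers output exposes. Stand-in objects are used so the snippet runs without spaCy installed.

```python
from types import SimpleNamespace


def pick_transformer_output(trf_data):
    """Return (kind, output) for whichever transformer output object is given.

    Hypothetical helper: the attribute names checked here are assumptions
    about the spaCy transformer outputs, not confirmed KeyBERT code.
    """
    # Assumed newer curated-transformers style (DocTransformerOutput).
    if hasattr(trf_data, "last_hidden_layer_state"):
        return "curated", trf_data.last_hidden_layer_state
    # Assumed older spacy-transformers style, where the last tensor was used.
    if hasattr(trf_data, "tensors"):
        return "legacy", trf_data.tensors[-1]
    raise TypeError(f"Unsupported transformer output: {type(trf_data).__name__}")


# Stand-ins for doc._.trf_data, for illustration only.
new_style = SimpleNamespace(last_hidden_layer_state="ragged-token-vectors")
old_style = SimpleNamespace(tensors=["token-vectors", "pooled-output"])

print(pick_transformer_output(new_style))
print(pick_transformer_output(old_style))
```

Checking for the attribute avoids hard-coding a version cutoff, but an explicit version comparison would work as well if the exact breaking release is known.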
I'll give it a look. https://spacy.io/api/curatedtransformer#doctransformeroutput-lasthiddenlayerstate
Hello again. Some notes: _spacy.py
my test file
Outputs from test file:
Checking the documentation, it seems that you can access the embedding layer as follows: https://spacy.io/api/curatedtransformer#doctransformeroutput-embeddinglayer. We could then perhaps average over all tokens to create an embedding for the entire document. Having said that, it would be preferable to find the [CLS] token and use that instead, but I cannot seem to find it in the documentation.
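To make the averaging idea concrete, here is a small sketch of mean pooling over token vectors. It is written in plain Python so it runs on its own; `mean_pool` is a hypothetical helper, not part of spaCy or KeyBERT. With a real pipeline the rows would presumably come from the ragged array in `doc._.trf_data` (for example its underlying 2D data), but that wiring is an assumption and is not shown here.

```python
from typing import List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average per-token vectors into a single document vector.

    Hypothetical helper for illustration: each row is one token's vector.
    """
    if not token_vectors:
        raise ValueError("Need at least one token vector to pool.")
    dim = len(token_vectors[0])
    totals = [0.0] * dim
    for vec in token_vectors:
        if len(vec) != dim:
            raise ValueError("All token vectors must have the same length.")
        # Accumulate each dimension across tokens.
        for i, value in enumerate(vec):
            totals[i] += value
    count = len(token_vectors)
    return [total / count for total in totals]


# Two toy 2-dimensional token vectors.
doc_vector = mean_pool([[1.0, 2.0], [3.0, 4.0]])
print(doc_vector)  # [2.0, 3.0]
```

Mean pooling does not depend on locating a [CLS] position, which is why it is a reasonable fallback while that part of the API is unclear.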
I use spaCy's transformer model for other purposes (such as NER), so re-using the same model made sense.
Looks like spaCy made some tweaks to their syntax which are breaking KeyBERT's spaCy backend.
Sample code:
Expected behavior:
prints [("test", ...)]
Observed behavior:
Package versions:
cupy-cuda11x 12.3.0
curated-tokenizers 0.0.9
curated-transformers 0.1.1
en-core-web-trf 3.7.3
keybert 0.8.5
keyphrase-vectorizers 0.0.13
safetensors 0.4.4
scikit-learn 1.5.1
scipy 1.13.1
sentence-transformers 3.0.1
spacy 3.7.5
spacy-alignments 0.9.1
spacy-curated-transformers 0.2.2
spacy-legacy 3.0.12
spacy-loggers 1.0.5
spacy-transformers 1.3.5
thinc 8.2.5
tokenizers 0.15.2
transformers 4.36.2