refactor phonemizer to operate at Doc level #16
another possibility is leaving out non-phonetic tokens entirely, e.g.:

```python
doc.text            # "北冥有魚,其名為鯤。"
doc._.phonemes      # "pok meang hjuwX ngjo tshen mjieng sjew kwon"
doc[4].text         # ","
doc[4]._.phonemes   # None
```
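A minimal plain-Python sketch of that behaviour, assuming one token per character and a doc-level store with one phoneme entry per token, where `None` marks non-phonetic tokens such as punctuation. The names `TOKENS`, `TOKEN_PHONEMES` and `doc_phonemes` are hypothetical and are not part of spaCy or the phonemizer.

```python
from typing import List, Optional

# Assumed toy data: one token per character, one phoneme entry per token.
# None marks non-phonetic tokens (punctuation).
TOKENS: List[str] = ["北", "冥", "有", "魚", ",", "其", "名", "為", "鯤", "。"]
TOKEN_PHONEMES: List[Optional[str]] = [
    "pok", "meang", "hjuwX", "ngjo", None,
    "tshen", "mjieng", "sjew", "kwon", None,
]


def doc_phonemes(token_phonemes: List[Optional[str]]) -> str:
    """Join per-token phonemes into a doc-level string, skipping None entries."""
    return " ".join(p for p in token_phonemes if p is not None)


if __name__ == "__main__":
    print(doc_phonemes(TOKEN_PHONEMES))
    print(TOKENS[4], TOKEN_PHONEMES[4])
```

The doc-level string simply drops the `None` entries, while the token at index 4 still exists and reports `None`.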
might need to subclass … phonologizer training tokens data (…)
spaCy's general design philosophy is that the `Doc` owns the data and `Span`s and `Token`s are just views of this data. It makes sense to replicate this, especially to handle cases where the phoneme data doesn't cleanly align to `Token`s (for which we could maybe even employ `Alignment`).
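A rough sketch of the "Doc owns the data" design in plain Python, without spaCy. `PhonemeDoc`, `PhonemeToken` and `PhonemeSpan` are hypothetical stand-ins for `Doc`, `Token` and `Span`: phoneme units are stored once on the doc together with the half-open token range they cover, and the views hold only an index and compute their value on demand. The token-range list is a simplified substitute for spaCy's `Alignment`, not its actual API. The example assumes a tokenizer that splits "don't" into "do" and "n't" and uses rough IPA transcriptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PhonemeDoc:
    """Hypothetical stand-in for a spaCy Doc: it owns all the data."""

    words: List[str]
    # Each phoneme unit is stored once with the half-open token range
    # [start, end) it covers, so units need not align 1:1 with tokens.
    units: List[Tuple[str, int, int]]

    def __len__(self) -> int:
        return len(self.words)

    def __getitem__(self, i: int) -> "PhonemeToken":
        return PhonemeToken(self, i)

    def span(self, start: int, end: int) -> "PhonemeSpan":
        return PhonemeSpan(self, start, end)

    @property
    def phonemes(self) -> str:
        # Doc-level string: every stored unit, in order.
        return " ".join(unit for unit, _, _ in self.units)

    def overlapping(self, start: int, end: int) -> Optional[str]:
        # Units whose token range intersects [start, end); None if there are none.
        hits = [unit for unit, s, e in self.units if s < end and e > start]
        return " ".join(hits) if hits else None


@dataclass
class PhonemeToken:
    """A view: holds only a reference to the doc and a token index."""

    doc: PhonemeDoc
    i: int

    @property
    def text(self) -> str:
        return self.doc.words[self.i]

    @property
    def phonemes(self) -> Optional[str]:
        return self.doc.overlapping(self.i, self.i + 1)


@dataclass
class PhonemeSpan:
    """A view over tokens [start, end) of the doc."""

    doc: PhonemeDoc
    start: int
    end: int

    @property
    def phonemes(self) -> Optional[str]:
        return self.doc.overlapping(self.start, self.end)


if __name__ == "__main__":
    doc = PhonemeDoc(
        words=["I", "do", "n't", "know"],
        units=[("aɪ", 0, 1), ("doʊnt", 1, 3), ("noʊ", 3, 4)],
    )
    print(doc.phonemes)
    print(doc[2].text, doc[2].phonemes)
    print(doc.span(1, 3).phonemes)
```

Because `doc[1]` and `doc[2]` both overlap the same stored unit, a phoneme that spans several tokens is never duplicated; each view just reports whatever overlaps it, and a token with no overlapping unit reports `None`.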