This repository has been archived by the owner on Apr 23, 2024. It is now read-only.
I want to use YouTokenToMe for fast id encoding, but I need to do it with the embeddings from https://nlp.h-its.org/bpemb/
Those embeddings come with a pre-defined vocab, and right now I don't see an out-of-the-box way to make a YouTokenToMe model work with a pre-defined vocab.
Are there any plans to implement something like a build_from_vocab classmethod? If not, could I get some starting points on how to do it myself? Right now the model file looks a bit obscure to me, so I can't easily get started on building my own model file from the vocab I have.
Right now, you can't use an external vocab to define your BPE model.
We plan to support converting other subword formats into the yttm format in the future, but it looks somewhat hard to implement.
Thank you for the quick answer!
Could I at least get some pointers on where to look, so I can try to build an ad hoc solution myself? For example, what does each line in the .model file from your tutorial mean? I could try to read the source code, but I'm not proficient in C++, and honestly any code-less (or at least pseudo-code) explanation of how your model gets loaded from a file and how it works would help.
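For anyone who needs a stopgap while this is unsupported, one option outside YouTokenToMe is to load the pre-defined vocab into a plain token-to-id table and segment against it directly. The following is only a rough sketch under stated assumptions: the vocab is assumed to have one subword per line, optionally followed by a tab and a score, with the line index equal to the embedding row; `load_vocab`, `encode_greedy`, and the sample vocab are hypothetical names and data made up for illustration. It uses greedy longest-match segmentation, which is not the BPE merge procedure, so its output can differ from what a real BPE encoder would produce for the same vocab.

```python
from typing import Dict, Iterable, List

# Word-boundary marker used by SentencePiece-style vocabularies (assumed here).
WORD_START = "\u2581"


def load_vocab(lines: Iterable[str]) -> Dict[str, int]:
    """Map each subword to its line index (assumed to be its embedding row)."""
    vocab: Dict[str, int] = {}
    for idx, line in enumerate(lines):
        # Keep only the subword; drop an optional tab-separated score.
        token = line.rstrip("\n").split("\t")[0]
        vocab.setdefault(token, idx)  # keep the first occurrence of a token
    return vocab


def encode_greedy(text: str, vocab: Dict[str, int], unk_id: int = 0) -> List[int]:
    """Greedy longest-match segmentation. NOT the BPE merge algorithm."""
    ids: List[int] = []
    for word in text.split():
        piece = WORD_START + word
        start = 0
        while start < len(piece):
            # Try the longest remaining candidate first, then shrink it.
            for end in range(len(piece), start, -1):
                token_id = vocab.get(piece[start:end])
                if token_id is not None:
                    ids.append(token_id)
                    start = end
                    break
            else:
                # No subword matches at this position: emit unk, skip one char.
                ids.append(unk_id)
                start += 1
    return ids


# Hypothetical toy vocab, only to show the expected file layout.
sample_vocab = ["<unk>\t0", "\u2581hel\t-1", "lo\t-2", "\u2581world\t-3", "\u2581\t-4", "h\t-5"]
vocab = load_vocab(sample_vocab)
print(encode_greedy("hello world", vocab))  # [1, 2, 3]
```

With a real vocab file, `load_vocab(open(path, encoding="utf-8"))` would replace the toy list. This is much slower than YouTokenToMe and only approximates the original segmentation, so it is worth comparing the resulting ids against the tokenizer that ships with the embeddings before relying on it.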