This repository has been archived by the owner on Apr 23, 2024. It is now read-only.

Using YouTokenToMe with pre-defined vocab and embeddings #84

Open
alexbalandi opened this issue Feb 16, 2021 · 2 comments

Comments

@alexbalandi

I want to use YouTokenToMe for fast id encoding, but I need to do it with the embeddings from https://nlp.h-its.org/bpemb/ , which come with a pre-defined vocab. Right now I don't see an out-of-the-box way to pair a YouTokenToMe model with a pre-defined vocab.
Are there any plans to implement something like a build_from_vocab classmethod? If not, could I get some starting points on how to do it myself? Right now the model file looks a bit obscure to me, so I can't easily get started on building my own model file from the vocab I have.

@kefirski
Contributor

Hi @alexbalandi!

Right now, you can't use an external vocab to define your BPE model.
We plan to support converting different subword formats into the yttm format in the future, but it seems to be somewhat hard to implement.

@alexbalandi
Author

> Hi @alexbalandi!
>
> Right now, you can't use an external vocab to define your BPE model.
> We plan to support converting different subword formats into the yttm format in the future, but it seems to be somewhat hard to implement.

Thank you for the quick answer!
Can I at least get some pointers on where to look so I could try to make an ad hoc solution myself? For example, what does each line in the .model file from your tutorial mean? I could try to read the source code, but I'm not proficient in C++, and honestly, any write-up with a code-free (or at least pseudo-code) explanation of how your model gets loaded from a file and works would help.
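For context, an ad hoc solution along these lines usually amounts to greedy BPE encoding against a pre-defined merge list and vocab. The sketch below is a toy illustration of that idea only; the merge rules, vocab, and `<unk>` handling are made-up assumptions, and this is not yttm's actual .model file format.

```python
# Toy sketch: greedy BPE encoding with a pre-defined merge list and vocab.
# The merges/vocab below are illustrative assumptions, not yttm's format.

def bpe_encode(word, merges, vocab):
    """Split `word` into characters, then repeatedly apply the
    highest-priority merge (lowest index in `merges`) until none applies,
    and finally map the resulting subwords to ids."""
    tokens = list(word)
    while len(tokens) > 1:
        # Rank every mergeable adjacent pair by its position in `merges`.
        candidates = [(merges.index(pair), i)
                      for i, pair in enumerate(zip(tokens, tokens[1:]))
                      if pair in merges]
        if not candidates:
            break
        _, i = min(candidates)                      # best-ranked pair wins
        tokens[i:i + 2] = [tokens[i] + tokens[i + 1]]
    # Fall back to an assumed <unk> id for out-of-vocab subwords.
    return [vocab.get(t, vocab["<unk>"]) for t in tokens]

merges = [("l", "o"), ("lo", "w")]   # learned merge rules, in priority order
vocab = {"<unk>": 0, "l": 1, "o": 2, "w": 3, "lo": 4, "low": 5}

print(bpe_encode("low", merges, vocab))   # -> [5]
```

A real converter would still need to serialize such merges and vocab into whatever layout yttm's loader expects, which is the part the maintainers would have to document.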
