This repository has been archived by the owner on Apr 23, 2024. It is now read-only.
I want to train a GPT2 model with a new vocabulary. I am following the instructions given here: https://github.com/mgrankin/ru_transformers. The YTTM tokenizer outputs a yt.model file that contains the new vocab. However, the run_generation.py script requires vocab.json and merges.txt files. I can view the vocab with the command below:
yttm vocab --model yt.model
But I don't know how to convert it into the vocab.json and merges.txt format. Isn't this a common problem?
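
For reference, here is a rough sketch of the vocab.json half of the conversion. It assumes that `yttm vocab --model yt.model` prints one `id<TAB>subword` pair per line and that the output has been redirected to a text file; both the assumed line format and the helper names (`yttm_vocab_to_dict`, `write_vocab_json`) are illustrative and not part of YTTM or transformers. It does not produce merges.txt, and it does not address the difference between YTTM's `▁` word-boundary marker and GPT-2's byte-level encoding, so the result may not load as a drop-in GPT-2 vocabulary.

```python
import json


def yttm_vocab_to_dict(lines):
    """Build a {subword: id} mapping from lines of the assumed form 'id<TAB>subword'."""
    vocab = {}
    for line in lines:
        line = line.rstrip("\n")
        token_id, sep, token = line.partition("\t")
        if not sep:
            # Skip blank lines or anything that does not match the assumed format.
            continue
        vocab[token] = int(token_id)
    return vocab


def write_vocab_json(lines, path):
    """Write the mapping as JSON, keeping non-ASCII subwords readable."""
    vocab = yttm_vocab_to_dict(lines)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(vocab, f, ensure_ascii=False)
    return vocab


# Hypothetical usage, after running: yttm vocab --model yt.model > vocab.txt
# with open("vocab.txt", encoding="utf-8") as f:
#     write_vocab_json(f, "vocab.json")
```

The mapping direction (subword to id) follows the layout of GPT-2's vocab.json; whether the ids line up with what run_generation.py expects would still need to be checked against the model.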