System Info

transformers version: 4.45.0

Who can help?

@ArthurZucker and @itazap

Information

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
When I load and then save the tokenizer with OLMo models, the saved tokenizer.json differs from the original, particularly in the merges key.
The code to reproduce this is:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/OLMo-1B-0724-hf")
tokenizer.save_pretrained("saved_tokenizer")
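To see the difference concretely, one can diff the merges entries of the two files. This is a minimal sketch, assuming the standard BPE layout where the merge rules live under model.merges; huggingface_hub is used only to locate the cached original file:

import json
from huggingface_hub import hf_hub_download

# Locate the original tokenizer.json in the Hub cache
orig_path = hf_hub_download("allenai/OLMo-1B-0724-hf", "tokenizer.json")

with open(orig_path) as f:
    original = json.load(f)
with open("saved_tokenizer/tokenizer.json") as f:
    saved = json.load(f)

# In a BPE tokenizer.json the merge rules live under model.merges
orig_merges = original["model"]["merges"]
saved_merges = saved["model"]["merges"]

# Show the first entry of each; the report is that these render differently
print(repr(orig_merges[0]))
print(repr(saved_merges[0]))
print("same number of merges:", len(orig_merges) == len(saved_merges))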
Expected behavior

The original tokenizer.json and the saved tokenizer.json should be the same.
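Even when the files differ textually, the round trip may still preserve behavior. Here is a minimal sketch that checks this, assuming the reproduction above has been run (the sample sentence is arbitrary):

from transformers import AutoTokenizer

original = AutoTokenizer.from_pretrained("allenai/OLMo-1B-0724-hf")
reloaded = AutoTokenizer.from_pretrained("saved_tokenizer")

# If the round trip is lossless, both tokenizers produce identical ids
sample = "Byte-pair merges should survive a save/load round trip."
assert original(sample)["input_ids"] == reloaded(sample)["input_ids"]
print("round trip preserves tokenization")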
Hey @zzf1130, I believe this change was made so that tokenizer.json is more flexible and future-proof, and it is therefore deliberate.
I will let @ArthurZucker comment on it, thank you!
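If the change is indeed only in how each merge rule is serialized, the two files should agree after normalization. This is a minimal sketch, assuming each entry is either a space-separated string (older format) or a two-element list (newer format); normalize_merges is a hypothetical helper, not part of the tokenizers API:

import json
from huggingface_hub import hf_hub_download

def normalize_merges(merges):
    # Hypothetical helper: map both serializations of a merge rule
    # ("a b" strings or ["a", "b"] pairs) to (a, b) tuples.
    normalized = []
    for merge in merges:
        if isinstance(merge, str):
            # Older string format: neither token may contain a space,
            # so splitting on the single space is safe.
            left, right = merge.split(" ")
            normalized.append((left, right))
        else:
            # Newer list format: already an explicit pair.
            normalized.append(tuple(merge))
    return normalized

orig_path = hf_hub_download("allenai/OLMo-1B-0724-hf", "tokenizer.json")
with open(orig_path) as f:
    orig_merges = json.load(f)["model"]["merges"]
with open("saved_tokenizer/tokenizer.json") as f:
    saved_merges = json.load(f)["model"]["merges"]

print("equivalent:", normalize_merges(orig_merges) == normalize_merges(saved_merges))

If the normalized lists match, the difference is purely in serialization and both files describe the same BPE model.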