Hugging Face Tokenizer
afiaka87 edited this page Apr 15, 2021 · 1 revision
This repository supports Hugging Face Tokenizers as an alternative to the default simple tokenizer. To use it, pass an extra --bpe_path argument when invoking train_dalle.py and generate.py, pointing to your BPE JSON file.
The only requirement is that you use 0 as the padding token ID during tokenization.
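One way to satisfy the padding requirement is to reserve ID 0 for a padding token when training the tokenizer. The following is a minimal sketch using the Hugging Face `tokenizers` library, not the repository's own procedure. The token names `[PAD]` and `[UNK]`, the sample captions, the vocabulary size, and the output filename `bpe.json` are all assumptions chosen for illustration.

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Hypothetical captions standing in for the text of a real dataset.
captions = [
    "a red bird sitting on a branch",
    "a blue car parked on the street",
    "two dogs playing in the snow",
]

tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

# Special tokens receive IDs in list order, so "[PAD]" is assigned ID 0.
# vocab_size=200 is an arbitrary small value for this toy corpus.
trainer = BpeTrainer(
    vocab_size=200,
    special_tokens=["[PAD]", "[UNK]"],
    show_progress=False,
)
tokenizer.train_from_iterator(captions, trainer=trainer)

# Fail early if the padding token did not end up at ID 0.
pad_id = tokenizer.token_to_id("[PAD]")
if pad_id != 0:
    raise ValueError(f"expected padding ID 0, got {pad_id}")

# Write the JSON file that can then be passed via --bpe_path.
tokenizer.save("bpe.json")
```

Listing the padding token first in `special_tokens` keeps ID 0 from being assigned to a real subword, so encoded text never collides with padding.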
Example:
$ python train_dalle.py --image_text_folder ./path/to/data --bpe_path ./path/to/bpe.json
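Before starting a long training run, it may be worth checking that the BPE JSON file actually maps the padding token to ID 0. The sketch below uses only the standard library and assumes the file follows the usual `tokenizers` JSON layout, with special tokens under `added_tokens` and the vocabulary under `model.vocab`. The helper name `pad_token_at_zero` and the token name `[PAD]` are hypothetical; substitute whatever your tokenizer uses.

```python
import json
import os
import tempfile


def pad_token_at_zero(bpe_path, pad_token="[PAD]"):
    """Return True if `pad_token` maps to ID 0 in a tokenizer JSON file."""
    with open(bpe_path, encoding="utf-8") as f:
        data = json.load(f)
    # Assumed layout: special tokens are listed under "added_tokens".
    for entry in data.get("added_tokens") or []:
        if entry.get("id") == 0:
            return entry.get("content") == pad_token
    # Otherwise fall back to the main vocabulary (token -> ID mapping).
    vocab = (data.get("model") or {}).get("vocab") or {}
    return vocab.get(pad_token) == 0


# Demo on a tiny hand-written file that mimics the assumed layout.
sample = {
    "added_tokens": [{"id": 0, "content": "[PAD]", "special": True}],
    "model": {"type": "BPE", "vocab": {"[PAD]": 0, "a": 1}, "merges": []},
}
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "bpe.json")
    with open(sample_path, "w", encoding="utf-8") as f:
        json.dump(sample, f)
    ok = pad_token_at_zero(sample_path)

print(ok)
```

To check a real file, call `pad_token_at_zero("./path/to/bpe.json")` with the same path you pass to --bpe_path.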