
If split_special_tokens==True, fast_tokenizer is slower than slow_tokenizer #1700

gongel opened this issue Dec 12, 2024 · 1 comment
gongel commented Dec 12, 2024

from transformers import LlamaTokenizer, LlamaTokenizerFast
import time

tokenizer1 = LlamaTokenizer.from_pretrained("./Llama-2-7b-chat-hf", split_special_tokens=True)      # slow (Python) tokenizer
tokenizer2 = LlamaTokenizerFast.from_pretrained("./Llama-2-7b-chat-hf", split_special_tokens=True)  # fast (Rust) tokenizer
print(tokenizer1, tokenizer2)

s_time = time.time()
for i in range(1000):
    tokenizer1.tokenize("你好,where are you?" * 100)
print(f"slow: {time.time() - s_time}")

s_time = time.time()
for i in range(1000):
    tokenizer2.tokenize("你好,where are you?" * 100)
print(f"fast: {time.time() - s_time}")

Output:
slow: 0.6021890640258789
fast: 0.7353882789611816

Narsil (Collaborator) commented Jan 10, 2025

If I use * 1000 instead of * 100 this is what I get on my small machine:

slow: 7.805477857589722
fast: 7.280818223953247

In general we don't look too closely at micro-benchmarks (unless the gap is 10x); they don't usually tell a compelling story.
For instance, you could use batch tokenization, which should be much faster with the fast tokenizer here.
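
For reference, a minimal sketch of that batched variant (illustrative, not from the thread; the model path and repetition counts mirror the benchmark above, and passing a list of strings to the tokenizer is the standard batched-encoding call in transformers):

from transformers import LlamaTokenizerFast
import time

# Hypothetical batched benchmark: one call over a list of texts lets the
# Rust backend process the whole batch, instead of 1000 Python-level calls.
tokenizer = LlamaTokenizerFast.from_pretrained("./Llama-2-7b-chat-hf", split_special_tokens=True)
texts = ["你好,where are you?" * 100] * 1000

s_time = time.time()
tokenizer(texts)
print(f"fast (batched): {time.time() - s_time}")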
