Hello there.
I tried to use m2_bert_80M_2k to generate embeddings for text strings around 500 tokens long, following your example on [Huggingface](https://huggingface.co/togethercomputer/m2-bert-80M-2k-retrieval). However, the `outputs = model(**input_ids)` line took over 15s on average, much slower than expected. Could you please help me find the issue here?
I also tested your example with a 12-token input. The forward pass is still slow: over 5s for 12 tokens with padding="longest", and over 16s for 12 tokens with padding="max_length" (=2048).
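For context, my setup roughly follows the model card example (a sketch; the exact arguments may differ slightly from the HF page):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

max_seq_length = 2048
model = AutoModelForSequenceClassification.from_pretrained(
    "togethercomputer/m2-bert-80M-2k-retrieval",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "bert-base-uncased", model_max_length=max_seq_length
)
input_ids = tokenizer(
    ["some ~500-token document ..."],  # placeholder input
    return_tensors="pt",
    padding="max_length",
    return_token_type_ids=False,
    truncation=True,
    max_length=max_seq_length,
)
outputs = model(**input_ids)  # this line takes >15s on average
embeddings = outputs["sentence_embedding"]
```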
Thanks in advance!
Hm interesting! This is definitely slower than it should be :)
Can you give some more details on your environment? (GPU, version of PyTorch, etc)
If you're using FlashFFTConv, a known issue (which will hopefully go away with a refactor this summer) is that the very first call to FlashFFTConv is very slow. The way to get around this is to call the model once as a warmup, then keep it live for your actual workloads; see the sketch below.
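A minimal sketch of that warmup pattern, reusing `model` and `input_ids` from the snippet above (the synchronize guard is only needed on GPU):

```python
import time
import torch

def sync():
    # CUDA kernels launch asynchronously, so synchronize before reading timers.
    if torch.cuda.is_available():
        torch.cuda.synchronize()

# Warmup: the very first FlashFFTConv call pays a one-time setup cost,
# so run one forward pass before timing anything.
with torch.no_grad():
    _ = model(**input_ids)
sync()

start = time.time()
with torch.no_grad():
    outputs = model(**input_ids)
sync()
print(f"steady-state forward pass: {time.time() - start:.3f}s")
```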