
Embedding speed seems slow #30

Open
YourSaDady opened this issue May 11, 2024 · 2 comments


Hello there.
I tried to use m2_bert_80M_2k to generate embeddings for text strings of around 500 tokens, following your [example on Hugging Face](https://huggingface.co/togethercomputer/m2-bert-80M-2k-retrieval). However, the `outputs = model(**input_ids)` line took over 15 s on average, slower than expected. Could you please help me find the issue here?

I also tested your example with 12 tokens. The model forward pass is still slow: over 5 s with `padding="longest"`, and over 16 s with `padding="max_length"` (= 2048).
(screenshots: 12tok_longest, 12tok_max_length)
Thanks in advance!

DanFu09 (Collaborator) commented May 11, 2024

Hm interesting! This is definitely slower than it should be :)

Can you give some more details on your environment? (GPU, version of PyTorch, etc)
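A quick way to gather those details is a one-off report like the sketch below (these are generic commands, not a script from this repo; the `nvidia-smi` line is skipped if the tool isn't on the PATH):

```shell
# Print the PyTorch version and whether CUDA is visible to it.
python -c "import torch; print('PyTorch:', torch.__version__); print('CUDA available:', torch.cuda.is_available())"
# If nvidia-smi is installed, also print the GPU name and driver version.
command -v nvidia-smi >/dev/null && nvidia-smi --query-gpu=name,driver_version --format=csv,noheader || true
```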

If you're using FlashFFTConv, a known issue (which will hopefully go away with a refactor this summer) is that the very first call to FlashFFTConv is very slow. The workaround is to call the model once to warm it up, then keep it live for your actual workloads.
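The warmup-then-measure pattern can be sketched generically. The tiny stand-in model below is purely illustrative (a real workload would load the m2-bert model as in the Hugging Face example), and note that timing a CUDA model also requires `torch.cuda.synchronize()` so you measure the kernels rather than just the launch:

```python
import time
import torch

# Stand-in for the embedding model; replace with the m2-bert model loaded
# as in the Hugging Face example for real measurements.
model = torch.nn.Sequential(torch.nn.Linear(768, 768), torch.nn.GELU())
model.eval()

def timed_forward(model, x):
    """Run one forward pass and return (output, elapsed seconds).

    Synchronizes around the call when the input is on a GPU, since CUDA
    kernels launch asynchronously and would otherwise look instant.
    """
    if x.is_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    with torch.no_grad():
        out = model(x)
    if x.is_cuda:
        torch.cuda.synchronize()
    return out, time.perf_counter() - start

x = torch.randn(1, 768)
_, first = timed_forward(model, x)   # first call: includes one-time setup cost
_, steady = timed_forward(model, x)  # subsequent calls: steady-state latency
print(f"first call: {first:.4f}s, warmed up: {steady:.4f}s")
```

With FlashFFTConv in the loop, the gap between `first` and `steady` would be much larger, which is why keeping the warmed-up model live matters.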

YourSaDady (Author) commented May 12, 2024

Sure.
PyTorch version: 2.3.0+cu121
GPU device: NVIDIA RTX A6000
I am not using FlashFFTConv.
