
Embedding speed seems slow #30

Open
YourSaDady opened this issue May 11, 2024 · 2 comments


Hello there.
I tried to use m2_bert_80M_2k to generate embeddings for text strings of around 500 tokens, following your [example on Hugging Face](https://huggingface.co/togethercomputer/m2-bert-80M-2k-retrieval). However, the `outputs = model(**input_ids)` line took over 15 s on average, slower than expected. Could you please help me find the issue here?

I also tested your example with 12 tokens. The model forward pass is still slow: over 5 s with `padding="longest"`, and over 16 s with `padding="max_length"` (= 2048).
(screenshots: 12tok_longest, 12tok_max_length)
Thanks in advance!

DanFu09 (Collaborator) commented May 11, 2024

Hm interesting! This is definitely slower than it should be :)

Can you give some more details on your environment? (GPU, version of PyTorch, etc)
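A quick way to gather those details is a one-off report like the sketch below (these are generic commands, not a script from this repo; the `nvidia-smi` line is skipped if the tool isn't on the PATH):

```shell
# Print the PyTorch version and whether CUDA is visible to it.
python -c "import torch; print('PyTorch:', torch.__version__); print('CUDA available:', torch.cuda.is_available())"
# If nvidia-smi is installed, also print the GPU name and driver version.
command -v nvidia-smi >/dev/null && nvidia-smi --query-gpu=name,driver_version --format=csv,noheader || true
```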

If you're using FlashFFTConv, a known issue (which will hopefully go away with a refactor this summer) is that the very first call to FlashFFTConv is very slow. The workaround is to call the model once to warm it up, then keep it live for your actual workloads.
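The warmup-then-measure pattern can be sketched generically. The tiny stand-in model below is purely illustrative (a real workload would load the m2-bert model as in the Hugging Face example), and note that timing a CUDA model also requires `torch.cuda.synchronize()` so you measure the kernels rather than just the launch:

```python
import time
import torch

# Stand-in for the embedding model; replace with the m2-bert model loaded
# as in the Hugging Face example for real measurements.
model = torch.nn.Sequential(torch.nn.Linear(768, 768), torch.nn.GELU())
model.eval()

def timed_forward(model, x):
    """Run one forward pass and return (output, elapsed seconds).

    Synchronizes around the call when the input is on a GPU, since CUDA
    kernels launch asynchronously and would otherwise look instant.
    """
    if x.is_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    with torch.no_grad():
        out = model(x)
    if x.is_cuda:
        torch.cuda.synchronize()
    return out, time.perf_counter() - start

x = torch.randn(1, 768)
_, first = timed_forward(model, x)   # first call: includes one-time setup cost
_, steady = timed_forward(model, x)  # subsequent calls: steady-state latency
print(f"first call: {first:.4f}s, warmed up: {steady:.4f}s")
```

With FlashFFTConv in the loop, the gap between `first` and `steady` would be much larger, which is why keeping the warmed-up model live matters.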

YourSaDady (Author) commented May 12, 2024

Sure.
PyTorch version: 2.3.0+cu121
GPU device: NVIDIA RTX A6000
I am not using FlashFFTConv.
