Parallel inference with whisper #2424
Unanswered
SaidiSouhaieb asked this question in Q&A
Replies: 0 comments
Hello, I am trying to deploy Whisper on a production server behind a WebSocket. My problem is that each payload is transcribed sequentially; I want Whisper to utilize more of my GPU (NVIDIA RTX 4000 Ada Generation) and run inference on multiple requests in parallel. I tried multithreading (which worked on CPU) and switched between multiple libraries, but nothing worked.
Any help would be appreciated.
P.S.: if this requires multiple GPUs, can someone guide me through that path?
Thank you
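
For context, the multithreading approach mentioned above can be sketched roughly as below. Note this is a minimal illustration, not a fix: `transcribe` here is a hypothetical stand-in for the actual Whisper call (e.g. `model.transcribe(audio)`), and on a single GPU, threads alone typically do not yield parallel inference because PyTorch issues kernels to one default CUDA stream per device, so GPU work still serializes.

```python
from concurrent.futures import ThreadPoolExecutor

def transcribe(payload):
    # Hypothetical stand-in for the real Whisper call, e.g.
    # model.transcribe(audio_array). Echoes the payload so the
    # sketch is runnable without a model or a GPU.
    return f"transcript:{payload}"

def transcribe_parallel(payloads, max_workers=4):
    # Fan incoming payloads out to a thread pool; map() preserves
    # input order in its results.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(transcribe, payloads))

print(transcribe_parallel(["a", "b", "c"]))
```

This pattern parallelizes CPU-bound or I/O-bound work, which matches the observation that it "worked on CPU"; true parallel GPU inference generally needs batching, multiple worker processes (one model instance each), or a serving layer that batches requests.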