Parallel inference with whisper #2424
Unanswered
SaidiSouhaieb asked this question in Q&A
Replies: 0 comments
Hello, I am trying to deploy Whisper on a production server behind a WebSocket. My problem is that each payload is transcribed sequentially; I want Whisper to utilize more of my GPU (NVIDIA RTX 4000 Ada Generation) and run inference on multiple requests in parallel. I tried multithreading (which worked on CPU) and switched between multiple libraries, but nothing worked.
Any help would be appreciated.
P.S.: if this requires multiple GPUs, can someone guide me through that path?
Thank you
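
For context, the multithreading approach mentioned above can be sketched roughly as below. Note this is a minimal illustration, not a fix: `transcribe` here is a hypothetical stand-in for the actual Whisper call (e.g. `model.transcribe(audio)`), and on a single GPU, threads alone typically do not yield parallel inference because PyTorch issues kernels to one default CUDA stream per device, so GPU work still serializes.

```python
from concurrent.futures import ThreadPoolExecutor

def transcribe(payload):
    # Hypothetical stand-in for the real Whisper call, e.g.
    # model.transcribe(audio_array). Echoes the payload so the
    # sketch is runnable without a model or a GPU.
    return f"transcript:{payload}"

def transcribe_parallel(payloads, max_workers=4):
    # Fan incoming payloads out to a thread pool; map() preserves
    # input order in its results.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(transcribe, payloads))

print(transcribe_parallel(["a", "b", "c"]))
```

This pattern parallelizes CPU-bound or I/O-bound work, which matches the observation that it "worked on CPU"; true parallel GPU inference generally needs batching, multiple worker processes (one model instance each), or a serving layer that batches requests.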