Speech To Text processing timeout #1032
-
Seeking Support for Optimizing Long Audio Transcriptions with Whisper Large Model
Hello everyone, We are working on a project to transcribe our internally produced video content, which currently amounts to about 40 hours per month. Our ultimate goal is to scale this system for external users, which could substantially increase our transcription volume. At present, we’re using the Whisper Large model, primarily because of its support for Arabic transcription and speaker diarization; both are critical features for our workflow. However, we’re running into some significant challenges due to processing constraints:
What We’re Looking For
We’re seeking guidance or suggestions on the following:
Any insights, workarounds, or suggestions would be greatly appreciated. Thank you!
Replies: 4 comments 2 replies
-
Thanks for asking your question. Please be sure to reply with as much detail as possible so the community can assist you efficiently.
-
Hey there! It looks like you haven't connected your GitHub account to your Deepgram account. You can do this at https://community.deepgram.com - being verified through this process will allow our team to help you in a much more streamlined fashion.
-
It looks like we're missing some important information to help debug your issue. Would you mind providing us with the following details in a reply?
-
Hi @Hadymohammed, while we offer a managed Whisper service for broader language coverage, unfortunately the Whisper models are significantly less efficient than Deepgram's own. However, our models don't support Arabic, so you'll need to use Whisper.
I would recommend submitting requests at off-peak hours, when we have lower traffic load and can better serve Whisper requests. That would be US nights and weekends.
How long is the audio you are sending? I would recommend sending shorter audio to Whisper (under 60 minutes, or under 30 minutes if possible). I wouldn't recommend sending audio much over an hour in length, as it is more likely to time out.
Ultimately, there is a limit to how far you can scale using Deepgram's managed Whisper. Processing times are slow, and we have a rate limit of 5 concurrent Whisper requests.
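The two constraints in this reply (shorter audio per request, at most 5 Whisper requests in flight) can be sketched as below. This is only an illustration under stated assumptions: `plan_chunks`, `transcribe_chunk`, and `transcribe_all` are hypothetical names, the 25-minute chunk size is just one choice that stays under the suggested 30 minutes, and `transcribe_chunk` is a placeholder rather than a real Deepgram call. Cutting the audio itself (for example with ffmpeg) is left out.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CHUNK_SECONDS = 25 * 60  # assumption: a margin below the suggested 30 minutes
MAX_CONCURRENT = 5           # concurrent Whisper request limit quoted in the reply


def plan_chunks(total_seconds, max_chunk_seconds=MAX_CHUNK_SECONDS):
    """Return (start, end) offsets in seconds covering the whole recording."""
    chunks = []
    start = 0
    while start < total_seconds:
        end = min(start + max_chunk_seconds, total_seconds)
        chunks.append((start, end))
        start = end
    return chunks


def transcribe_chunk(chunk):
    """Hypothetical placeholder: extract this span and send it for transcription."""
    start, end = chunk
    return f"[transcript for {start}-{end}s]"


def transcribe_all(chunks, max_concurrent=MAX_CONCURRENT):
    """Process chunks with at most max_concurrent requests running at once."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # pool.map returns results in the same order as the input chunks.
        return list(pool.map(transcribe_chunk, chunks))


if __name__ == "__main__":
    chunks = plan_chunks(3 * 3600)  # a 3-hour recording
    for text in transcribe_all(chunks):
        print(text)
```

One caveat with this approach: fixed offsets can cut in the middle of a word, and speaker labels from separate requests may not line up with each other, so diarization results would likely need to be reconciled across chunks.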
Hi @Hadymohammed, unfortunately Deepgram doesn't currently have a roadmap for supporting right-to-left languages, including Arabic.
Is the 40 hours per month transcribed in a single time-bound batch, or can you develop a strategy for running more, smaller batches? Given 40 hours of audio in files of 2-4 hours each, that should amount to 10-20 total API requests. With Deepgram's rate limit of 5 concurrent Whisper requests, that could be done in about 2-4 batches of 5 requests. For instance, if the 40 hours were split into weekly batches (roughly 10 hours, or 3-5 files, per week), each week's requests would likely fit in a single batch.
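The batch arithmetic above can be checked with a few lines of Python. This is a rough sketch: `batches_needed` is a hypothetical helper, and it assumes every file has the same duration and that each batch fills all 5 concurrent slots.

```python
import math


def batches_needed(total_hours, file_hours, max_concurrent=5):
    """Return (number of requests, number of batches of max_concurrent requests)."""
    requests = math.ceil(total_hours / file_hours)
    batches = math.ceil(requests / max_concurrent)
    return requests, batches


for file_hours in (4, 2):
    requests, batches = batches_needed(40, file_hours)
    print(f"{file_hours}-hour files: {requests} requests, {batches} batches")
# 4-hour files: 10 requests, 2 batches
# 2-hour files: 20 requests, 4 batches
```

The same helper shows the cost of sending shorter files: at 1 hour per file, 40 hours becomes 40 requests, or 8 batches of 5.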
I'll add that your initial estimate of 1 second of audio processed in 15 seconds sounds ver…