Ideal video length that can be transcribed by Whisper? #136
-
I've read that Whisper processes videos in 30-second chunks. My question is: how long can a video be and still be transcribed by Whisper? 1 hour? 8 hours? Should I use the video or just the audio for faster transcription? I tried https://huggingface.co/spaces/sensahin/YouWhisper on a 15-minute video and it accurately transcribed it into ~4117 words. Is there an upper limit? Does the program…
-
The model natively supports 30-second inputs. For audio longer than that, we use a set of (hacky) heuristics to perform transcription on sliding windows. The details are described in Section 4.5 of the paper and implemented in transcribe.py. The script relies on a number of manually tuned hyperparameters, which work well on easy inputs but may need further adjustment for audio that is more difficult to transcribe, or for languages in which Whisper doesn't perform very well.
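The sliding-window idea above can be illustrated with a simplified loop. This is a minimal sketch under stated assumptions, not the code in transcribe.py: `decode` is a hypothetical stand-in for running the model on one 30-second window, returning the decoded text and how many seconds of that window it considers finished. The 16 kHz sample rate matches what Whisper resamples audio to. The next window starts where the last finished segment ended, so speech cut off at a window edge is decoded again in the following window.

```python
from typing import Callable, List, Tuple

SAMPLE_RATE = 16_000  # Whisper works on 16 kHz mono audio
WINDOW_SECONDS = 30   # the model's native input length
WINDOW_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS

# Hypothetical decoder interface (an assumption for this sketch):
# takes one 30 s window of samples, returns (text, seconds_consumed),
# where seconds_consumed is where the last *complete* segment ended.
Decoder = Callable[[List[float]], Tuple[str, float]]


def sliding_window_transcribe(samples: List[float], decode: Decoder) -> List[str]:
    """Transcribe arbitrarily long audio by repeatedly decoding 30 s windows.

    Simplified illustration only; the real transcribe.py uses additional
    heuristics (timestamp tokens, temperature fallback, etc.).
    """
    texts: List[str] = []
    seek = 0  # current offset into `samples`, in samples
    while seek < len(samples):
        window = samples[seek:seek + WINDOW_SAMPLES]
        # Pad the final, shorter window with silence up to 30 s.
        window = window + [0.0] * (WINDOW_SAMPLES - len(window))
        text, consumed = decode(window)
        texts.append(text)
        step = int(consumed * SAMPLE_RATE)
        # Always make progress: if the decoder reports no usable boundary
        # (or one past the window), advance by a full window instead.
        if step <= 0 or step > WINDOW_SAMPLES:
            step = WINDOW_SAMPLES
        seek += step
    return texts
```

In this sketch there is no fixed upper limit on duration; cost simply grows linearly with length. Because each step advances at most 30 seconds, a 1-hour recording needs at least 120 `decode` calls and an 8-hour one at least 960, and more whenever windows end mid-segment.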