ideal video length that can be transcribed by whisper? #136

deadcoder0904 · 2022-09-26T10:19:29Z

deadcoder0904
Sep 26, 2022

i've read it processes videos in 30-second chunks.

my question is: how long a video can whisper transcribe? 1 hour, 8 hours? and is transcription faster from a video file or from an audio file?

i have tried https://huggingface.co/spaces/sensahin/YouWhisper on a 15-minute video & it accurately transcribed it into ~4117 words.

is there an upper limit? does the program youwhisper cut it into 30-second chunks? i haven't used python in like 5 years so i'm not totally sure about the code.

Answered by jongwook


jongwook · 2022-09-26T10:33:26Z

jongwook
Sep 26, 2022
Maintainer

The model natively supports 30-second inputs, and for audio inputs longer than that, we're using a set of (hacky) heuristics to perform transcription on sliding windows. The details are described in Section 4.5 of the paper and implemented in transcribe.py.

The script takes a number of manually tuned hyperparameters, which work well on easy inputs but may need more adjustment for audio that is more difficult to transcribe or that is in a language Whisper doesn't perform very well in.
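
To make the window arithmetic concrete, here is a minimal sketch of how long audio maps onto 30-second model inputs. It is not the logic in transcribe.py: as far as I understand, the real script moves each window based on the timestamps the model predicts, whereas this sketch assumes a fixed 30-second stride. The helper names `fixed_windows` and `window_count` are hypothetical; the 16 kHz sample rate is the rate Whisper resamples audio to.

```python
SAMPLE_RATE = 16_000   # Hz; Whisper resamples input audio to 16 kHz
CHUNK_SECONDS = 30     # the model's native input length
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS


def fixed_windows(total_samples, chunk_samples=CHUNK_SAMPLES):
    """Yield (start, end) sample offsets of consecutive, non-overlapping windows.

    Hypothetical helper using a fixed stride (an assumption made for
    illustration). The last window may be shorter than chunk_samples and
    would need padding before being fed to the model.
    """
    start = 0
    while start < total_samples:
        end = min(start + chunk_samples, total_samples)
        yield start, end
        start = end


def window_count(duration_seconds):
    """Number of fixed windows needed to cover audio of the given duration."""
    total_samples = int(duration_seconds * SAMPLE_RATE)
    return sum(1 for _ in fixed_windows(total_samples))


if __name__ == "__main__":
    for label, seconds in [("15 min", 15 * 60), ("1 hour", 3600), ("8 hours", 8 * 3600)]:
        print(label, window_count(seconds))
```

Under this fixed-stride assumption, a 15-minute file needs 30 windows, a 1-hour file 120, and an 8-hour file 960. The window size limits how much audio the model sees at once, not how long the input file can be.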

1 reply

FurkanGozukara Sep 26, 2022

I am waiting for my graphics card so I can test it on Turkish. Do you have any plans to add translation from English to other languages?
