ideal video length that can be transcribed by whisper? #136

Answered by jongwook
deadcoder0904 asked this question in Q&A

The model natively supports 30-second inputs, and for audio inputs longer than that, we're using a set of (hacky) heuristics to perform transcription on sliding windows. The details are described in Section 4.5 of the paper and implemented in transcribe.py.
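To illustrate the sliding-window idea, here is a minimal sketch, not the actual code in transcribe.py. It assumes the window advances by however much audio the decoder managed to cover with complete segments. The names `sliding_windows`, `covered_seconds`, and `fake_decoder` are hypothetical stand-ins for the real model call.

```python
from typing import Callable, List, Tuple

WINDOW_SECONDS = 30.0  # the model's native input length


def sliding_windows(
    duration: float,
    covered_seconds: Callable[[float, float], float],
) -> List[Tuple[float, float]]:
    """Return the (start, end) windows visited while covering `duration` seconds.

    `covered_seconds(start, end)` is a hypothetical stand-in for the decoder:
    it reports how many seconds of the window were transcribed up to the last
    complete segment.
    """
    windows = []
    seek = 0.0
    while seek < duration:
        end = min(seek + WINDOW_SECONDS, duration)
        windows.append((seek, end))
        consumed = covered_seconds(seek, end)
        # Guard against a stalled or out-of-range value: always move forward.
        if consumed <= 0 or consumed > end - seek:
            consumed = end - seek
        seek += consumed
    return windows


def fake_decoder(start: float, end: float) -> float:
    """Pretend a full window ends mid-sentence 2 s early; a short final window is fully used."""
    length = end - start
    return length if length < WINDOW_SECONDS else length - 2.0


if __name__ == "__main__":
    # 70 seconds of audio is covered by overlapping windows rather than fixed 30 s chunks.
    print(sliding_windows(70.0, fake_decoder))
```

Because each window starts where the previous one stopped producing complete segments, consecutive windows can overlap, so there is no hard upper limit on input length in this scheme.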

The script takes a number of manually tuned hyperparameters, which work well on easy inputs but may need further adjustment for audio that is harder to transcribe or is in a language that Whisper doesn't handle very well.
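As a rough sketch of what adjusting those hyperparameters can look like from Python: the keyword arguments below are ones accepted by `transcribe()` in the `openai-whisper` package, and the values are assumed to match the library defaults, so check them against your installed version. The helper names `merged_options` and `transcribe_file`, and the file name in the usage comment, are hypothetical.

```python
# Assumed defaults for the heuristics used during long-form transcription.
DEFAULT_OPTIONS = {
    "temperature": (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),  # fallback temperatures, tried in order
    "compression_ratio_threshold": 2.4,  # above this, a segment is treated as too repetitive
    "logprob_threshold": -1.0,           # below this average log-probability, decoding is retried
    "no_speech_threshold": 0.6,          # used when deciding whether a window is silence
    "condition_on_previous_text": True,  # feed the previous window's text as a prompt
}


def merged_options(**overrides):
    """Return the default options with any caller-supplied overrides applied."""
    return {**DEFAULT_OPTIONS, **overrides}


def transcribe_file(path, model_name="base", **overrides):
    """Transcribe an audio file of any length with optionally adjusted heuristics."""
    import whisper  # imported lazily so the options above can be inspected without the package

    model = whisper.load_model(model_name)
    return model.transcribe(path, **merged_options(**overrides))


# Usage (needs openai-whisper and ffmpeg installed):
# result = transcribe_file("lecture.mp3", condition_on_previous_text=False)
# print(result["text"])
```

Turning off `condition_on_previous_text` is one adjustment sometimes tried when the output gets stuck repeating itself on difficult audio; whether it helps depends on the input.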

Answer selected by jongwook