Why is the audio padded with `N_SAMPLES` instead of `HOP_LENGTH`? #2422
MahmoudAshraf97 started this conversation in Ideas
Hello, I noticed that if we pad the audio with `HOP_LENGTH` samples instead of `N_SAMPLES`, the resulting features should be identical. The extra padding does not contribute to the STFT of the frames that cover the actual audio, and the feature frames that correspond to the extra padding are cropped anyway here:
- `whisper/whisper/transcribe.py`, line 140 in 271445b
- `whisper/whisper/transcribe.py`, lines 281 to 282 in 271445b
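The claim can be checked with frame arithmetic alone. The sketch below is a minimal illustration, not Whisper code. It assumes Whisper's constants (`N_FFT = 400`, `HOP_LENGTH = 160`, `N_SAMPLES = 480000`), that `log_mel_spectrogram` calls `torch.stft` with `center=True` (reflect padding of `N_FFT // 2` on each side) and drops the last frame, and that the kept content frames number `num_samples // HOP_LENGTH`. The helper `frame_layout` is hypothetical.

```python
N_FFT = 400          # STFT window: 25 ms at 16 kHz
HOP_LENGTH = 160     # hop: 10 ms at 16 kHz
N_SAMPLES = 480_000  # 30 s of 16 kHz audio


def frame_layout(num_samples: int, padding: int) -> dict:
    """Hypothetical helper: which samples the kept STFT frames read."""
    padded_len = num_samples + padding
    # torch.stft(center=True) yields 1 + padded_len // HOP_LENGTH frames;
    # dropping the last one leaves padded_len // HOP_LENGTH.
    total_frames = padded_len // HOP_LENGTH
    # Frames that cover real audio (equals mel.shape[-1] - 3000 when
    # padding == N_SAMPLES, since N_SAMPLES is exactly 3000 hops).
    content_frames = num_samples // HOP_LENGTH
    # Frame t is centered at t * HOP_LENGTH and reads samples
    # [t * HOP_LENGTH - N_FFT // 2, t * HOP_LENGTH + N_FFT // 2).
    # The start of the signal is the same for any padding, so only the
    # end of the last content frame matters.
    last_window_end = (content_frames - 1) * HOP_LENGTH + N_FFT // 2
    return {
        "total_frames": total_frames,
        "content_frames": content_frames,
        # True if no content frame reaches past audio + zero padding into
        # the reflect padding added by center=True.
        "content_within_padded_signal": content_frames == 0
        or last_window_end <= padded_len,
    }


for padding in (0, HOP_LENGTH, N_SAMPLES):
    print(padding, frame_layout(80_000, padding))
# 0 {'total_frames': 500, 'content_frames': 500, 'content_within_padded_signal': False}
# 160 {'total_frames': 501, 'content_frames': 500, 'content_within_padded_signal': True}
# 480000 {'total_frames': 3500, 'content_frames': 500, 'content_within_padded_signal': True}
```

With `padding=0` the last content frame can read reflected audio instead of zeros, so some padding is still required; `HOP_LENGTH` is enough because a content frame extends at most `N_FFT // 2 - HOP_LENGTH = 40` samples past the audio. This arithmetic only covers which samples each kept frame reads. Whisper's `log_mel_spectrogram` also clamps against `log_spec.max()` taken over all frames, and with `N_SAMPLES` padding that maximum also sees one or two frames past the content whose windows overlap the end of the audio, so comparing both paddings on real audio would confirm the outputs match.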
I'd like to hear thoughts on whether this is a valid idea; if it is, I will open a PR. The change would make feature extraction faster and use fewer resources, especially for short audio.
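A rough sense of the savings, using the number of STFT frames as a proxy for feature-extraction work. This is a sketch under the same assumptions as above (16 kHz audio, `HOP_LENGTH = 160`, `N_SAMPLES = 480000`, last STFT frame dropped); `stft_frames` is a hypothetical helper, and actual speedups also depend on the STFT implementation and device.

```python
SAMPLE_RATE = 16_000
HOP_LENGTH = 160
N_SAMPLES = 480_000  # one 30 s chunk


def stft_frames(num_samples: int, padding: int) -> int:
    # Hypothetical helper: frames kept after torch.stft(center=True),
    # which yields 1 + n // HOP_LENGTH frames, minus the dropped last frame.
    return (num_samples + padding) // HOP_LENGTH


for seconds in (1, 10, 30, 600):
    n = seconds * SAMPLE_RATE
    full = stft_frames(n, N_SAMPLES)
    minimal = stft_frames(n, HOP_LENGTH)
    print(f"{seconds:>4} s: {full} vs {minimal} frames ({full / minimal:.1f}x)")
#    1 s: 3100 vs 101 frames (30.7x)
#   10 s: 4000 vs 1001 frames (4.0x)
#   30 s: 6000 vs 3001 frames (2.0x)
#  600 s: 63000 vs 60001 frames (1.0x)
```

The fixed 3000 extra frames dominate for short clips and become negligible for long recordings.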