-
I think we are having the same issue. The audio tensor returned by torchaudio.load() seems to have a different shape from what whisper.transcribe() expects. I worked around it, but I am not sure whether there is a better solution. Here is my code:
tensor squeeze: https://pytorch.org/docs/stable/generated/torch.squeeze.html
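The snippet from this comment did not survive on the page, so here is a minimal sketch of that kind of shape workaround, not the commenter's original code. It assumes the tensor has the (channels, samples) layout that torchaudio.load() returns and that the transcription call wants a 1-D tensor of samples. The helper name to_mono_1d is hypothetical, and a random tensor stands in for real audio.

```python
import torch


def to_mono_1d(waveform: torch.Tensor) -> torch.Tensor:
    """Collapse a (channels, samples) tensor to a 1-D (samples,) tensor.

    Hypothetical helper for illustration; not part of whisper or torchaudio.
    """
    if waveform.dim() == 1:
        return waveform  # already 1-D, nothing to do
    if waveform.dim() != 2:
        raise ValueError(
            f"expected 1-D or 2-D audio, got shape {tuple(waveform.shape)}"
        )
    # Average over the channel axis. For single-channel audio this is
    # equivalent to squeeze(0); for stereo it downmixes to mono.
    return waveform.mean(dim=0)


# Stand-in for the first value returned by torchaudio.load(): 2 channels, 1 s at 16 kHz.
stereo = torch.randn(2, 16000)
mono = to_mono_1d(stereo)
print(mono.shape)  # torch.Size([16000])
```

Averaging rather than calling squeeze(0) directly is a design choice here: squeeze(0) only removes the channel axis when there is exactly one channel, so a stereo file would still come out 2-D.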
-
If this is still relevant to anyone: I got this error when using the "large" model and got past it by specifying language="en" in my call to model.transcribe(), i.e.
-
Guys, if you are trying to load a waveform Tensor to
I can show you a simple code sample:

import torch
import torchaudio
import torchaudio.transforms as T
import whisper

# initialize the Whisper model on the GPU
model = whisper.load_model("base", device=torch.device("cuda"))
# load the audio with torchaudio; the waveform has shape (channels, samples)
waveform_original, sample_rate_original = torchaudio.load("xxx.mp3")
# resample to 16 kHz
sample_rate_target = 16000
waveform_16khz = T.Resample(sample_rate_original, sample_rate_target)(waveform_original)
# downmix to mono; note the keyword is keepdim, not keep_dim
waveform_16khz_mono = waveform_16khz.mean(dim=0, keepdim=True)
# squeeze the first dimension and set the transcription language to English
result = model.transcribe(waveform_16khz_mono.squeeze(0), language="en")

With the steps above, the Tensor can be used for transcription. However, I am not sure whether this works for other languages, since I arbitrarily set the transcription language to English.
-
Hello, I am trying to transcribe audio from a Tensor obtained with the torchaudio library, but it is not working. I am using Flask to load the audio from an endpoint. Is there any solution? Here is the code:
The error displayed is:
decode_options["language"] = max(probs, key=probs.get)
AttributeError: 'list' object has no attribute 'get'
in the transcribe function. Thanks in advance.
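One possible reading of this traceback, offered as an assumption rather than a confirmed diagnosis: when the audio tensor still carries a leading channel axis, language detection may return one probability table per row (a list of dicts) instead of a single dict, and the max(probs, key=probs.get) line then fails. The sketch below reproduces that failure mode in plain Python. The probability values and the most_likely helper are made up for illustration and do not come from whisper.

```python
# Hypothetical probability tables standing in for language-detection output.
single = {"en": 0.9, "de": 0.1}        # one table: what the failing line expects
batched = [{"en": 0.9, "de": 0.1},     # one table per row: what a 2-D input
           {"en": 0.2, "de": 0.8}]     # might produce (assumption)


def most_likely(probs):
    # Same expression as the line shown in the traceback.
    return max(probs, key=probs.get)


print(most_likely(single))  # en

try:
    most_likely(batched)
except AttributeError as exc:
    message = str(exc)
    print(message)  # 'list' object has no attribute 'get'
```

If this reading is right, it would explain why both workarounds reported in this thread help: reducing the tensor to 1-D yields a single probability table, and passing language="en" skips detection altogether.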