Transcribe from a Tensor is not working. #1291

ruliworst · 2023-04-27T18:46:16Z


Hello, I am trying to transcribe audio from a Tensor obtained with the torchaudio library, but it is not working. I am using Flask to receive the audio through an endpoint. Any solution? Here is the code:

```python
import io

import torchaudio
import whisper
from flask import Flask, request

app = Flask(__name__)
MODEL = whisper.load_model('base')

@app.route('/uploader', methods=['POST'])
def upload_audio():
    audio_file = request.files['audio']
    audio_file = io.BytesIO(audio_file.read())

    waveform, sr = torchaudio.load(audio_file)

    result = MODEL.transcribe(waveform)

    # return the recognized text
    return result["text"]
```

The error raised inside the transcribe function is:

```
decode_options["language"] = max(probs, key=probs.get)
AttributeError: 'list' object has no attribute 'get'
```
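One likely explanation, based on a reading of the openai-whisper source (details may differ between versions): `torchaudio.load()` returns a 2-D tensor shaped `(channels, num_frames)`, so the log-Mel spectrogram built from it is 3-D. `model.detect_language()` then treats the first axis as a batch and returns a *list* of probability dicts instead of a single dict, and the `max(probs, key=probs.get)` line in `transcribe()` only works on a dict. The snippet below reproduces just that last step with made-up probabilities; it does not call Whisper, and `pick_language` is a hypothetical helper name.

```python
# Made-up language probabilities, shaped like what detect_language() returns.
single_probs = {"en": 0.91, "es": 0.06, "fr": 0.03}  # 1-D audio -> one dict
batched_probs = [single_probs]                      # batched audio -> list of dicts


def pick_language(probs):
    """Hypothetical helper: same expression transcribe() uses to choose a language."""
    return max(probs, key=probs.get)


print(pick_language(single_probs))  # en

try:
    pick_language(batched_probs)
except AttributeError as exc:
    print(exc)  # 'list' object has no attribute 'get'
```

This is consistent with the workarounds later in the thread: squeezing the tensor to 1-D gives detection a single dict, and passing `language` skips detection entirely.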

Thanks in advance.

mitchsayre · 2023-04-27T22:42:33Z


I think we are having the same issue. It seems that the shape of the tensor returned by torchaudio.load() is different from what whisper.transcribe() expects. I worked around it, but I am not sure whether there is a better solution. Here is my code:

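```python
import torchaudio

# audio_path and model (a loaded Whisper model) are defined elsewhere
with open(audio_path, 'rb') as file:
    waveform, sample_rate = torchaudio.load(file)
waveform = waveform.squeeze()  # (1, N) -> (N,) for a mono file
result = model.transcribe(waveform)
print(result["text"])
```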

Docs for torch.squeeze: https://pytorch.org/docs/stable/generated/torch.squeeze.html
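A caveat worth knowing about this workaround: `squeeze()` only removes dimensions of size 1. It turns a mono `(1, N)` tensor into the 1-D `(N,)` tensor Whisper handles, but leaves a stereo `(2, N)` tensor unchanged. Averaging over the channel axis works in both cases. A minimal sketch with dummy tensors standing in for `torchaudio.load()` output (no audio file needed):

```python
import torch

mono = torch.zeros(1, 16000)    # shape torchaudio.load() gives for a mono file
stereo = torch.zeros(2, 16000)  # shape for a stereo file

print(mono.squeeze().shape)    # torch.Size([16000])
print(stereo.squeeze().shape)  # torch.Size([2, 16000])  (still 2-D)

# Averaging over the channel axis always yields a 1-D waveform.
print(mono.mean(dim=0).shape)    # torch.Size([16000])
print(stereo.mean(dim=0).shape)  # torch.Size([16000])
```

Neither operation changes the sample rate, so the tensor may still need resampling.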


ruliworst · Apr 30, 2023 (Author)

Hi, first of all, thanks for your response.

I tried that solution, but when the transcribe method runs, it returns a sequence of numbers as the text result:
3, 2, 1. 3, 4, 1. 3, 4, 1. 4, 4, 4, 5. 4, 5, 5. 4, 5, 5. 4, 5, 5. 4, 5, 6. 4, 5, 6. 4, 5. 4, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5.
So it is not actually transcribing the audio to text.
Anyway, thanks again for your response. I am still trying to find a solution.
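A plausible cause of this kind of output (not confirmed in the thread) is a sample-rate mismatch. Whisper assumes 16 kHz audio (`whisper.audio.SAMPLE_RATE` is 16000) and does not resample array or tensor input, while `torchaudio.load()` returns audio at the file's native rate, for example 44.1 kHz. The arithmetic below shows how much a clip is stretched when 44.1 kHz samples are read as 16 kHz; the 10-second duration and 44.1 kHz rate are arbitrary example values.

```python
WHISPER_SAMPLE_RATE = 16_000  # rate Whisper assumes for array/tensor input
file_sample_rate = 44_100     # example native rate returned by torchaudio.load()
duration_s = 10.0             # example clip length in seconds

num_samples = int(duration_s * file_sample_rate)          # samples in the tensor
perceived_duration_s = num_samples / WHISPER_SAMPLE_RATE  # length Whisper "hears"
slowdown = file_sample_rate / WHISPER_SAMPLE_RATE         # time-stretch factor

print(num_samples)           # 441000
print(perceived_duration_s)  # 27.5625
print(slowdown)              # 2.75625
```

To the model, the speech sounds about 2.76 times slower and correspondingly lower-pitched, which can easily derail decoding.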

RealHandy · 2024-09-10T20:47:57Z


In case this is still relevant to anyone: I got this error with the "large" model and got past it by specifying language="en" in my call to model.transcribe(), i.e.:

```python
model.transcribe(audio=waveform, verbose=True, language="en")
```
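Passing `language` plausibly avoids the crash because `transcribe()` only runs language detection when no language is given. It does not change the tensor's shape or sample rate, though, and those can still produce garbled output like the one reported earlier in the thread. A quick pre-flight check could look like the sketch below; `whisper_input_problems` is a hypothetical helper, and the float32 expectation is an assumption based on what `torchaudio.load()` returns by default and what Whisper's own `load_audio()` produces.

```python
import torch

WHISPER_SAMPLE_RATE = 16_000  # rate Whisper expects for array/tensor input


def whisper_input_problems(waveform: torch.Tensor, sample_rate: int) -> list[str]:
    """Hypothetical check: return human-readable issues with a waveform for Whisper."""
    problems = []
    if waveform.ndim != 1:
        problems.append(f"expected a 1-D tensor, got shape {tuple(waveform.shape)}")
    if sample_rate != WHISPER_SAMPLE_RATE:
        problems.append(f"expected {WHISPER_SAMPLE_RATE} Hz, got {sample_rate} Hz")
    if waveform.dtype != torch.float32:
        problems.append(f"expected float32 samples, got {waveform.dtype}")
    return problems


# Dummy stereo 44.1 kHz tensor, shaped like torchaudio.load() output:
print(whisper_input_problems(torch.zeros(2, 44_100), 44_100))  # shape and rate issues
# A 1-D, 16 kHz, float32 tensor passes:
print(whisper_input_problems(torch.zeros(16_000), 16_000))  # []
```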


KALEIDOSCOPEIP · 2025-01-17T06:00:40Z


Guys, if you are trying to pass a waveform Tensor loaded with torchaudio.load to whisper's transcribe, you might need to do the following:

1. Resample your waveform to 16 kHz.
2. Convert it to mono audio.

Here is a simple code sample:

```python
import torch
import torchaudio
import torchaudio.transforms as T
import whisper

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = whisper.load_model("base", device=device)  # initialize whisper model
waveform_original, sample_rate_original = torchaudio.load("xxx.mp3")  # load audio with torchaudio, shape (channels, N)

sample_rate_target = 16000
waveform_16khz = T.Resample(sample_rate_original, sample_rate_target)(waveform_original)  # resample to 16 kHz
waveform_16khz_mono = waveform_16khz.mean(dim=0, keepdim=True)  # average channels into mono, shape (1, N)

result = model.transcribe(waveform_16khz_mono.squeeze(0), language="en")  # squeeze to shape (N,) and set the transcription language to English
```

With these steps, the Tensor can be used for transcription. However, I am not sure whether this works for other languages, since I arbitrarily set the transcription language to English.


KALEIDOSCOPEIP Jan 17, 2025

@RealHandy @ruliworst @mitchsayre You can give it a try and see whether it works for you.
