What is the proper way to convert PyAV audio frames to bytes? #1545

gokula-krishna-dev · 2024-09-24T08:39:21Z

Hi all,

I am trying to convert a PyAV audio frame from aiortc into a byte string and pass it to the Deepgram speech-to-text API via WebSocket streaming.

Here is my pseudo-code. For testing, I am using a speech audio file instead of WebRTC, and I am trying to send the audio frame data to the Deepgram WebSocket (please refer to this code).

import av

container = av.open('harvard.wav')

# deepgram_websocket is assumed to be an already-open WebSocket connection to Deepgram
for frame in container.decode(audio=0):
    data = frame.to_ndarray().tobytes()  # raw sample bytes, no header
    print(data)
    deepgram_websocket.send(data)

container.close()

The data is accepted by Deepgram without any error, but it does not recognize any speech. I guess that is because all frames are sent as zeros. Here is the sample output for most of the packets:

b'\x01\x00\x00\x00\xff\xff\xff\xff\xff\xff\x00\x00\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00
\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x01\x00...'

Please let me know the proper way to convert the PyAV audio frames into bytes.
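
One way to sanity-check output like the sample above is to interpret the bytes as audio samples. The sketch below assumes the frames are in the packed signed 16-bit little-endian format (`s16`), which is common for WAV files but should be confirmed with `frame.format.name`. The helper `decode_s16le` is hypothetical, written only for this illustration, and `raw` is the first 16 bytes of the printed output.

```python
import struct

# First 16 bytes of the output printed in the question.
raw = b"\x01\x00\x00\x00\xff\xff\xff\xff\xff\xff\x00\x00\xff\xff\x00\x00"


def decode_s16le(data: bytes) -> list[int]:
    """Interpret bytes as signed 16-bit little-endian samples (assumed format)."""
    count = len(data) // 2  # two bytes per sample
    return list(struct.unpack(f"<{count}h", data[: count * 2]))


samples = decode_s16le(raw)
print(samples)  # [1, 0, -1, -1, -1, 0, -1, 0]
```

Under that assumption the values are not all zero: they sit within one step of zero on a scale of -32768 to 32767, which is what near-silence looks like. That would be consistent with a quiet passage of the recording rather than a broken conversion, though it does not by itself rule out other problems.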

Answered by WyattBlue

WyattBlue · 2024-09-24T09:04:40Z

Collaborator

That is the correct way to turn PyAV frames into bytes.
Deepgram, however, probably wants format information rather than just the raw sample data that you are currently giving it. I recommend reviewing "Transcribing a Local File": https://developers.deepgram.com/docs/getting-started-with-pre-recorded-audio
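
For a streaming connection, one possible way to supply that format information is to declare it when opening the WebSocket. The sketch below is not taken from this thread: it assumes Deepgram's streaming endpoint is `wss://api.deepgram.com/v1/listen` and that it accepts `encoding`, `sample_rate`, and `channels` query parameters for raw audio, which should be verified against the current Deepgram documentation. `build_listen_url` is a hypothetical helper, and the values 44100 and 2 are placeholders.

```python
from urllib.parse import urlencode

# Assumption: endpoint and parameter names as described in Deepgram's streaming docs.
DEEPGRAM_LISTEN_URL = "wss://api.deepgram.com/v1/listen"


def build_listen_url(sample_rate: int, channels: int, encoding: str = "linear16") -> str:
    """Build a streaming URL that declares the format of the raw audio being sent."""
    query = urlencode(
        {"encoding": encoding, "sample_rate": sample_rate, "channels": channels}
    )
    return f"{DEEPGRAM_LISTEN_URL}?{query}"


# Placeholder values; in practice read them from the decoded frames.
url = build_listen_url(sample_rate=44100, channels=2)
print(url)
```

The declared values have to match what is actually sent. With PyAV they can be read from each decoded frame (`frame.sample_rate`, `frame.layout.name`, `frame.format.name`), and `linear16` would only be appropriate when the frame format is `s16`.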
