Getting duration as 0 on live deepgram transcriptions api #981
-
I am trying to create live transcriptions for a Twilio call with the help of Deepgram. The problem is that I am not getting any error, but I am also not getting any transcriptions — Deepgram reports my audio duration as 0. Here is the Rails code I am using:

```ruby
require 'websocket-client-simple'
require 'base64'
require 'wavefile'

module ApplicationCable
  class Connection < ActionCable::Connection::Base
    identified_by :current_user

    def connect
      self.current_user = find_verified_user
      @ws = WebSocket::Client::Simple.connect(
        "wss://api.deepgram.com/v1/listen?" +
        "punctuate=true&" +
        "channels=2&" +
        "sample_rate=8000&" +
        "encoding=mulaw&" +
        "multichannel=true",
        headers: {
          "Authorization" => "Token #{ENV['DEEPGRAM_API_KEY']}"
        }
      )
      @inbuffer = ''.b
      @outbuffer = ''.b
      @inbound_chunks_started = false
      @outbound_chunks_started = false
      @latest_inbound_timestamp = 0
      @latest_outbound_timestamp = 0
      puts "Connected to Deepgram #{@ws}"
    end

    def subscribed
      stream_from "twilio_#{params[:call_sid]}"
    end

    def receive(data)
      chunk = JSON.parse(data)
      Rails.logger.debug "Received chunk: #{chunk}"
      Rails.logger.debug "Event: #{chunk['event']}"
      if chunk["event"] == "start"
        call_sid = chunk["start"]["callSid"]
        Rails.logger.debug call_sid
        # ActionCable.server.broadcast("call_#{call_sid}", call_sid: call_sid)
      elsif chunk["event"] == "media"
        media = chunk["media"]
        transcribe_stream(media)
        prepare_audio(media)
      elsif chunk["event"] == "stop"
        ActionCable.server.broadcast("call_#{params[:call_sid]}", event: "stop")
      end
    end

    private

    def find_verified_user
      verified_user = "test_user"
      verified_user
    end

    def transcribe_stream(base64_audio)
      @ws.on :open do
      end

      @ws.on :message do |msg|
        begin
          if msg.type == :text
            result = JSON.parse(msg.data)
            Rails.logger.debug result
            if result['is_final']
              transcript = result.dig('channel', 'alternatives', 0, 'transcript')
              if transcript && !transcript.empty?
                puts "Transcript: #{transcript}"
                # broadcast to channels from here
              end
            end
          end
        rescue JSON::ParserError => e
          puts "Error parsing message: #{e.message}"
        end
      end

      @ws.on :error do |e|
        Rails.logger.error "WebSocket Error: #{e.message}"
      end

      @ws.on :close do |e|
        Rails.logger.debug "WebSocket closed"
      end
    end

    def prepare_audio(media)
      return unless @ws.open?
      chunk_size = 20 * 160
      puts "Buffer before processing: #{@inbuffer}"
      chunk = Base64.decode64(media['payload'])
      if media['track'] == 'inbound'
        if @inbound_chunks_started && (@latest_inbound_timestamp + 20 < media['timestamp'].to_i)
          bytes_to_fill = 8 * (media['timestamp'].to_i - (@latest_inbound_timestamp + 20))
          @inbuffer << "\xFF" * bytes_to_fill
        else
          @inbound_chunks_started = true
        end
        @latest_inbound_timestamp = media['timestamp'].to_i
        @inbuffer << chunk
      end
      if media['track'] == 'outbound'
        if @outbound_chunks_started && (@latest_outbound_timestamp + 20 < media['timestamp'].to_i)
          bytes_to_fill = 8 * (media['timestamp'].to_i - (@latest_outbound_timestamp + 20))
          @outbuffer << "\xFF" * bytes_to_fill
        else
          @outbound_chunks_started = true
        end
        @latest_outbound_timestamp = media['timestamp'].to_i
        @outbuffer << chunk
      end
      puts "Inbound buffer size: #{@inbuffer.bytesize}"
      puts "Outbound buffer size: #{@outbuffer.bytesize}"
      if @inbuffer.bytesize >= chunk_size || @outbuffer.bytesize >= chunk_size
        buffer = @inbuffer.slice!(0, chunk_size)
        audio = AudioSegment.new(buffer).to_wav
        @ws.send(audio)
      else
        Rails.logger.debug "Not enough data to send"
      end
    end
  end
end
```
Replies: 3 comments 4 replies
-
@jkroll-deepgram, please help me here.
-
@aaditya25052002, an audio duration of 0 with a 101 response code means that you successfully opened a websocket connection to Deepgram, but we did not receive any audio through the connection. You'll need to troubleshoot the streaming audio that you're sending through your websocket connection. What do your application logs indicate about the number of bytes you're sending? Can you test by sending to another local source and then inspecting the difference between the audio you intended to stream and the audio that you actually received?
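One way to do that kind of inspection is a small debugging tap. This is a hypothetical sketch (the `AudioTap` class and its method names are made up, not part of Deepgram's API or the code above): it decodes each Twilio media payload, counts the bytes, and tees the raw mu-law to a local file so you can compare what you intended to stream with what was actually produced.

```ruby
require 'base64'

# Hypothetical debugging helper: decode each Twilio media payload,
# count the bytes, and tee the raw mu-law audio to a local file for
# independent inspection and playback.
class AudioTap
  attr_reader :bytes_seen

  def initialize(path)
    @file = File.open(path, 'wb')
    @bytes_seen = 0
  end

  # Returns the decoded chunk so it can still be forwarded to the websocket.
  def record(base64_payload)
    chunk = Base64.decode64(base64_payload)
    @file.write(chunk)
    @bytes_seen += chunk.bytesize
    chunk
  end

  def close
    @file.close
  end
end
```

With 8 kHz mono mu-law, one second of audio per track is exactly 8000 bytes, so `bytes_seen` tells you immediately whether any audio is reaching your send path. The dump file can then be played back as raw mu-law (for example with `ffplay -f mulaw -ar 8000 dump.raw`, assuming ffmpeg's raw mu-law demuxer) to check that the audio itself is intact.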
Hi @aaditya25052002, Deepgram requests can pass through a flow of Pending -> Unknown -> Lost (doc link). This occurs when metadata cannot be retrieved about the request, and there is no cost to you. For streaming, typically this occurs when a websocket is improperly initiated in some way.
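A mismatch between the query parameters and the bytes actually sent is one common way a streaming websocket ends up improperly initiated. As a sketch of the Ruby side (the helper name is made up, and it assumes Twilio's usual 8 kHz mu-law media streams): when you pass `encoding` and `sample_rate` on `/v1/listen`, Deepgram expects raw audio bytes in exactly that format, so for example sending WAV-wrapped data against `encoding=mulaw` would not match what the URL promises.

```ruby
require 'uri'

# Hypothetical helper (not from the thread above): build the /v1/listen URL
# so the query parameters describe the audio that will actually be streamed.
# Assumption: Twilio <Stream> media is 8 kHz mu-law, one byte per sample.
def deepgram_listen_url(channels: 1)
  params = {
    'encoding'     => 'mulaw',       # raw mu-law bytes, no WAV/container header
    'sample_rate'  => 8000,
    'channels'     => channels,
    'multichannel' => channels > 1,  # per-channel transcripts for stereo audio
    'punctuate'    => true
  }
  "wss://api.deepgram.com/v1/listen?#{URI.encode_www_form(params)}"
end
```

The point of centralizing the URL in one helper is that the declared format and the bytes you write to the socket are decided in one place, which makes this class of mismatch easier to rule out while troubleshooting.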