Getting duration as 0 on live deepgram transcriptions api #981
-
I am trying to create live transcriptions for a Twilio call with the help of Deepgram. The problem is that I am not getting any error, but I am also not getting any transcriptions — Deepgram reports my audio duration as 0. Here is the Rails code I am using:

```ruby
require 'websocket-client-simple'
require 'base64'
require 'wavefile'

module ApplicationCable
  class Connection < ActionCable::Connection::Base
    identified_by :current_user

    def connect
      self.current_user = find_verified_user
      @ws = WebSocket::Client::Simple.connect(
        "wss://api.deepgram.com/v1/listen?" +
        "punctuate=true&" +
        "channels=2&" +
        "sample_rate=8000&" +
        "encoding=mulaw&" +
        "multichannel=true",
        headers: {
          "Authorization" => "Token #{ENV['DEEPGRAM_API_KEY']}"
        }
      )
      @inbuffer = ''.b
      @outbuffer = ''.b
      @inbound_chunks_started = false
      @outbound_chunks_started = false
      @latest_inbound_timestamp = 0
      @latest_outbound_timestamp = 0
      puts "Connected to Deepgram #{@ws}"
    end

    def subscribed
      stream_from "twilio_#{params[:call_sid]}"
    end

    def receive(data)
      chunk = JSON.parse(data)
      Rails.logger.debug "Received chunk: #{chunk}"
      Rails.logger.debug "Event: #{chunk['event']}"
      if chunk["event"] == "start"
        call_sid = chunk["start"]["callSid"]
        Rails.logger.debug call_sid
        # ActionCable.server.broadcast("call_#{call_sid}", call_sid: call_sid)
      elsif chunk["event"] == "media"
        media = chunk["media"]
        transcribe_stream(media)
        prepare_audio(media)
      elsif chunk["event"] == "stop"
        ActionCable.server.broadcast("call_#{params[:call_sid]}", event: "stop")
      end
    end

    private

    def find_verified_user
      verified_user = "test_user"
      verified_user
    end

    def transcribe_stream(base64_audio)
      @ws.on :open do
      end

      @ws.on :message do |msg|
        begin
          if msg.type == :text
            result = JSON.parse(msg.data)
            Rails.logger.debug result
            if result['is_final']
              transcript = result.dig('channel', 'alternatives', 0, 'transcript')
              if transcript && !transcript.empty?
                puts "Transcript: #{transcript}"
                # broadcast to channels from here
              end
            end
          end
        rescue JSON::ParserError => e
          puts "Error parsing message: #{e.message}"
        end
      end

      @ws.on :error do |e|
        Rails.logger.error "WebSocket Error: #{e.message}"
      end

      @ws.on :close do |e|
        Rails.logger.debug "WebSocket closed"
      end
    end

    def prepare_audio(media)
      return unless @ws.open?
      chunk_size = 20 * 160
      puts "Buffer before processing: #{@inbuffer}"
      chunk = Base64.decode64(media['payload'])
      if media['track'] == 'inbound'
        if @inbound_chunks_started && (@latest_inbound_timestamp + 20 < media['timestamp'].to_i)
          bytes_to_fill = 8 * (media['timestamp'].to_i - (@latest_inbound_timestamp + 20))
          @inbuffer << "\xFF" * bytes_to_fill
        else
          @inbound_chunks_started = true
        end
        @latest_inbound_timestamp = media['timestamp'].to_i
        @inbuffer << chunk
      end
      if media['track'] == 'outbound'
        if @outbound_chunks_started && (@latest_outbound_timestamp + 20 < media['timestamp'].to_i)
          bytes_to_fill = 8 * (media['timestamp'].to_i - (@latest_outbound_timestamp + 20))
          @outbuffer << "\xFF" * bytes_to_fill
        else
          @outbound_chunks_started = true
        end
        @latest_outbound_timestamp = media['timestamp'].to_i
        @outbuffer << chunk
      end
      puts "Inbound buffer size: #{@inbuffer.bytesize}"
      puts "Outbound buffer size: #{@outbuffer.bytesize}"
      if @inbuffer.bytesize >= chunk_size || @outbuffer.bytesize >= chunk_size
        buffer = @inbuffer.slice!(0, chunk_size)
        audio = AudioSegment.new(buffer).to_wav
        @ws.send(audio)
      else
        Rails.logger.debug "Not enough data to send"
      end
    end
  end
end
```
Replies: 3 comments 4 replies
-
@jkroll-deepgram, please help me here.
-
@aaditya25052002, an audio duration of 0 with a 101 response code means that you successfully opened a websocket connection to Deepgram, but we did not receive any audio through the connection. You'll need to troubleshoot the streaming audio that you're sending through your websocket connection. What do your application logs indicate about the number of bytes you're sending? Can you test by sending to another local source and then inspecting the difference between the audio you intended to stream and the audio that you actually received?
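One way to do that kind of inspection is a small debugging tap. This is a hypothetical sketch (the `AudioTap` class and its method names are made up, not part of Deepgram's API or the code above): it decodes each Twilio media payload, counts the bytes, and tees the raw mu-law to a local file so you can compare what you intended to stream with what was actually produced.

```ruby
require 'base64'

# Hypothetical debugging helper: decode each Twilio media payload,
# count the bytes, and tee the raw mu-law audio to a local file for
# independent inspection and playback.
class AudioTap
  attr_reader :bytes_seen

  def initialize(path)
    @file = File.open(path, 'wb')
    @bytes_seen = 0
  end

  # Returns the decoded chunk so it can still be forwarded to the websocket.
  def record(base64_payload)
    chunk = Base64.decode64(base64_payload)
    @file.write(chunk)
    @bytes_seen += chunk.bytesize
    chunk
  end

  def close
    @file.close
  end
end
```

With 8 kHz mono mu-law, one second of audio per track is exactly 8000 bytes, so `bytes_seen` tells you immediately whether any audio is reaching your send path. The dump file can then be played back as raw mu-law (for example with `ffplay -f mulaw -ar 8000 dump.raw`, assuming ffmpeg's raw mu-law demuxer) to check that the audio itself is intact.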
Hi @aaditya25052002, Deepgram requests can pass through a flow of Pending -> Unknown -> Lost (doc link). This occurs when metadata cannot be retrieved about the request, and there is no cost to you. For streaming, typically this occurs when a websocket is improperly initiated in some way.
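A mismatch between the query parameters and the bytes actually sent is one common way a streaming websocket ends up improperly initiated. As a sketch of the Ruby side (the helper name is made up, and it assumes Twilio's usual 8 kHz mu-law media streams): when you pass `encoding` and `sample_rate` on `/v1/listen`, Deepgram expects raw audio bytes in exactly that format, so for example sending WAV-wrapped data against `encoding=mulaw` would not match what the URL promises.

```ruby
require 'uri'

# Hypothetical helper (not from the thread above): build the /v1/listen URL
# so the query parameters describe the audio that will actually be streamed.
# Assumption: Twilio <Stream> media is 8 kHz mu-law, one byte per sample.
def deepgram_listen_url(channels: 1)
  params = {
    'encoding'     => 'mulaw',       # raw mu-law bytes, no WAV/container header
    'sample_rate'  => 8000,
    'channels'     => channels,
    'multichannel' => channels > 1,  # per-channel transcripts for stereo audio
    'punctuate'    => true
  }
  "wss://api.deepgram.com/v1/listen?#{URI.encode_www_form(params)}"
end
```

The point of centralizing the URL in one helper is that the declared format and the bytes you write to the socket are decided in one place, which makes this class of mismatch easier to rule out while troubleshooting.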