
User Audio Input transcription issue #624

Open
12 tasks
kamjony opened this issue Dec 20, 2024 · 9 comments
Labels
p:openai_realtime_dart openai_realtime_dart package t:bug Something isn't working

Comments


kamjony commented Dec 20, 2024

System Info

Dart

Related Components

  • doc-loaders
  • doc-transformers
  • prompts
  • llms
  • chat-models
  • output-parsers
  • chains
  • memory
  • stores
  • embeddings
  • retrievers
  • agents

Reproduction

await client.connect();
await client.updateSession(
    instructions: gptInstructions,
    modalities: [Modality.audio],
    voice: Voice.shimmer,
    inputAudioFormat: AudioFormat.pcm16,
    outputAudioFormat: AudioFormat.pcm16,
    inputAudioTranscription: InputAudioTranscriptionConfig(
      // enabled: true,
      model: 'whisper-1',
    )
);
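
For reference, a `session.update` call like the one above should produce roughly the following wire payload (a sketch based on the Realtime API spec; note there is no `enabled` field under `input_audio_transcription`):

```json
{
  "type": "session.update",
  "session": {
    "modalities": ["audio"],
    "voice": "shimmer",
    "input_audio_format": "pcm16",
    "output_audio_format": "pcm16",
    "input_audio_transcription": {
      "model": "whisper-1"
    }
  }
}
```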

Expected behavior

The event `"type": "conversation.item.input_audio_transcription.completed"` is never fired. But if I use the ChatGPT playground, I can see this event being fired. I need the transcription of the user's audio. How can I achieve this?

@kamjony kamjony added the t:bug Something isn't working label Dec 20, 2024
@github-project-automation github-project-automation bot moved this to 📋 Backlog in LangChain.dart Dec 20, 2024
@davidmigloz davidmigloz added the p:openai_realtime_dart openai_realtime_dart package label Dec 21, 2024
@davidmigloz
Owner

There seems to be an issue with the input_audio_transcription param:

{
  "type": "error",
  "event_id": "event_AgoCpMJ7LkKOPOCfcNLn9",
  "error": {
    "type": "invalid_request_error",
    "code": "unknown_parameter",
    "message": "Unknown parameter: 'session.input_audio_transcription.enabled'.",
    "param": "session.input_audio_transcription.enabled",
    "event_id": "evt_nAh8N2BDAnkzVHMeH"
  }
}

I'll look further into it.

@davidmigloz
Owner

davidmigloz commented Dec 21, 2024

I'll remove the `enabled` parameter, which is no longer required. Apart from that, the request appears to match the spec.

It seems more people are facing this issue:
https://community.openai.com/t/realtime-api-session-update-doesnt-change-input-audio-format/967077

Most issues with missing transcriptions have to do with the input audio being passed:

  • Write it to a file and listen to it; you may spot some errors.
  • Make sure the sample rate is 24000 Hz, as the API requires this.
  • Make sure the audio doesn't sound distorted, cut out, sped up or down, or pitched up or down.
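
To make the audio easy to inspect, a small Python sketch (assuming raw little-endian mono PCM16 at 24 kHz, as the API expects) can dump the buffer to a WAV file you can open in any player:

```python
import wave

def write_pcm16_wav(pcm_bytes: bytes, path: str, sample_rate: int = 24000) -> None:
    """Write raw little-endian mono PCM16 bytes to a WAV file for inspection."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)          # Realtime API expects mono
        wf.setsampwidth(2)          # 16-bit samples = 2 bytes
        wf.setframerate(sample_rate)
        wf.writeframes(pcm_bytes)

# Example: one second of silence (24000 frames x 2 bytes)
write_pcm16_wav(b"\x00\x00" * 24000, "debug_input.wav")
```

If the file sounds wrong when played back (chipmunked, slowed, or noisy), the capture side is likely using a different sample rate or sample format than declared.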

Let me know if you manage to solve it.

@kamjony
Author

kamjony commented Dec 23, 2024

@davidmigloz I did everything you suggested above but was still never receiving `"type": "conversation.item.input_audio_transcription.completed"`.
So I wrote my own client and consumed the Realtime API directly, to check whether the problem was actually in the package. After several hours of debugging, I found that if you send the full audio in `conversation.item.create`, OpenAI does not generate a transcription. But if you append the PCM audio using `input_audio_buffer.append` and then commit it with `"type": "input_audio_buffer.commit"`, the transcription is generated. I will try this with the package client soon and report back.
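
The working flow described above corresponds to this client event sequence (a sketch based on the Realtime API docs; the `audio` value is base64-encoded PCM16, and the final event is emitted by the server):

```
// 1. Stream audio chunks into the input buffer
{"type": "input_audio_buffer.append", "audio": "<base64 pcm16 chunk>"}

// 2. Commit the buffer — this is what triggers input audio transcription
{"type": "input_audio_buffer.commit"}

// 3. Ask the model to respond, whenever you choose to
{"type": "response.create"}

// 4. Server event that should eventually arrive:
{"type": "conversation.item.input_audio_transcription.completed", "transcript": "..."}
```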

@davidmigloz
Owner

Interesting, thanks for sharing! Let me know the results when you try it with the openai_realtime_dart client.

@dominikmucklow

Has anyone been able to resolve this issue? We are receiving the audio just fine but cannot locate a transcription in any of the event logs.

@kamjony
Author

kamjony commented Jan 7, 2025

@davidmigloz @dominikmucklow With this package, I was unable to get a transcription, as there is no separate function to commit the audio before generating a GPT response. So I manually implemented the ChatGPT Realtime client following their docs. Now I get transcriptions and have the freedom to generate a response whenever I want.

@davidmigloz
Owner

Thanks for the feedback @kamjony! I'll review their latest spec; maybe they added that method after my initial implementation.

@dominikmucklow

@kamjony Thanks for the info, and glad to know that it can work. Manual implementation is over my head, so I'll keep following updates to this package. @davidmigloz keep up the good work! Here's a lunch on me.

@davidmigloz
Owner

Thanks! I'll try to look into it this weekend.

Projects
Status: 📋 Backlog
Development

No branches or pull requests

3 participants