[Feature] Adding Input Audio Streaming Provider (S2T + VAD) #35
Comments
May I ask what problem you're trying to solve?
How is that different from what happens now?
If I'm correct, the current implementation starts recording audio when the record button is clicked, and the audio is available as a binary file input attachment when the "Send" button is clicked. If I'm using a custom LLMProvider and that LLM does not accept audio as input, I need to run speech-to-text on the recorded audio before sending the resulting text to the LLM.
The behavior is as you are hoping for: the audio is translated and turned into text for the user to edit. It's not provided as an audio file when the user presses the Submit button. Give it a try and see what you think.
Is it sendMessageStream being called to translate the audio, or generateStream? It should be the latter.
generateStream. Can you confirm that I need to use my own audio translation library/code in the generateStream method if I'm not using the google_generative_ai package (is that package doing the audio translation)? If I'm correct, the audio-to-text step is managed from the ChatInput widget and the translation starts when the Stop button is clicked.
The way the chat works is that it uses the provider's generateStream implementation to translate audio. So far I haven't noticed latency large enough to require streaming the audio.
Thanks a lot for the details.
You can do that today with your own custom provider that forwards to your model of choice.
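To make the "custom provider that forwards" idea concrete, here is a minimal, language-agnostic sketch written in Python for brevity. It is not the toolkit's Dart API: `Attachment`, `TranscribingProvider`, the `generate_stream` signature, and the `transcribe` callback are all hypothetical names chosen for illustration. It also assumes that a call carrying an audio attachment is a transcription request, so the transcript is returned for the user to edit rather than being sent to the model.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class Attachment:
    """Hypothetical attachment: raw bytes plus a MIME type."""
    mime_type: str
    data: bytes


class TranscribingProvider:
    """Hypothetical wrapper around a text-only model backend.

    `inner` is any object exposing generate_stream(prompt) -> Iterator[str].
    `transcribe` is any speech-to-text function (bytes -> str), which makes
    the S2T engine pluggable.
    """

    def __init__(self, inner, transcribe: Callable[[bytes], str]):
        self._inner = inner
        self._transcribe = transcribe

    def generate_stream(
        self, prompt: str, attachments: Iterable[Attachment] = ()
    ) -> Iterator[str]:
        audio = [a for a in attachments if a.mime_type.startswith("audio/")]
        if audio:
            # Assumption: audio means "turn this recording into text",
            # so yield the transcript and skip the text-only model.
            for clip in audio:
                yield self._transcribe(clip.data)
            return
        # No audio: forward the prompt unchanged to the model of choice.
        yield from self._inner.generate_stream(prompt)
```

With this shape, swapping the speech-to-text engine only means passing a different `transcribe` function; the wrapped model never sees audio.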
Is it possible to add a speech-to-text (S2T) audio streaming provider as the input for the "Audio recording" button?
It would be good to be able to plug in different S2T audio providers and a Voice Activity Detection (VAD) component.
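For readers unfamiliar with VAD, the sketch below shows the simplest form of the idea: flag each audio frame as speech or silence by its energy, then report end of speech after a few consecutive silent frames. It is only an illustration, not something the toolkit provides. It assumes 16-bit little-endian mono PCM input; the function names, the frame size, the threshold, and the `hangover` count are hypothetical values, and real systems usually rely on a trained detector instead of a fixed energy threshold.

```python
import math
import struct
from typing import List, Optional


def speech_flags(pcm: bytes, frame_samples: int = 160,
                 threshold: float = 500.0) -> List[bool]:
    """Return one flag per full frame: True if the frame's RMS >= threshold.

    Assumes 16-bit little-endian mono PCM. A trailing partial frame is ignored.
    """
    frame_bytes = frame_samples * 2  # 2 bytes per 16-bit sample
    flags = []
    for start in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        samples = struct.unpack(f"<{frame_samples}h",
                                pcm[start:start + frame_bytes])
        rms = math.sqrt(sum(s * s for s in samples) / frame_samples)
        flags.append(rms >= threshold)
    return flags


def end_of_speech_frame(flags: List[bool], hangover: int = 3) -> Optional[int]:
    """Index of the frame where `hangover` consecutive silent frames
    have followed some speech, or None if that never happens."""
    heard_speech = False
    silent_run = 0
    for i, is_speech in enumerate(flags):
        if is_speech:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= hangover:
                return i
    return None
```

In a streaming setup, the index returned by `end_of_speech_frame` would be the point at which recording could stop automatically and the captured audio be handed to the S2T provider.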