Custom WebSocket Audio Interface Input Callback Not Working #495

Open
Al-aminI opened this issue Mar 4, 2025 · 2 comments
Labels
bug Something isn't working

Al-aminI commented Mar 4, 2025

Description:
I am attempting to create a custom WebSocket-based AudioInterface for the ElevenLabs Conversational API. The goal is to process audio chunks and pass them to the input_callback function. However, when sending audio chunks to the process_audio_chunk method, they do not seem to be forwarded to the input_callback.

Despite buffering and finalizing the audio input, the input_callback does not receive any data, and the expected behavior of transmitting the full audio chunk does not occur. There are no error messages, but logging suggests that the audio is being buffered correctly.

Steps to Reproduce:
1. Implement the WebSocketAudioInterface as shown in the code snippet below.
2. Start the interface with start(input_callback), passing a valid callback function.
3. Send audio chunks via process_audio_chunk(audio_chunk).
4. Call finalize_audio_input() to process and send the buffered audio.
5. Observe that input_callback is not triggered with the expected audio data.

Expected Behavior:
The input_callback should receive the buffered audio when finalize_audio_input() is called.

Actual Behavior:
The input_callback never receives the audio data, even when finalize_audio_input() is explicitly called.

Code example

import queue
import threading

from elevenlabs.conversational_ai.conversation import AudioInterface

# `socketio` is the application's Socket.IO server object, created elsewhere.


class WebSocketAudioInterface(AudioInterface):
    def __init__(self, sid):
        self.sid = sid
        self.input_callback = None
        self.audio_buffer = bytearray()
        self.output_queue = queue.Queue()
        self.should_stop = threading.Event()
        self.output_thread = None
        print(f"Created WebSocketAudioInterface for session {sid}")

    def start(self, input_callback):
        # Store the callback that forwards user audio to the conversation.
        self.input_callback = input_callback
        self.audio_buffer.clear()
        # The _output_thread method referenced here is not shown in this snippet.
        self.output_thread = threading.Thread(target=self._output_thread, daemon=True)
        self.output_thread.start()
        socketio.emit('interface_ready', {'status': 'ready'}, room=self.sid)

    def process_audio_chunk(self, audio_chunk: bytes):
        if not self.input_callback:
            print(f"[WARNING] No input_callback registered for {self.sid}")
            return

        if not isinstance(audio_chunk, (bytes, bytearray)):
            print(f"[ERROR] Invalid audio chunk type: {type(audio_chunk)}")
            return

        # Chunks are only buffered here; nothing is forwarded until finalize_audio_input().
        print(f"[INPUT] Processing {len(audio_chunk)} bytes of audio from {self.sid}")
        self.audio_buffer.extend(audio_chunk)

    def finalize_audio_input(self):
        if not self.input_callback:
            print(f"[WARNING] No input_callback registered for {self.sid}")
            return

        if not self.audio_buffer:
            print(f"[INFO] No audio to process for {self.sid}")
            return

        print(f"[INPUT] Finalizing audio input of {len(self.audio_buffer)} bytes for {self.sid}")
        self.input_callback(bytes(self.audio_buffer))  # This does not seem to trigger
        self.audio_buffer.clear()
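
The buffering path above can be exercised on its own, without the ElevenLabs SDK or Socket.IO, to see whether the callback fires when start(), process_audio_chunk() and finalize_audio_input() all run on the same object. This is only a minimal sketch: BufferingInput is a hypothetical stand-in that copies the buffering logic from the snippet, and the callback is a plain list append.

```python
class BufferingInput:
    """Hypothetical stand-in for the buffering half of WebSocketAudioInterface."""

    def __init__(self):
        self.input_callback = None
        self.audio_buffer = bytearray()

    def start(self, input_callback):
        # Same as the snippet: remember the callback and reset the buffer.
        self.input_callback = input_callback
        self.audio_buffer.clear()

    def process_audio_chunk(self, audio_chunk):
        if self.input_callback is None:
            return
        self.audio_buffer.extend(audio_chunk)

    def finalize_audio_input(self):
        if self.input_callback is None or not self.audio_buffer:
            return
        # Deliver everything buffered so far in one call, then reset.
        self.input_callback(bytes(self.audio_buffer))
        self.audio_buffer.clear()


received = []                      # records every payload the callback sees
iface = BufferingInput()
iface.start(received.append)
iface.process_audio_chunk(b"\x00" * 512)
iface.process_audio_chunk(b"\x01" * 512)
iface.finalize_audio_input()
print(len(received), len(received[0]))  # prints "1 1024"
```

If this passes, the buffering logic itself is probably fine, and the place to look is how the real instance is created, started and reached from the WebSocket handlers.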

Additional Context:
Related Issues: None found in the repository.
Possible Workaround: Manually calling input_callback externally works, but this defeats the purpose of using the built-in process_audio_chunk method.
Logs:

[INPUT] Processing 512 bytes of audio from session-12345
[INPUT] Buffered 512 bytes from session-12345 (total buffered: 1024)
[INPUT] Finalizing audio input of 1024 bytes for session-12345
[WARNING] No input_callback registered for session-12345  # Unexpected
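
The [WARNING] line means input_callback was None when finalize_audio_input() ran, even though process_audio_chunk() had just passed the same check for the same session. One possible explanation, which is an assumption and not confirmed anywhere in this thread, is that the two calls reached different objects for the same sid, for example if a new interface is constructed inside each event handler. A small registry keyed by sid is one way to rule that out. SessionInterface, get_interface and drop_interface below are hypothetical names used only for illustration.

```python
import threading


class SessionInterface:
    """Hypothetical placeholder for the per-session audio interface."""

    def __init__(self, sid):
        self.sid = sid
        self.input_callback = None


_interfaces = {}            # sid -> SessionInterface
_lock = threading.Lock()    # handlers may run on different threads


def get_interface(sid):
    """Return the one interface for this sid, creating it on first use."""
    with _lock:
        iface = _interfaces.get(sid)
        if iface is None:
            iface = SessionInterface(sid)
            _interfaces[sid] = iface
        return iface


def drop_interface(sid):
    """Forget the interface when the client disconnects."""
    with _lock:
        _interfaces.pop(sid, None)


a = get_interface("session-12345")
a.input_callback = lambda audio: None   # what start() would have stored
b = get_interface("session-12345")      # a later handler looking up the same sid
print(a is b, b.input_callback is not None)  # prints "True True"
```

Logging id(self) inside start(), process_audio_chunk() and finalize_audio_input() is a quicker way to check whether all three run on the same instance.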

Any guidance on resolving this issue would be appreciated. Thank you!

Al-aminI added the bug label Mar 4, 2025
AngeloGiacco (Collaborator) commented

Hi @Al-aminI, thanks for adding the issue. Taking a look into this now. Out of curiosity, could you give me a rough idea of what you're trying to do when you process the audio?


Al-aminI commented Mar 5, 2025

Hi @AngeloGiacco, thanks for the prompt reply, I really appreciate it. I was trying to send input audio chunks from the client, as bytes, through the input callback, and then retrieve the output audio chunks through the output callback. I was able to retrieve audio from the output callback, but I was unable to send audio bytes in through the input callback.
I was trying to implement a WebSocket audio interface. Just as the default audio interface uses system audio through PyAudio, I wanted to replicate the same behavior over a WebSocket so that I can integrate with a client/frontend.
This is something you could also provide alongside the default audio interface, since most use cases will require a socket connecting the frontend to the Conversational AI API.
Thank you
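
For reference, here is a rough sketch of the kind of interface described above, one that forwards each client chunk to the input callback as it arrives instead of buffering until a finalize step. It rests on several assumptions: the AudioInterface class below is a local stand-in whose four method names (start, stop, output, interrupt) are assumed to match the SDK's base class and should be checked against the installed version; send_to_client and on_client_audio are hypothetical names; and the WebSocket transport is left out entirely.

```python
from abc import ABC, abstractmethod
from typing import Callable, Optional


class AudioInterface(ABC):
    """Local stand-in for the SDK base class (method names are assumed)."""

    @abstractmethod
    def start(self, input_callback: Callable[[bytes], None]) -> None: ...

    @abstractmethod
    def stop(self) -> None: ...

    @abstractmethod
    def output(self, audio: bytes) -> None: ...

    @abstractmethod
    def interrupt(self) -> None: ...


class StreamingSocketInterface(AudioInterface):
    def __init__(self, send_to_client: Callable[[bytes], None]):
        # send_to_client is hypothetical: whatever pushes bytes to the frontend.
        self._send_to_client = send_to_client
        self._input_callback: Optional[Callable[[bytes], None]] = None

    def start(self, input_callback: Callable[[bytes], None]) -> None:
        self._input_callback = input_callback

    def stop(self) -> None:
        self._input_callback = None

    def output(self, audio: bytes) -> None:
        # Agent audio goes straight out to the client.
        self._send_to_client(audio)

    def interrupt(self) -> None:
        # Nothing is queued in this sketch, so there is nothing to discard.
        pass

    def on_client_audio(self, chunk: bytes) -> bool:
        """Call from the WebSocket handler; returns False if not started yet."""
        callback = self._input_callback
        if callback is None:
            return False
        callback(bytes(chunk))
        return True


sent, heard = [], []
iface = StreamingSocketInterface(sent.append)
early = iface.on_client_audio(b"early")      # before start(): dropped
iface.start(heard.append)
late = iface.on_client_audio(b"\x00\x01")    # after start(): forwarded
iface.output(b"reply")
iface.stop()
```

Returning a boolean from on_client_audio makes a dropped chunk visible to the handler, which is the situation the warning in the logs above points to.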
