I am encountering a severe issue with the Conversational AI module and in particular the Twilio integration.
The reference codebase is this one, which I adapted following this example for outbound calling.
ElevenLabs appears to have problems with its Speech-To-Text capabilities. This happens in at least 10-20% of calls.
As shown in the screenshot below, the user audio is not interpreted correctly, even though the recording has no glitches or packet loss (the user audio is just fine). The user transcript is filled with ... or !! characters instead of the real words (e.g. "Hello? Can you hear me?").
Notice that the unrecognized audio chunks are turned into runs of ... or !! of increasing length.
After a while the text seems to be recognized again (as if buffered processing had resumed):
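To put a number on how many calls are affected, the placeholder turns described above can be flagged automatically in downloaded transcripts. This is a minimal sketch, assuming the user turns are already available as plain strings; `is_placeholder` and `placeholder_ratio` are hypothetical helper names, not part of the ElevenLabs SDK.

```python
import re

# A turn counts as a placeholder when it contains only dots,
# exclamation marks, and whitespace (e.g. "..." or "!!!!").
PLACEHOLDER_RE = re.compile(r"[.!\s]+")


def is_placeholder(text: str) -> bool:
    """Return True if the turn has no real words, only '.' or '!' runs."""
    stripped = text.strip()
    return bool(stripped) and PLACEHOLDER_RE.fullmatch(stripped) is not None


def placeholder_ratio(user_turns: list[str]) -> float:
    """Fraction of user turns that are placeholders (0.0 for an empty list)."""
    if not user_turns:
        return 0.0
    flagged = sum(1 for turn in user_turns if is_placeholder(turn))
    return flagged / len(user_turns)


if __name__ == "__main__":
    # Hypothetical transcript, shaped like the one in the screenshot.
    turns = ["...", "!!", "......", "Hello? Can you hear me?"]
    print(placeholder_ratio(turns))
```

A call whose ratio is well above zero is a candidate for the failure described here and can be matched against its recording.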
Logs:
The conversation starts at 08:04:27.409 and ends at 08:05:50.535, when the agent crashes and closes the websocket (see the log line with error code 1011, internal error).
2025-02-26 08:03:45.149 | DEBUG | main:<module>:97 - Ingress established at https://[REDACTED].ngrok-free.app
2025-02-26 08:03:48.230 | DEBUG | main:select_voice:257 - Searching for voice of [REDACTED]...
2025-02-26 08:03:50.291 | DEBUG | main:select_voice:263 - Voice of [REDACTED] found, id [REDACTED]
2025-02-26 08:03:50.592 | DEBUG | main:select_agent:291 - Searching for agent [REDACTED]...
2025-02-26 08:03:50.628 | DEBUG | main:select_agent:294 - Agent [REDACTED] found, id [REDACTED]
2025-02-26 08:03:50.628 | DEBUG | main:select_agent:296 - Ensuring agent config is up to date...
2025-02-26 08:03:59.912 | DEBUG | main:is_agent_unsafe:395 - Checking if agent is safe...
2025-02-26 08:04:01.283 | DEBUG | main:create_conversation_override:406 - Creating agent configuration override...
INFO: Started server process [8]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
2025-02-26 08:04:02.208 | DEBUG | main:start_server:775 - Server is ready
2025-02-26 08:04:12.210 | DEBUG | main:start_server:780 - Starting call...
2025-02-26 08:04:12.210 | DEBUG | main:make_call:163 - from_number = [REDACTED]
2025-02-26 08:04:12.210 | DEBUG | main:make_call:164 - to_number = [REDACTED]
INFO: [REDACTED] - "POST /incoming-call-eleven HTTP/1.1" 200 OK
INFO: [REDACTED] - "WebSocket /media-stream-eleven" [accepted]
2025-02-26 08:04:27.388 | DEBUG | main:handle_media_stream:188 - WebSocket connection established
2025-02-26 08:04:27.409 | DEBUG | main:handle_media_stream:221 - Conversation session started
INFO: connection open
Error receiving message: received 1011 (internal error); then sent 1011 (internal error)
Error sending user audio chunk: received 1011 (internal error); then sent 1011 (internal error)
2025-02-26 08:05:50.535 | DEBUG | main:handle_media_stream:230 - Hanging up the call...
2025-02-26 08:05:52.805 | DEBUG | main:handle_media_stream:247 - Ending conversation session...
INFO: connection closed
2025-02-26 08:06:02.805 | DEBUG | main:download_transcript:515 - Downloading conversation transcript...
2025-02-26 08:06:15.037 | DEBUG | main:download_transcript:679 - Call finished. Transcript downloaded. Closing the server.
2025-02-26 08:06:15.037 | DEBUG | main:shutdown_server:786 - Shutting down the server.
INFO: Shutting down
INFO: Waiting for application shutdown.
INFO: Application shutdown complete.
INFO: Finished server process [8]
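For reference, the time between session start and the crash can be computed directly from the two timestamped log lines above. This is a small sketch, assuming the `timestamp | LEVEL | location - message` layout seen in these logs; `parse_ts` and `conversation_duration_seconds` are hypothetical helper names.

```python
from datetime import datetime

# Timestamp layout used by the log lines above, e.g. "2025-02-26 08:04:27.409".
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_ts(line: str) -> datetime:
    """Parse the timestamp that precedes the first ' | ' separator."""
    return datetime.strptime(line.split(" | ", 1)[0].strip(), LOG_TS_FORMAT)


def conversation_duration_seconds(start_line: str, end_line: str) -> float:
    """Seconds elapsed between two timestamped log lines."""
    return (parse_ts(end_line) - parse_ts(start_line)).total_seconds()


START = ("2025-02-26 08:04:27.409 | DEBUG | main:handle_media_stream:221"
         " - Conversation session started")
END = ("2025-02-26 08:05:50.535 | DEBUG | main:handle_media_stream:230"
       " - Hanging up the call...")

if __name__ == "__main__":
    print(f"{conversation_duration_seconds(START, END):.3f} s")  # 83.126 s
```

So the websocket was closed with code 1011 roughly 83 seconds into the conversation.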
Code example
No response
Additional context
No response