Autoplay Latest Message #3930
Replies: 5 comments, 2 replies
-
A couple different things here, so I moved to discussion. There are a lot of different variables that go into TTS/STT, and hopefully they will be addressed with a dedicated "phone call" mode when we get to implementing that.
This seems to be an issue with Firefox and some other browsers, but not Chrome.
I can't reproduce this on Chrome or Firefox. See the video for the above two on Chrome.
Known issue. @berry-13, can you look into this?
I can't reproduce this. @berry-13, can you also review what's going on here?
-
I have tested on Chrome (with all extensions disabled), Brave, and Edge on two different machines, and auto-play never works. I've also tested on my local Docker instance over HTTP at 127.0.0.1, doing nothing but what is in the "quick start" on the docs site, and on a separate instance within Portainer which is set up for HTTPS and which I reach by hostname. It simply does not work, no matter what I try. Just to be thorough, I also spent $25 on Anthropic on the off chance that it worked for Anthropic, as shown in your video, but not for OpenAI for some reason. Same results. The logs do mention global audio being unmuted, and the audio must have ended based on the timestamp, etc. Nothing else in the log indicates that anything is wrong, even though I have debug logging enabled. I never see the play/stop icon change to a state where it looks like it's playing. It just goes from loading/spinning to the speaker icon. Pressing play works immediately, so the audio is being loaded with no problem.
Of course not: if your auto-play is working, I'd expect that you wouldn't be able to reproduce this. There would be no scenario where audio is loaded but hasn't been played yet (the audio is only requested automatically if you have auto-play enabled).
-
OK, quick update: I was able to get auto-play working, and the problem was totally my fault. Instead of going to OpenAI directly, I'm going through a proxy for TTS. I fixed the proxy functionality, and auto-play now works whether I go direct or through the proxy. However, auto-play breaks again if I use a custom endpoint for the LLM. I think that's my issue too, because the endpoint doesn't support chunking. I think I'm all set now, thank you!
-
Sorry, last thing here: I'm realizing that if you're using an LLM endpoint that doesn't support streaming, auto-play just doesn't work. At first I thought I could work around that, but it turns out that would be much more complicated than I expected, if it's possible at all. I'm using dropParams: ["stream"] to drop the stream parameter from requests to the LLM, but doing so also breaks TTS auto-play, whether or not the TTS provider supports streaming. (I tested going direct to OpenAI for TTS; dropParams is not defined in the TTS section, only in the LLM endpoint section.) If I switch back to OpenAI mid-conversation, auto-play works with no change to the TTS config. It would be great if auto-play worked in this admittedly niche scenario. I also think the caching behavior I mentioned earlier could be improved to avoid wasting tokens.
-
Quick update: sticking with the direct-to-OpenAI connection for TTS, I'm noticing that auto-play fails intermittently on short replies from the LLM. I would chalk it up to browser weirdness, and perhaps that is the case, but if I get a long response with audio after short responses have been failing, it always plays. I'm prompting with things like "in three words what is a door" to trigger three-word responses, and much of the time the audio is retrieved but doesn't play.
-
What happened?
TTS does not autoplay when "Autoplay Latest Message" is turned on. It does seem to send the XHR/fetch request to grab the audio successfully, but the audio isn't played. I can't find anything in the logs indicating why the failure is happening, but I provided the snippet of HTML for the audio object that is returned, with autoplay set to a blank string.
This seems to be the case no matter which TTS I try (Browser, Edge, and External set to OpenAI).
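For anyone debugging the same symptom, two browser facts may help. An autoplay attribute set to an empty string still counts as enabled, because autoplay is a boolean HTML attribute. And HTMLMediaElement.play() returns a promise that rejects (typically with NotAllowedError) when the browser's autoplay policy blocks playback. The sketch below is not LibreChat's code: Playable and tryAutoplay are hypothetical names, used only to show how a blocked play attempt could be reported instead of failing silently.

```typescript
// Hypothetical sketch, not LibreChat's implementation.
// `Playable` is the minimal shape of an HTMLAudioElement that is needed here.
interface Playable {
  play(): Promise<void>;
}

type AutoplayResult =
  | { played: true }
  | { played: false; reason: string };

// Attempt playback and report the rejection reason instead of swallowing it.
async function tryAutoplay(audio: Playable): Promise<AutoplayResult> {
  try {
    await audio.play();
    return { played: true };
  } catch (err) {
    // Browsers reject with a DOMException whose name describes the cause.
    const reason = err instanceof Error ? err.name : String(err);
    return { played: false, reason };
  }
}
```

In a browser console this could be called with the audio element from the page, and the returned reason logged to see whether the autoplay policy is involved.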
Something else I noticed is that if you have caching disabled, but auto-play enabled, it's going to grab the audio once when the message is returned, and again when you hit play. This is quite slow and wasteful. It might be good to check first if the audio exists and has never been played to see if it needs to be fetched again.
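The "check first if the audio exists" idea could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not LibreChat's actual code: AudioCache, getOrFetch, and the fetchAudio callback are hypothetical names, and the in-memory Map stands in for whatever storage the client really uses. The point is that the automatic request and a later press of play share a single fetch per message.

```typescript
// Hypothetical sketch: reuse audio that was already fetched for a message.
type FetchAudio = (messageId: string) => Promise<Uint8Array>;

class AudioCache {
  // Holds the in-flight or completed fetch for each message ID.
  private readonly entries = new Map<string, Promise<Uint8Array>>();
  private readonly fetchAudio: FetchAudio;

  constructor(fetchAudio: FetchAudio) {
    this.fetchAudio = fetchAudio;
  }

  // Return the existing audio if there is any; otherwise fetch once and remember it.
  getOrFetch(messageId: string): Promise<Uint8Array> {
    const existing = this.entries.get(messageId);
    if (existing) return existing;

    const pending = this.fetchAudio(messageId).catch((err) => {
      // Forget failed fetches so that a later press of play can retry.
      this.entries.delete(messageId);
      throw err;
    });
    this.entries.set(messageId, pending);
    return pending;
  }
}
```

Storing the promise rather than the finished bytes means a play press that arrives while the first request is still in flight also reuses it.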
For speech to text, audio never automatically sends when I'm finished talking. It does detect that I'm done and adds a new line, but never sends. Frustratingly, if I delete what it has in queue and start talking again, what I just deleted comes right back.
Steps to Reproduce
Enable "Autoplay Latest Message".
What browsers are you seeing the problem on?
No response
Relevant log output
Screenshots
No response