Autoplay Latest Message #3930

ThreepE0 · 2024-09-05T14:22:59Z

ThreepE0
Sep 5, 2024

What happened?

TTS does not autoplay when "Autoplay Latest Message" is turned on. It does seem to send the xhr/fetch to grab the audio successfully, but the audio isn't played. I can't find anything in the logs indicating why the failure is happening, but I have provided the HTML snippet for the audio element that is returned, with autoplay set to a blank string.

This seems to be the case no matter which TTS engine I try (Browser, Edge, and External set to OpenAI).

Something else I noticed is that if you have caching disabled, but auto-play enabled, it's going to grab the audio once when the message is returned, and again when you hit play. This is quite slow and wasteful. It might be good to check first if the audio exists and has never been played to see if it needs to be fetched again.

For speech to text, audio never automatically sends when I'm finished talking. It does detect that I'm done and adds a new line, but never sends. Frustratingly, if I delete what it has in queue and start talking again, what I just deleted comes right back.

Steps to Reproduce

Enable "Autoplay Latest Message".

What browsers are you seeing the problem on?

No response

Relevant log output

<audio controls="" preload="none" controlslist="nodownload nofullscreen noremoteplayback" id="audio-6c92af58-4f75-469a-b6f4-3724db79e4af" autoplay="" src="blob:https://chat.domain.com/ccd32b85-d7e5-409f-8fd1-ba178756e1c9" style="position: absolute; overflow: hidden; display: none; height: 0px; width: 0px;"></audio>

Screenshots

No response

Code of Conduct

I agree to follow this project's Code of Conduct

danny-avila · 2024-09-05T16:06:17Z

danny-avila
Sep 5, 2024
Maintainer

A couple different things here, so I moved to discussion. There are a lot of different variables that go into TTS/STT, and hopefully they will be addressed with a dedicated "phone call" mode when we get to implementing that.

TTS does not autoplay when "Autoplay Latest Message" is turned on. It does seem to send the xhr/fetch to grab the audio successfully, but the audio isn't played.

This seems to be an issue with Firefox and some other browsers, but not Chrome.

Something else I noticed is that if you have caching disabled, but auto-play enabled, it's going to grab the audio once when the message is returned, and again when you hit play. This is quite slow and wasteful. It might be good to check first if the audio exists and has never been played to see if it needs to be fetched again.

Can't reproduce this on Chrome or Firefox.

See the video for the above two on Chrome:
https://github.com/user-attachments/assets/3f8d7a87-7bb3-41ae-8413-2d93426f4651

For speech to text, audio never automatically sends when I'm finished talking. It does detect that I'm done and adds a new line, but never sends.

Known issue, @berry-13 can you look into this?

Frustratingly, if I delete what it has in queue and start talking again, what I just deleted comes right back.

I can't reproduce this, @berry-13, can you also review what's going on here?

ThreepE0 · 2024-09-05T16:50:37Z

ThreepE0
Sep 5, 2024
Author

This seems to be an issue with Firefox and some other browsers, but not Chrome.

I have tested on Chrome (with all extensions disabled), Brave, and Edge on two different machines, and auto-play never works. I've also tested on my local Docker instance over HTTP at 127.0.0.1, doing nothing beyond what is in the "quick start" on the docs site, and on a separate instance within Portainer which is set up for HTTPS and which I reach by name. It simply does not work, no matter what I try.

Just to be thorough, I also just spent $25 on Anthropic on the off chance it worked for Anthropic as shown in your video but not on OpenAI for some reason. Same results.

I do see mentions in the logs of global audio being unmuted, and audio must have ended based on the timestamp, etc. There's nothing else in the log to indicate anything is wrong, even though I have debug logging enabled. I do not see the play/stop icon change to a state where it looks like it's playing; it just goes from loading/spinning to the speaker icon. Pressing play works immediately, so the audio is being loaded without a problem.
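For anyone debugging a similar "audio is fetched but never starts" symptom, one way to surface a silent failure is to call play() explicitly and inspect the promise it returns. In browsers, HTMLMediaElement.play() rejects with a "NotAllowedError" when the autoplay policy blocks playback. The sketch below is only a diagnostic aid under that assumption; tryAutoplay, PlayableAudio, and AutoplayResult are hypothetical names, not part of the project.

```typescript
// Minimal shape of the part of HTMLAudioElement this sketch needs,
// so it can also be exercised outside a browser (hypothetical interface).
interface PlayableAudio {
  play(): Promise<void>;
}

type AutoplayResult =
  | { played: true }
  | { played: false; reason: string };

// Try to start playback and report why it failed instead of failing silently.
// When the browser's autoplay policy blocks playback, play() rejects with a
// DOMException whose name is "NotAllowedError".
async function tryAutoplay(audio: PlayableAudio): Promise<AutoplayResult> {
  try {
    await audio.play();
    return { played: true };
  } catch (err) {
    const reason = err instanceof Error ? err.name : String(err);
    return { played: false, reason };
  }
}
```

In a browser console, this could be pointed at the audio element from the log output and the result printed, which would at least distinguish a policy block from audio that was never asked to play.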

Can't reproduce this on Chrome or Firefox.

Of course not: if your auto-play is working, I'd expect that you wouldn't be able to reproduce this. There would be no scenario where audio is loaded but hasn't been played yet (the audio is only automatically requested if you have auto-play enabled).

2 replies

danny-avila Sep 5, 2024
Maintainer

I just realized you originally said:

if you have caching disabled, but auto-play enabled, it's going to grab the audio once when the message is returned, and again when you hit play.

This is expected behavior? Even without autoplay, clicking the audio message will exhibit the same behavior with caching disabled.

ThreepE0 Sep 5, 2024
Author

This is expected behavior? Even without autoplay, clicking the audio message will exhibit the same behavior with caching disabled.

Agreed (and pardon the confusion haha sorry about this!)

What I'm saying is that, because of the bug I'm experiencing (and due to the way some browsers fight so hard against things auto-playing), there is a situation where audio is available and hasn't been played, and it will be fetched twice. I really think that should be handled: with even a medium-sized response from the model, the tokens wasted grabbing the audio twice, not to mention the bandwidth, seem a bit of a shame when a check for whether the audio has already been played could be added before grabbing a new copy from the TTS service.
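The check described above could look roughly like the following. This is a sketch of the idea only, not how the project handles audio: UnplayedAudioCache, prefetch, and urlForPlayback are hypothetical names, and the fetch is modeled as a synchronous function returning a URL to keep the example short (a real TTS request would be asynchronous).

```typescript
// Holds audio that was fetched for autoplay but never actually played,
// so that pressing play can reuse it instead of calling the TTS service again.
class UnplayedAudioCache {
  private readonly unplayed = new Map<string, string>();
  private readonly fetchAudio: (messageId: string) => string;

  constructor(fetchAudio: (messageId: string) => string) {
    this.fetchAudio = fetchAudio;
  }

  // Called when autoplay requests audio for a newly returned message.
  prefetch(messageId: string): void {
    this.unplayed.set(messageId, this.fetchAudio(messageId));
  }

  // Called when the user presses play. A fetched-but-unplayed copy is used
  // once and then dropped; otherwise the audio is fetched again, which
  // matches the existing behavior with caching disabled.
  urlForPlayback(messageId: string): string {
    const url = this.unplayed.get(messageId);
    if (url !== undefined) {
      this.unplayed.delete(messageId);
      return url;
    }
    return this.fetchAudio(messageId);
  }
}
```

With this shape, a message whose autoplay silently failed costs one TTS request on the first manual play instead of two, while later replays still behave as they do today.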

I really appreciate your help on this issue. I'd be happy to pay you for a troubleshooting session (I hope that's appropriate to offer), or provide whatever logs would help. I keep trying to dig into the issue myself and I'm a bit out of my depth here. This project is awesome so far, and I'd love to see it keep improving.

ThreepE0 · 2024-09-05T22:50:57Z

ThreepE0
Sep 5, 2024
Author

Ok, so quick update: I was able to get the auto-play working, and the problem was totally my fault: instead of going to OpenAI directly, I'm going through a proxy for TTS. I fixed the proxy functionality and that's working now. Going direct or through the proxy, auto-play works now.

However, autoplay breaks again if I use a custom endpoint for the LLM. But I think that's my issue too, since the endpoint doesn't support chunking.

I think I'm all set now, thank you!

ThreepE0 · 2024-09-06T00:13:33Z

ThreepE0
Sep 6, 2024
Author

Sorry, last thing here: I'm realizing that if you're using an LLM endpoint that doesn't support streaming, then auto-play just doesn't work. At first I was thinking that I could work around that, but it turns out that would be much more complicated than I thought, if possible at all.

I'm using dropParams: ["stream"] to drop the stream parameter from requests to the LLM, but doing so also breaks TTS auto-play, whether or not the TTS service supports streaming. (I tested going direct to OpenAI for TTS, and dropParams is not defined in the TTS section; it's in the LLM endpoint section only.)
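For readers unfamiliar with the option, the effect of dropParams: ["stream"] can be pictured as stripping that key from the LLM request body before it is sent, so the endpoint replies with one complete response rather than chunks. The function below only illustrates that idea; its name and the payload shape are assumptions, and it is not the project's actual code.

```typescript
// Illustrative only: return a copy of the request payload without the listed keys.
function dropParamsFromPayload(
  payload: Record<string, unknown>,
  params: string[],
): Record<string, unknown> {
  const result: Record<string, unknown> = { ...payload };
  for (const key of params) {
    delete result[key];
  }
  return result;
}

// Hypothetical request body, used only for illustration.
const request = { model: "example-model", stream: true, temperature: 0.7 };
const sent = dropParamsFromPayload(request, ["stream"]);
```

Here sent keeps model and temperature but no longer carries stream, while the original request object is left unchanged.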

If I switch mid-conversation back to OpenAI, auto-play works with no change to TTS config.

It would be great if auto-play worked in this admittedly niche scenario. I do think the caching behavior that I mentioned earlier could also be improved to avoid wasting tokens.

ThreepE0 · 2024-09-07T02:48:13Z

ThreepE0
Sep 7, 2024
Author

Quick update: Sticking with the direct-to-OpenAI connection for TTS, I'm noticing that with short replies from the LLM, auto-play fails intermittently. I would chalk it up to browser weirdness, and perhaps that is the case, but I am noticing that if I get a long response with audio after the short responses have been failing, it always plays.

I'm prompting with things like "in three words what is a door" to trigger three-word responses, and the audio is retrieved but doesn't play a lot of the time.
