Creating a buffer/delay to displayed Segments #270

Open
xSlikZodiac opened this issue Aug 29, 2024 · 2 comments
Comments

@xSlikZodiac

Hey there, I'm new to GitHub, so if I'm creating this in the wrong area, please don't blast me. I've noticed that with Whisper models and implementations like this one, the displayed text segments often "self-correct" or even jitter. This jitter is even visible in the main demo, during the Chrome extension video. My thought was: is there any way to add a buffer to the words in the client? Maybe a 1-2 word buffer, to give the transcription time to settle? I know the alternative is waiting for the segment to complete, but that would defeat the purpose of it being "live", since it wouldn't be as real-time. I'm asking because if you're trying to read along and the text is constantly updating words, punctuation, symbols, etc., it is incredibly hard to follow. So I'm curious whether anyone has found a way to prevent this while keeping the live aspect. I know YouTube live stream captions do this well, but even they have a 30-second delay. Has anyone found a workaround that is still "live" but displays accurately?

Thanks!!

@xSlikZodiac
Author

My thought was to trim off the last 3 words in the process_segments function, but that didn't seem to help as much as I thought it would. Not sure if anyone has found a workaround, or maybe this is something I missed in the server.py script. Anyway, thanks!
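
One possible reason trimming alone does not help much is that the words before the trimmed tail can still be rewritten on later updates. Below is a minimal sketch of a holdback buffer that also never retracts a word once it has been shown. It assumes each update delivers the full text of the current utterance so far; `WordBuffer`, `update`, and `flush` are hypothetical names for illustration and are not part of this project's client.

```python
class WordBuffer:
    """Hold back the newest words and never change words already shown."""

    def __init__(self, holdback=2):
        self.holdback = holdback  # number of trailing words kept hidden
        self.committed = []       # words already displayed; append-only

    def update(self, hypothesis):
        """Take the latest full hypothesis and return the text to display."""
        words = hypothesis.split()
        stable_len = max(len(words) - self.holdback, 0)
        # Commit only words beyond what is already displayed.
        if stable_len > len(self.committed):
            self.committed.extend(words[len(self.committed):stable_len])
        return " ".join(self.committed)

    def flush(self, final):
        """Release the held-back tail once the utterance is complete."""
        words = final.split()
        self.committed.extend(words[len(self.committed):])
        return " ".join(self.committed)


buf = WordBuffer(holdback=2)
print(buf.update("hello word this"))        # last two words are held back
print(buf.update("hello world this is a"))  # "hello" stays as first shown
print(buf.flush("hello world this is a test"))
```

The trade-off is that a word committed too early stays wrong on screen, so a larger `holdback` buys stability at the cost of latency.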

@AdolfVonKleist

What you're asking for is basically impossible. The reason it changes is that the early hypotheses are less reliable and lower confidence. You either have to wait (probably 5 s is enough, IMO) or accept that it is going to change. Even people behave this way; I'll often revise my 'internal recognition' a few seconds afterwards.
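
The "wait" option can be sketched as a fixed display delay: only show segments whose audio ended at least `delay` seconds ago. This is a rough illustration, not this project's implementation. It assumes each segment is a dict with an `"end"` time in seconds and a `"text"` field, and that `now` is the current position in the audio stream; `delayed_text` is a hypothetical helper.

```python
def delayed_text(segments, now, delay=5.0):
    """Return text from segments that ended at least `delay` seconds ago."""
    cutoff = now - delay
    stable = [s["text"].strip() for s in segments if float(s["end"]) <= cutoff]
    return " ".join(stable)


segments = [
    {"end": 2.0, "text": " hello"},
    {"end": 6.5, "text": " world"},
]
print(delayed_text(segments, now=8.0))   # only the first segment is old enough
print(delayed_text(segments, now=12.0))  # both segments are now past the delay
```

Text newer than the cutoff stays hidden until it has had time to settle, so the captions trail the audio by roughly `delay` seconds.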
