Screen Rendering Slows Down Towards the End of Streaming from LLM Server #1388
Comments
This has been troubling me too. My quick investigation is leading me here:

`chat-ui/src/lib/utils/messageUpdates.ts`, lines 227 to 248 (commit d7b02af)
Looks like the buffer on the browser side fills up, and the delay calculations don't work quite right in that situation. Inference with bigger contexts is very bursty, which I think skews the timing.
I think we need to tweak the smoothing function so it runs faster the bigger the current buffer is. That way, if there are a lot of words waiting to be displayed, they come out faster.
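The idea above can be sketched as a drain delay that shrinks as the backlog grows. This is a minimal illustration, not chat-ui's actual `messageUpdates.ts` logic; `BASE_DELAY_MS` and `drainDelayMs` are hypothetical names:

```typescript
// Sketch: the pause between renders shrinks as the pending buffer grows,
// so a bursty server can't let the queue run away near the end of a stream.
// BASE_DELAY_MS is an assumed baseline, not a value from chat-ui.
const BASE_DELAY_MS = 30;

function drainDelayMs(pendingTokens: number): number {
  if (pendingTokens <= 0) return BASE_DELAY_MS;
  // More tokens waiting => shorter delay, clamped to at least 1 ms.
  return Math.max(1, Math.floor(BASE_DELAY_MS / pendingTokens));
}
```

With one token queued this waits the full baseline; with thirty or more queued it drains at full speed.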
I'd ask for an option to never throttle token output to the document. The way tokens get dumped all at once upon halting should give an idea of where to look for a possible bug in the code. P.S.
I notice a change has been made that alters the batching of token output, but it aggravates the problem of JS timers being throttled when the tab sleeps. Could the renderer's idling be decoupled from global tab sleep? That would be a good compromise to start with.
This commit added a PUBLIC_SMOOTH_UPDATES flag: set it to true to enable the existing behaviour.
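Taken together with the tab-sleep comment above, one compromise is to throttle only when the flag is on and the tab is actually visible. A sketch under those assumptions (`shouldThrottle` is a hypothetical helper, and the flag is assumed to arrive as the string "true"/"false" from SvelteKit's public env):

```typescript
// Sketch: throttle token rendering only when the PUBLIC_SMOOTH_UPDATES
// flag is enabled AND the tab is visible. Background tabs have clamped
// timers, so throttling there just builds up backlog.
// `shouldThrottle` is hypothetical instrumentation, not chat-ui code.
function shouldThrottle(
  smoothUpdatesFlag: string | undefined, // e.g. import.meta.env.PUBLIC_SMOOTH_UPDATES
  visibility: "visible" | "hidden" // mirrors document.visibilityState
): boolean {
  return smoothUpdatesFlag === "true" && visibility === "visible";
}
```

A hidden tab would then flush tokens unthrottled instead of accumulating them until the tab wakes.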
I have to correct myself. I spent days trying to debug and profile, but it's a fool's errand when everything is obfuscated:

```js
dl=function(L){let le=null,Ee=null;if(za)L="<remove></remove>"+L;else{const Gt=k(L,/^[\r\n\t ]+/);Ee=Gt&&Gt[0]}Fr==="application/xhtml+xml"&&dr===B0&&(L='<html xmlns="http://www.w3.org/1999/xhtml"><head></head><body>'+L+"</body></html>");const vt=Kt?Kt.createHTML(L):L;if(dr===B0)try{le=new Wt().parseFromString(vt,Fr)}catch{}if(!le||!le.documentElement){le=Aa.createDocument(dr,"template",null);try{le.documentElement.innerHTML=Ra?Er:vt}catch{}}const Xt=le.body||le.documentElement;return L&&Ee&&Xt.insertBefore(X.createTextNode(Ee),Xt.childNodes[0]||null),dr===B0?u1.call(le,tr?"html":"body")[0]:tr?le.documentElement:Xt}
```

👀 Marvellous! Was obfuscation really necessary in an open-source project? Here's a profile screenshot…
I've spent a bit of time instrumenting this; if you
While the initial part of streaming from the LLM server is fine, the screen display speed slows down as time progresses, particularly towards the end of the streaming process. By that point the LLM server has already finished sending the data; only the screen display continues to update.
`chat-ui/src/routes/conversation/[id]/+server.ts`, lines 390 to 396 (commit 6de97af)
I suspect the code section above. Could it be a buffer shortage?
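One way to confirm the lag lives in the renderer rather than the network is to timestamp when the stream closes versus when the last token is painted. A rough sketch; these hooks are hypothetical instrumentation, not chat-ui code:

```typescript
// Sketch: record when the server stream closes vs. when the last token
// is painted. A large positive gap means the UI kept drawing long after
// the network finished — the symptom described in this issue.
let streamDoneAt = 0;
let lastRenderAt = 0;

function onStreamClosed(now: number = Date.now()): void {
  streamDoneAt = now;
}

function onTokenRendered(now: number = Date.now()): void {
  lastRenderAt = now;
}

function renderLagMs(): number {
  return lastRenderAt - streamDoneAt;
}
```

Calling `onStreamClosed()` when the response body ends and `onTokenRendered()` on each paint would show whether `renderLagMs()` grows with longer generations.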