
Add debounce for auto_fim with configurable debounce duration #66


Open: wants to merge 1 commit into base: master

Juno-T commented May 10, 2025

Add a FIM debounce that applies only to auto_fim. Other FIM calls, such as speculative FIM, are not affected.

I noticed that without debounce, llama#fim is called on almost every character typed in insert mode. Debouncing cuts down on these unnecessary FIM calls. With the debounce duration set to 500 ms, my machine stays cool and I notice little added latency.

pnb (Contributor) commented May 28, 2025

How does this compare to increasing the time on line 500? My understanding is that the current implementation doesn't necessarily prevent many requests in quick succession if each request finishes before the next one starts; it only prevents a new request from being issued before the previous one completes. If that understanding is correct, increasing the timer seems not too different from debouncing, and simpler.

Increasing the existing timer on line 500 to 500 ms does seem to fix an issue I have: when I use llama.cpp through a crummy reverse proxy with SSH tunneling, it occasionally gets overloaded and stops working.
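
For illustration, the gating described above (a new request is issued only once the previous one has completed) can be modeled in a few lines of Python. This is a simplified sketch, not the plugin's code: the function name `gated_request_times`, the keystroke timestamps, and the fixed `latency_ms` are all hypothetical, and keystrokes that arrive while a request is in flight are simply dropped here rather than retried.

```python
def gated_request_times(keystrokes_ms, latency_ms):
    """Return the times (ms) at which requests would be issued.

    Hypothetical model: a keystroke issues a request only if no earlier
    request is still in flight; every request takes latency_ms to finish.
    Keystrokes that arrive during a request are dropped, not retried.
    """
    issued = []
    busy_until = None  # time at which the in-flight request completes
    for t in keystrokes_ms:
        if busy_until is None or t >= busy_until:
            issued.append(t)
            busy_until = t + latency_ms
    return issued


# One keystroke every 100 ms for two seconds (assumed typing pattern).
keystrokes = list(range(0, 2000, 100))

print(len(gated_request_times(keystrokes, 150)))  # 10
print(gated_request_times(keystrokes, 500))       # [0, 500, 1000, 1500]
```

Under these assumptions, a fast backend still produces a request for every second keystroke, and raising the effective interval to 500 ms lowers the request rate but still issues requests while typing continues.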

ggerganov (Member) commented

@pnb I was wondering about that too. It seems like the same thing. We should simply make the constant on line 500 a configurable parameter.

Juno-T (Author) commented May 28, 2025

Thanks for the comments. I tried increasing the duration on line 500, but it didn't give me the behavior I want.

The behavior I want is that the plugin makes a request only after the user has stopped typing for some interval, e.g. 500 ms, and not before.

The current implementation makes several completion requests while I'm typing a long line, starting from the moment I enter insert mode. These requests are wasted: the characters I type next make a request issued 100 ms earlier obsolete (assuming the model guessed them wrong, which is very common). Only the one request at the end, after I have stopped typing, is useful.

My PR achieves that, but I'm not sure whether this is the right place to put the logic.
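
The requested behavior is what is usually called a trailing-edge debounce: every keystroke restarts a timer, and a request goes out only when the timer expires without another keystroke. The Python sketch below models that rule on a list of timestamps. It is not the code in this PR; `debounced_request_times`, the typing pattern, and the 500 ms wait are assumptions chosen for illustration.

```python
def debounced_request_times(keystrokes_ms, wait_ms):
    """Return the times (ms) at which a trailing-edge debounce would fire.

    Hypothetical model: keystrokes_ms is a sorted list of keystroke times.
    Each keystroke (re)starts a timer of wait_ms; the timer fires only if
    no later keystroke arrives before its deadline. A keystroke landing
    exactly on the deadline is treated as arriving after the timer fired.
    """
    fired = []
    for i, t in enumerate(keystrokes_ms):
        deadline = t + wait_ms
        is_last = i == len(keystrokes_ms) - 1
        if is_last or keystrokes_ms[i + 1] >= deadline:
            fired.append(deadline)
    return fired


# Type for two seconds (one key per 100 ms), pause, then type two more keys.
keystrokes = list(range(0, 2000, 100)) + [3000, 3100]

print(debounced_request_times(keystrokes, 500))  # [2400, 3600]
```

In this model the 20 keystrokes of the first burst produce a single request, issued 500 ms after the last of them, and nothing is sent while typing is in progress.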


3 participants