Token streaming doesn't work #517

Open
LoganDark opened this issue May 10, 2024 · 9 comments

Comments

@LoganDark

kobold_debug.json

For some reason, token streaming does not work. It's enabled, and the server's terminal output updates with every token, but no messages are sent over the websocket to the UI, so nothing can be displayed until the response is complete. No idea what is going on.

I'm on the latest United commit 1e985ed.

@henk717
Owner

henk717 commented May 10, 2024

What kind of model / api was it hooked up to?

@LoganDark
Author

LoganDark commented May 10, 2024

I thought that would be in the debug JSON, but I've tried both LLaMA 2 and Mixtral 8x7B in GGUF format, running on KoboldCPP (with cuBLAS and full offload to a 3090). I'm using the KoboldAI United UI (localhost:5000, not Lite).

@henk717
Owner

henk717 commented May 10, 2024

United can't stream over the API; that's why streaming is missing.

@LoganDark
Author

What do you mean it can't stream over the API? So it can't stream at all?

@henk717
Owner

henk717 commented May 10, 2024

It can stream when you use Hugging Face-based models in the main UI.

@LoganDark
Author

So I can't use my 3090 to run models? Or I can't use GGUF files?

@henk717
Owner

henk717 commented May 10, 2024

You can't combine GGUF files, United, and streaming.
Streaming does work when you use KoboldCpp directly with its own bundled KoboldAI Lite.

@LoganDark
Author

OK, so the solution is to not use GGUF then? The Lite UI is mostly unusable for me (it works fine, it just has an awful user experience).

@henk717
Owner

henk717 commented May 10, 2024

Yes, the backends built into KoboldAI United should work (Huggingface, exllama2).
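
For anyone wondering what "streaming" means on the client side here, a rough sketch may help. It is not taken from the United or KoboldCpp code. It assumes, purely for illustration, that a backend emits server-sent events whose `data:` lines hold JSON with a `token` field; the function name `iter_stream_tokens` and the sample payloads are hypothetical.

```python
import json
from typing import Iterable, Iterator


def iter_stream_tokens(lines: Iterable[str]) -> Iterator[str]:
    """Yield tokens one at a time from server-sent-event lines.

    Assumption (hypothetical format): each event has a line such as
    data: {"token": "..."}
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and `event:` lines
        payload = json.loads(line[len("data:"):])
        token = payload.get("token", "")
        if token:
            yield token


if __name__ == "__main__":
    # Hypothetical sample of what a streaming response body could look like.
    sample = [
        "event: message",
        'data: {"token": "Hello"}',
        "",
        "event: message",
        'data: {"token": ", world"}',
        "",
    ]
    # A streaming UI appends each token as soon as it arrives.
    for tok in iter_stream_tokens(sample):
        print(tok, end="", flush=True)
    print()
```

A non-streaming client would instead wait for one complete response body, which matches the behaviour described in the original report: text only appears once generation has finished.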

2 participants