koboldcpp-1.45.2
- Improved embedded horde worker: more responsive, and added Session Stats (Total Kudos Earned, EarnRate, Timings)
- Added a new parameter to the grammar sampler API, `grammar_retain_state`, which lets you persist the grammar state across multiple requests (see the first example after this list).
- Allow launching by picking a .kcpps file in the file selector GUI combined with `--skiplauncher`. That settings file must already have a model selected. (Similar to `--config`, but that one doesn't use the GUI at all. See the launch examples after this list.)
- Added a new flag toggle `--foreground` for Windows users. This sends the console terminal to the foreground every time a new prompt is generated, to avoid some idling slowdown issues.
- Increased max supported context with `--contextsize` to 32k, but only for GGUF models; it's still limited to 16k for older model versions. GGUF now has no hard limit on max context since it switched to using allocators, but this is not compatible with older models. Additionally, models not trained with extended context are unlikely to work when RoPE-scaled beyond 32k.
- Added a simple OpenAI-compatible completions API, which you can access at `/v1/completions` (see the example after this list). You're still recommended to use the Kobold API, as it has many more settings.
- Increased the stop_sequence limit to 16.
- Improved SSE streaming by batching pending tokens between events.
- Upgraded Lite polled-streaming to work even in multiuser mode. This works by sending a unique key for each request (see the polling example after this list).
- Improved Makefile to reduce unnecessary builds, added flag for skipping K-quants.
- Enhanced Remote-Link.cmd to also work on Linux; simply run it to create a Cloudflare tunnel that lets you access koboldcpp from anywhere.
- Improved the default colab notebook to use mmq.
- Updated Lite and pulled other fixes and improvements from upstream llama.cpp.
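As a rough sketch of the new grammar parameter, assuming the standard Kobold generate endpoint at `/api/v1/generate` and a GBNF grammar string (the prompt and grammar here are illustrative, not from the release):

```
# Constrain sampling with a GBNF grammar, and ask the server to keep
# the grammar state alive for the next request:
curl -s http://localhost:5001/api/v1/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Pick a color: ",
    "max_length": 16,
    "grammar": "root ::= (\"red\" | \"green\" | \"blue\")",
    "grammar_retain_state": true
  }'
```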
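The launch options above might be combined like this; the file and model names are placeholders, and `--model` is the usual model flag rather than something introduced in this release:

```
# Skip the launcher screen, then pick a .kcpps settings file in the file selector:
koboldcpp.exe --skiplauncher

# Fully headless launch from a saved settings file (must already name a model):
koboldcpp.exe --config mysettings.kcpps

# Pass flags directly, e.g. the new 32k context ceiling for GGUF models:
koboldcpp.exe --model mymodel.gguf --contextsize 32768
```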
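A minimal call against the new OpenAI-compatible endpoint, using standard OpenAI completion fields (values are illustrative):

```
curl -s http://localhost:5001/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Once upon a time", "max_tokens": 32, "temperature": 0.7}'
```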
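And a hand-rolled version of the keyed polling that Lite now does in multiuser mode. This assumes the key travels in a `genkey` field of both the generate payload and the check endpoint, which is how Lite tags its requests; the key value itself is arbitrary:

```
# Start a generation tagged with a unique key (runs in the background):
curl -s http://localhost:5001/api/v1/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Write a haiku about autumn.", "max_length": 80, "genkey": "KCPP2501"}' &

# While it runs, poll for the partial text belonging to that key only:
curl -s http://localhost:5001/api/extra/generate/check \
  -H "Content-Type: application/json" \
  -d '{"genkey": "KCPP2501"}'
```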
Important: Deprecation Notice for KoboldCpp 1.45.1
The following command-line arguments are deprecated and will be removed in a future version.
- `--psutil_set_threads` - will be removed as it's now generally unhelpful; the defaults are usually sufficient.
- `--stream` - a Kobold Lite only parameter, which is now a toggle saved inside Lite's settings and thus no longer necessary.
- `--unbantokens` - EOS unbans should only be set via the generate API, in the `use_default_badwordsids` JSON field.
- `--usemirostat` - Mirostat values should only be set via the generate API, in the `mirostat`, `mirostat_tau` and `mirostat_eta` JSON fields (see the example below).
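For the last two, the per-request equivalents would look roughly like this (a sketch against the Kobold generate endpoint; the prompt and sampler values are illustrative):

```
# use_default_badwordsids: false leaves EOS unbanned (the old --unbantokens);
# the mirostat fields replace --usemirostat:
curl -s http://localhost:5001/api/v1/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Continue the story: ",
    "max_length": 64,
    "use_default_badwordsids": false,
    "mirostat": 2,
    "mirostat_tau": 5.0,
    "mirostat_eta": 0.1
  }'
```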
Hotfix for 1.45.2 - Fixed a bug with reading thread counts in 1.45 and 1.45.1, and moved the OpenAI endpoint from `/api/extra/oai/v1/completions` to just `/v1/completions`.
To use, download and run the koboldcpp.exe, which is a one-file pyinstaller.
If you don't need CUDA, you can use koboldcpp_nocuda.exe which is much smaller.
If you're using AMD, you can try koboldcpp_rocm from YellowRoseCx's fork.
Run it from the command line with the desired launch parameters (see `--help`), or manually select the model in the GUI.
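For instance, a bare-bones launch might look like this (the model name is a placeholder):

```
koboldcpp.exe --model mymodel.gguf
```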
Once it has loaded, you can connect at http://localhost:5001 (or use the full KoboldAI client).
For more information, be sure to run the program from the command line with the `--help` flag.