Releases: LostRuins/koboldcpp

llamacpp-for-kobold-1.0.5

25 Mar 03:29

  • Merged the upstream fixes for 65B models.
  • Clamped the maximum thread count to 4, which actually gives better results since generation is memory bottlenecked.
  • Added support for selecting the KV data type, defaulting to f32 instead of f16 (see the rough arithmetic below).
  • Added more default build flags.
  • Added a softprompts endpoint.
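
As a rough sense of why the KV data type matters for memory, here is some back-of-envelope arithmetic. It assumes the usual cache size of 2 x n_layers x n_ctx x n_embd elements and LLaMA-7B dimensions (32 layers, 4096 embedding width, 2048 context), none of which are stated in these notes:

# Back-of-envelope KV cache size, assuming the common formula
# 2 (K and V) * n_layers * n_ctx * n_embd elements and LLaMA-7B dimensions.
n_layers, n_ctx, n_embd = 32, 2048, 4096
elements = 2 * n_layers * n_ctx * n_embd

for name, bytes_per_elem in (("f32", 4), ("f16", 2)):
    print(f"{name}: {elements * bytes_per_elem / 2**30:.1f} GiB")
# f32: 2.0 GiB
# f16: 1.0 GiB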

To use, download and run llamacpp_for_kobold.exe.
Alternatively, drag and drop a compatible quantized model for llamacpp on top of the .exe, or run it and manually select the model in the popup dialog.

Once the model is loaded, you can connect here (or use the full KoboldAI client):
http://localhost:5001
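
For scripted access instead of the browser, here is a minimal sketch of querying the server from Python. It assumes the server emulates the standard KoboldAI /api/v1/generate endpoint and response shape; check the field names against your server version:

# Minimal sketch: query a running llamacpp-for-kobold server from Python.
# Assumes the server emulates the KoboldAI API at /api/v1/generate;
# verify the endpoint and field names against your version.
import json
import urllib.request

ENDPOINT = "http://localhost:5001/api/v1/generate"

payload = {
    "prompt": "Once upon a time,",
    "max_length": 80,    # number of tokens to generate
    "temperature": 0.7,  # sampling temperature
}

req = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read().decode("utf-8"))

# The KoboldAI API typically returns {"results": [{"text": "..."}]}.
print(result["results"][0]["text"])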

llamacpp-for-kobold-1.0.4

24 Mar 14:22

  • Added a script to make standalone PyInstaller .exes, which will be used for all future releases. The llamacpp.dll and llama-for-kobold.py files are still available by cloning the repo, and will continue to be included and updated there.
  • Added token caching for prompts, allowing fast-forwarding through partially duplicated prompts. This makes edits towards the end of the previous prompt much faster (see the sketch after this list).
  • Merged improvements from parent repo.
  • Weights not included.
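
To illustrate the idea behind the prompt token caching above (a conceptual sketch only, not the actual implementation): keep the token list from the previous request, find the longest shared prefix with the new prompt, and only re-evaluate the tokens after it.

# Conceptual sketch of prompt token caching (not the actual implementation):
# reuse the already-evaluated prefix shared with the previous prompt and
# only re-process the tokens that differ at the end.
from typing import List

def shared_prefix_len(cached: List[int], new: List[int]) -> int:
    """Length of the longest common prefix of two token lists."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n

def tokens_to_evaluate(cached_tokens: List[int], new_tokens: List[int]) -> List[int]:
    """Return only the suffix of new_tokens that still needs evaluation."""
    keep = shared_prefix_len(cached_tokens, new_tokens)
    return new_tokens[keep:]

# Example: an edit near the end of the prompt only costs a few new tokens.
previous = [1, 42, 7, 99, 13, 5]
edited   = [1, 42, 7, 99, 21, 8, 3]
print(tokens_to_evaluate(previous, edited))  # -> [21, 8, 3]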

To use, download and run llamacpp_for_kobold.exe.
Alternatively, drag and drop a compatible quantized model for llamacpp on top of the .exe, or run it and manually select the model in the popup dialog.

Once the model is loaded, you can connect here (or use the full KoboldAI client):
http://localhost:5001

llamacpp-for-kobold-1.0.3

22 Mar 14:50

  • Applied the massive refactor from the parent repo. It was a huge pain, but I managed to keep the old tokenizer untouched and retained full support for the original model formats.
  • Greatly reduced the default batch sizes, as large batch sizes were causing bad output and high memory usage.
  • Added support for dynamic context lengths sent from the client.
  • TavernAI is working, although I wouldn't recommend it: it spams the server with multiple requests of huge contexts, so you're going to have a very painful time getting responses.

Weights not included.
To use, download, extract and run (default port is 5001):
llama_for_kobold.py [ggml_quant_model.bin] [port]
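
For example (the model filename below is just a placeholder for your own quantized GGML model):
python llama_for_kobold.py ggml-model-q4_0.bin 5001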

Then you can connect here (or use the full KoboldAI client):
http://localhost:5001

llamacpp-for-kobold-1.0.2

21 Mar 13:19

  • Added an embedded version of Kobold Lite (AGPL licensed).
  • Updated to the new ggml model format, while still maintaining support for the old format and the old tokenizer.
  • Changed the license to AGPL v3. The original GGML library and llama.cpp remain under the MIT license in their original repos.

Weights not included.
To use, download, extract and run (default port is 5001):
llama_for_kobold.py [ggml_quant_model.bin] [port]

Then you can connect here (or use the full KoboldAI client):
http://localhost:5001

llamacpp-for-kobold-1.0.1

20 Mar 07:05

  • Bugfixes for OSX. KV caching now allows continuing a previous generation without reprocessing the whole prompt.
  • Weights not included.

To use, download, extract and run (default port is 5001):
llama_for_kobold.py [ggml_quant_model.bin] [port]

Then you can connect here (or use the full KoboldAI client):
https://lite.koboldai.net/?local=1&port=5001

llamacpp-for-kobold-1.0.0

18 Mar 17:10
Initial version.
Weights not included.

To use, download, extract and run:
llama_for_kobold.py [ggml_quant_model.bin] [port]