Releases: LostRuins/koboldcpp

llamacpp-for-kobold-1.0.5

25 Mar 03:29

  • Merged the upstream fixes for 65B models.
  • Clamped the maximum thread count to 4, which actually gives better results since generation is memory bottlenecked.
  • Added support for selecting the KV data type, defaulting to f32 instead of f16 (see the rough arithmetic below).
  • Added more default build flags.
  • Added a softprompts endpoint.
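
As a rough sense of why the KV data type matters for memory, here is some back-of-envelope arithmetic. It assumes the usual cache size of 2 x n_layers x n_ctx x n_embd elements and LLaMA-7B dimensions (32 layers, 4096 embedding width, 2048 context), none of which are stated in these notes:

# Back-of-envelope KV cache size, assuming the common formula
# 2 (K and V) * n_layers * n_ctx * n_embd elements and LLaMA-7B dimensions.
n_layers, n_ctx, n_embd = 32, 2048, 4096
elements = 2 * n_layers * n_ctx * n_embd

for name, bytes_per_elem in (("f32", 4), ("f16", 2)):
    print(f"{name}: {elements * bytes_per_elem / 2**30:.1f} GiB")
# f32: 2.0 GiB
# f16: 1.0 GiB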

To use, download and run llamacpp_for_kobold.exe.
Alternatively, drag and drop a compatible quantized model for llamacpp on top of the .exe, or run it and manually select the model in the popup dialog.

Once the model is loaded, you can connect here (or use the full KoboldAI client):
http://localhost:5001
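
For scripted access instead of the browser, here is a minimal sketch of querying the server from Python. It assumes the server emulates the standard KoboldAI /api/v1/generate endpoint and response shape; check the field names against your server version:

# Minimal sketch: query a running llamacpp-for-kobold server from Python.
# Assumes the server emulates the KoboldAI API at /api/v1/generate;
# verify the endpoint and field names against your version.
import json
import urllib.request

ENDPOINT = "http://localhost:5001/api/v1/generate"

payload = {
    "prompt": "Once upon a time,",
    "max_length": 80,    # number of tokens to generate
    "temperature": 0.7,  # sampling temperature
}

req = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read().decode("utf-8"))

# The KoboldAI API typically returns {"results": [{"text": "..."}]}.
print(result["results"][0]["text"])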

llamacpp-for-kobold-1.0.4

24 Mar 14:22

  • Added a script to make standalone PyInstaller .exes, which will be used for all future releases. The llamacpp.dll and llama-for-kobold.py files are still available by cloning the repo, and will continue to be included and updated there.
  • Added token caching for prompts, allowing fast-forwarding through partially duplicated prompts. This makes edits towards the end of the previous prompt much faster (see the sketch after this list).
  • Merged improvements from parent repo.
  • Weights not included.
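
To illustrate the idea behind the prompt token caching above (a conceptual sketch only, not the actual implementation): keep the token list from the previous request, find the longest shared prefix with the new prompt, and only re-evaluate the tokens after it.

# Conceptual sketch of prompt token caching (not the actual implementation):
# reuse the already-evaluated prefix shared with the previous prompt and
# only re-process the tokens that differ at the end.
from typing import List

def shared_prefix_len(cached: List[int], new: List[int]) -> int:
    """Length of the longest common prefix of two token lists."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n

def tokens_to_evaluate(cached_tokens: List[int], new_tokens: List[int]) -> List[int]:
    """Return only the suffix of new_tokens that still needs evaluation."""
    keep = shared_prefix_len(cached_tokens, new_tokens)
    return new_tokens[keep:]

# Example: an edit near the end of the prompt only costs a few new tokens.
previous = [1, 42, 7, 99, 13, 5]
edited   = [1, 42, 7, 99, 21, 8, 3]
print(tokens_to_evaluate(previous, edited))  # -> [21, 8, 3]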

To use, download and run llamacpp_for_kobold.exe.
Alternatively, drag and drop a compatible quantized model for llamacpp on top of the .exe, or run it and manually select the model in the popup dialog.

Once the model is loaded, you can connect here (or use the full KoboldAI client):
http://localhost:5001

llamacpp-for-kobold-1.0.3

22 Mar 14:50

  • Applied the massive refactor from the parent repo. It was a huge pain, but I managed to keep the old tokenizer untouched and retained full support for the original model formats.
  • Greatly reduced the default batch sizes, as large batch sizes were causing bad output and high memory usage.
  • Added support for dynamic context lengths sent from the client.
  • TavernAI is working, although I wouldn't recommend it: it spams the server with multiple requests of huge contexts, so you're going to have a very painful time getting responses.

Weights not included.
To use, download, extract and run (default port is 5001):
llama_for_kobold.py [ggml_quant_model.bin] [port]
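
For example (the model filename below is just a placeholder for your own quantized GGML model):
python llama_for_kobold.py ggml-model-q4_0.bin 5001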

Then you can connect here (or use the full KoboldAI client):
http://localhost:5001

llamacpp-for-kobold-1.0.2

21 Mar 13:19

  • Added an embedded version of Kobold Lite (AGPL licensed).
  • Updated to the new ggml model format, while still maintaining support for the old format and the old tokenizer.
  • Changed the license to AGPL v3. The original GGML library and llama.cpp remain under the MIT license in their original repos.

Weights not included.
To use, download, extract and run (default port is 5001):
llama_for_kobold.py [ggml_quant_model.bin] [port]

Then you can connect here (or use the full KoboldAI client):
http://localhost:5001

llamacpp-for-kobold-1.0.1

20 Mar 07:05

  • Bugfixes for OSX. KV caching now allows continuing a previous generation without reprocessing the whole prompt.
  • Weights not included.

To use, download, extract and run (default port is 5001):
llama_for_kobold.py [ggml_quant_model.bin] [port]

Then you can connect here (or use the full KoboldAI client):
https://lite.koboldai.net/?local=1&port=5001

llamacpp-for-kobold-1.0.0

18 Mar 17:10
Initial version.
Weights not included.

To use, download, extract and run:
llama_for_kobold.py [ggml_quant_model.bin] [port]