
Releases: LostRuins/koboldcpp

koboldcpp-1.6

13 Apr 06:44

  • This is a bugfix release intended to resolve the recently reported crashing issues.
  • Merged the recent CLBlast fixes; the GPU name is now displayed.
  • Batch size reduced from 1024 back to 512 due to reported crashes.

To use, download and run the koboldcpp.exe, which is a one-file pyinstaller.
Alternatively, drag and drop a compatible ggml model on top of the .exe, or run it and manually select the model in the popup dialog.

Once loaded, you can connect at this address (or use the full KoboldAI client):
http://localhost:5001
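If you want to test the API directly instead of the embedded web UI, a minimal request should look something like the line below. This is only a sketch, assuming the standard KoboldAI-compatible /api/v1/generate route is exposed; the prompt and length values are placeholders:

    curl -X POST http://localhost:5001/api/v1/generate -H "Content-Type: application/json" -d "{\"prompt\": \"Hello, my name is\", \"max_length\": 32}"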

For more information, be sure to run the program with the --help flag.

Alternative Options:
A non-AVX2 version is now included in the same .exe file; enable it with the --noavx2 flag.
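For example, on an older CPU you could launch it from the command line like this (the model filename is just a placeholder; passing the model path as an argument works the same way as drag-and-drop):

    koboldcpp.exe ggml-model-q4_0.bin --noavx2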

koboldcpp-1.5

12 Apr 16:03

  • This release consolidates a lot of upstream bug fixes and improvements; if you had issues with earlier versions, please try this one. The upstreamed GPTJ changes should also make GPT-J-6B inference another 20% or so faster.
  • Integrated AVX2 and non-AVX2 support into the same binary for Windows. If your CPU is very old and doesn't support AVX2 instructions, you can switch to compatibility mode with --noavx2, but it will be slower.
  • Now has integrated experimental CLBlast support thanks to @0cc4m, which uses your GPU to speed up prompt processing. Enable it with --useclblast [platform_id] [device_id] (see the example after this list).
  • To quantize various fp16 models, you can use the quantizers in the tools.zip. Remember to convert them from Pytorch/Huggingface format first with the relevant Python conversion scripts.
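As an example of the CLBlast option above, selecting the first OpenCL platform and device would look something like this (the model filename is a placeholder, and your platform/device IDs may differ):

    koboldcpp.exe ggml-model-q4_0.bin --useclblast 0 0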

To use, download and run the koboldcpp.exe, which is a one-file pyinstaller.
Alternatively, drag and drop a compatible ggml model on top of the .exe, or run it and manually select the model in the popup dialog.

Once loaded, you can connect at this address (or use the full KoboldAI client):
http://localhost:5001

For more information, be sure to run the program with the --help flag.

Alternative Options:
A non-AVX2 version is now included in the same .exe file; enable it with the --noavx2 flag.
If you prefer, you can download the zip file, extract it, and run the Python script manually, e.g. koboldcpp.py [ggml_model.bin]
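For instance, a manual run from the extracted zip might look like this (the model filename is a placeholder):

    python koboldcpp.py ggml-model-q4_0.bin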

koboldcpp-1.4

10 Apr 16:36

  • This is an expedited bugfix release because the new model formats were breaking on large contexts.
  • People have also requested that mmap be the default, so now it is; you can disable it with --nommap
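For example, to load without mmap (the model filename is a placeholder; passing the model path as an argument works the same way as drag-and-drop):

    koboldcpp.exe ggml-model-q4_0.bin --nommap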

To use, download and run the koboldcpp.exe, which is a one-file pyinstaller.
Alternatively, drag and drop a compatible ggml model on top of the .exe, or run it and manually select the model in the popup dialog.

Once loaded, you can connect at this address (or use the full KoboldAI client):
http://localhost:5001

Alternative Options:
None are provided for this release as it is a temporary one.

koboldcpp-1.3

10 Apr 04:14

  • Bug fixes for various issues (missing endpoints, malformed URL)
  • Merged upstream file loading enhancements. mmap is now disabled by default; enable it with --usemmap
  • Can now automatically distinguish between older and newer GPTJ and GPT2 quantized files.
  • Version numbers are now displayed at start

To use, download and run the koboldcpp.exe, which is a one-file pyinstaller.
Alternatively, drag and drop a compatible ggml model on top of the .exe, or run it and manually select the model in the popup dialog.

Once loaded, you can connect at this address (or use the full KoboldAI client):
http://localhost:5001

Alternative Options:
If your CPU is very old and doesn't support AVX2 instructions, you can try running the noavx2 version. It will be slower.
If you prefer, you can download the zip file, extract it, and run the Python script manually, e.g. koboldcpp.py [ggml_model.bin]
To quantize an fp16 model, you can use the quantize.exe in the tools.zip
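A typical invocation might look like the one below. This is only a sketch, assuming the bundled quantize.exe follows the usual ggml argument order of input file, output file, and quantization type; the filenames are placeholders, and 2 would select the 4-bit q4_0 format if the tool matches the upstream llama.cpp quantizer:

    quantize.exe ggml-model-f16.bin ggml-model-q4_0.bin 2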

koboldcpp-1.2

08 Apr 17:33

This is a checkpoint version which should be relatively stable and includes more release variants.

  • Support for new versions of GPT2 models, for example the Cerebras models on HF.
  • Prevented the TK GUI window from staying open and being annoying.

To use, download and run the koboldcpp.exe, which is a one-file pyinstaller.
Alternatively, drag and drop a compatible ggml model on top of the .exe, or run it and manually select the model in the popup dialog.

Once loaded, you can connect at this address (or use the full KoboldAI client):
http://localhost:5001

Alternative Options:
If your CPU is very old and doesn't support AVX2 instructions, you can try running the noavx2 version. It will be slower.
If you prefer, you can download the zip file, extract it, and run the Python script manually, e.g. koboldcpp.py [ggml_model.bin]
To quantize an fp16 model, you can use the quantize.exe in the tools.zip

koboldcpp-1.1

07 Apr 14:17

  • Simplified the version numbering, as I don't think I really need that granularity
  • Various small tweaks, improvements, and bugfixes
  • Updated the embedded Kobold Lite

To use, download and run the koboldcpp.exe
Alternatively, drag and drop a compatible ggml model on top of the .exe, or run it and manually select the model in the popup dialog.

Once loaded, you can connect at this address (or use the full KoboldAI client):
http://localhost:5001

If your CPU is very old and doesn't support AVX2 instructions, you can try running the noavx2 version. It will be slower.

koboldcpp-1.0.10

06 Apr 08:55

  • Updated the embedded Kobold Lite to version 19
  • Merged the various improvements from the parent repo
  • Removed the psutil dependency, reverting the thread calculation to be based on 0.5 x cpu_count
  • Changed makefile to hopefully work on ARM

To use, download and run the koboldcpp.exe
Alternatively, drag and drop a compatible ggml model on top of the .exe, or run it and manually select the model in the popup dialog.

Once loaded, you can connect at this address (or use the full KoboldAI client):
http://localhost:5001

koboldcpp-1.0.9beta

05 Apr 08:16

  • Integrated support for GPT2! This should theoretically also work with Cerebras models, but I have not tried those yet. This is a great way to get started, as you can now try models so tiny that even a potato CPU can run them. Here's a good one to start with (I can generate 100 tokens in a second with it; see the example after this list): https://huggingface.co/ggerganov/ggml/resolve/main/ggml-model-gpt-2-117M.bin
  • Upgraded the embedded Kobold Lite to support a Stanford Alpaca compatible Instruct Mode, which can be enabled in settings.
  • Removed all -march=native and -mtune=native flags when building the binary. Compatibility should now be more consistent across different devices.
  • Fixed an incorrect flag name used to trigger the ACCELERATE library on macOS. This should give macOS users greatly increased performance for GPT-J and GPT2 models, assuming you have ACCELERATE support.
  • Added Rep Pen for GPT-J and GPT-2 models, and by extension pyg.cpp; this means that repetition penalty now works similarly to the way it does in llama.cpp.
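As a concrete version of the GPT2 example mentioned in the first point, downloading and running that tiny model could look like this (assuming curl is available, and that passing the model path as an argument works the same way as drag-and-drop):

    curl -L -o ggml-model-gpt-2-117M.bin https://huggingface.co/ggerganov/ggml/resolve/main/ggml-model-gpt-2-117M.bin
    koboldcpp.exe ggml-model-gpt-2-117M.bin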

To use, download and run the koboldcpp.exe
Alternatively, drag and drop a compatible ggml model on top of the .exe, or run it and manually select the model in the popup dialog.

Once loaded, you can connect at this address (or use the full KoboldAI client):
http://localhost:5001

koboldcpp-1.0.8beta

03 Apr 03:58

  • Rebranded to koboldcpp (formerly llamacpp-for-kobold). Library file names and references have changed too; please let me know if anything is broken!
  • Added support for the original GPT4ALL.CPP format!
  • Added support for GPT-J formats, including the original 16bit legacy format as well as the 4bit version from Pygmalion.cpp
  • Switched compiler flag from -O3 to -Ofast. This should increase generation speed even more, but I'm not sure if anything will break, so please let me know if it does.
  • Changed default threads to scale according to physical core counts instead of os.cpu_count(). This will generally result in fewer threads being utilized, but it should provide a better default for slower systems. You can override this manually with the --threads parameter.
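For example, to override the automatic thread count (the model filename and thread count are placeholders; passing the model path as an argument works the same way as drag-and-drop):

    koboldcpp.exe ggml-model-q4_0.bin --threads 4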

To use, download and run the koboldcpp.exe
Alternatively, drag and drop a compatible quantized model for llamacpp on top of the .exe, or run it and manually select the model in the popup dialog.

Once loaded, you can connect at this address (or use the full KoboldAI client):
http://localhost:5001

llamacpp-for-kobold-1.0.7

01 Apr 01:11

  • Added support for the new version of the ggml llamacpp model format (magic=ggjt, version 3). All old versions will continue to be supported.
  • Integrated speed improvements from the parent repo.
  • Fixed a utf-8 encoding issue in the outputs.
  • Improved console debug information during generation; it now shows token progress and time taken directly.
  • Set non-streaming as the default mode. You can enable streaming with --stream
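For example, to launch with streaming re-enabled (the model filename is a placeholder; passing the model path as an argument works the same way as drag-and-drop):

    llamacpp-for-kobold.exe ggml-model-q4_0.bin --stream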

To use, download and run the llamacpp-for-kobold.exe
Alternatively, drag and drop a compatible quantized model for llamacpp on top of the .exe, or run it and manually select the model in the popup dialog.

Once loaded, you can connect at this address (or use the full KoboldAI client):
http://localhost:5001