KoboldAI Horde
KoboldAI is part of the AI Horde, but it generates text rather than images. It allows you to use many text generation models for free.
A text worker can easily be run on services like [Vast.ai](https://cloud.vast.ai) using [koboldcpp](https://github.com/LostRuins/koboldcpp).
Once your account is set up, create a custom template. In the image path, put `pytorch/pytorch:2.1.0-cuda12.1-cudnn8-devel` and make sure `cuda12.1-cudnn8-devel` is selected as the version tag.
Select "Run a jupyter-python notebook" and check the two options.
As the on-start script, input:

```sh
git clone https://github.com/LostRuins/koboldcpp
cd koboldcpp
make -B LLAMA_CUBLAS=1 -j8
pip install -r requirements.txt
wget -nc https://huggingface.co/TheBloke/MythoMax-L2-13B-GGUF/resolve/main/mythomax-l2-13b.Q4_K_M.gguf
python koboldcpp.py mythomax-l2-13b.Q4_K_M.gguf --port 5001 --usecublas normal 0 mmq --gpulayers 43 --threads 1 --contextsize 4096 --hordeconfig MythoMax-L2-13b 1024 4096 APIKEY WORKERNAME
```
Replace "APIKEY" and "WORKERNAME" with your Horde API key and the worker name for this instance. Be aware that this can make your API key visible to Vast.ai staff and to the owner of the instance. You can also swap the model for a different one, but a bigger model may need more disk space or VRAM, as well as an adjusted number of gpulayers.
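One way to keep the key out of the script text itself is to read it from environment variables set on the instance and substitute them into the launch line. This is only a sketch; the variable names `HORDE_API_KEY` and `WORKER_NAME` are my own, and the key is still visible to whoever controls the machine:

```sh
# Fall back to the placeholders if the variables are not set on the instance.
HORDE_API_KEY="${HORDE_API_KEY:-APIKEY}"
WORKER_NAME="${WORKER_NAME:-WORKERNAME}"
CMD="python koboldcpp.py mythomax-l2-13b.Q4_K_M.gguf --port 5001 \
--usecublas normal 0 mmq --gpulayers 43 --threads 1 --contextsize 4096 \
--hordeconfig MythoMax-L2-13b 1024 4096 $HORDE_API_KEY $WORKER_NAME"
echo "$CMD"   # inspect the final command before running it
# eval "$CMD"
```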
Name your template, and do not enable "Public Template".
In the "search" section of the site, select your template under "Instance configuration". Set disk space to 8 GB or more.
As filter options, you will want to pick:
- GPU RAM: minimum 12 GB
- Inet Down: as much as you want, but at least 200 Mbps to avoid spending hours on the model download
- GPU count: 1 (it is unclear whether the script works with more than one)
- Unverified instances (cheaper)
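As a rough sanity check on the 200 Mbps figure, here is the download-time arithmetic; the ~8 GB model size is an assumption, in the right ballpark for a 13B Q4_K_M file:

```sh
SIZE_GB=8   # approximate model file size (assumption)
MBPS=200    # minimum recommended download speed
# 1 GB = 8000 megabits, so seconds = GB * 8000 / Mbps (integer arithmetic)
SECONDS_NEEDED=$(( SIZE_GB * 8000 / MBPS ))
echo "~$SECONDS_NEEDED seconds (~$(( SECONDS_NEEDED / 60 )) minutes)"
```

At 200 Mbps the download takes around five minutes; at 20 Mbps it would take nearly an hour.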
Feel free to play with the sorting options to see which instance you might prefer, as well as interruptible instances. Upon clicking "rent", Vast.ai will start the image and run the script. You can monitor it using the "request logs" button.
As a quick way of checking that the instance is working, look at:
- Disk used: if this is low, the model wasn't downloaded
- VRAM used: if this isn't around 12 GB, the process didn't load the model into VRAM
- GPU usage %: this sometimes drops to 0 between jobs, but should usually be above 50%
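The same checks can be run from a terminal inside the instance (for example via the Jupyter interface). This assumes `nvidia-smi` is available, which the CUDA devel image normally provides:

```sh
# Report VRAM used and GPU utilization, falling back to a message
# when nvidia-smi is unavailable (e.g. not a GPU machine).
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=memory.used,utilization.gpu --format=csv,noheader
else
  echo "nvidia-smi not found"
fi
# Disk usage for the filesystem holding the current directory.
df -h . | tail -n 1
```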
Remember to delete instances after use, as a stopped instance is still billed for disk space.