How to use as replacement of chatgpt with 4o #1366

superchargez · 2025-02-12T08:18:48Z

superchargez
Feb 12, 2025

Hi there, great work. I have enjoyed playing with gguf models a lot. However, I've a question, how can koboldcpp be used as replacement of gpt4o (multi modal API). Currently this model runs at 5001, however, all I can do is have send text to text conversation. But I also want to be able to send audio files or use an endpoint where I can have live conversation through speech (both me and bot) as well as upload images which it can explain by speaking, like chatgpt (gpt4o) can.

I can provide it model files by uploading text, audio as well as mmproj file, however, I can only interact with it through chat interface, or at least I don't know what endpoint I should use and how to send post requests for audio/image files (or live audio stream for speech recognition).

I use python but bash and cmd commands are also welcomed. Thank you.

Answered by LostRuins

Feb 12, 2025

There isn't an all-in-one "omni" model for KoboldCpp, instead you need to use various smaller models combined to get what you need.

For text generation, you already have that.
For image generation, KoboldCpp supports Stable Diffusion 1.5, SDXL, SD3 and Flux models. For the API, you would use this A1111 compatible txt2img api https://lite.koboldai.net/koboldcpp_api#/sdapi%2Fv1/post_sdapi_v1_txt2img
For image recognition/vision, KoboldCpp supports various vision mmproj for multiple architectures, currently the best one is MiniCPM v2.6, you can get it here https://huggingface.co/koboldcpp/mmproj, load that along with the main GGUF model. If you want to access it over the API, use the images …

View full answer

LostRuins · 2025-02-12T08:51:51Z

LostRuins
Feb 12, 2025
Maintainer

There isn't an all-in-one "omni" model for KoboldCpp, instead you need to use various smaller models combined to get what you need.

For text generation, you already have that.
For image generation, KoboldCpp supports Stable Diffusion 1.5, SDXL, SD3 and Flux models. For the API, you would use this A1111 compatible txt2img api https://lite.koboldai.net/koboldcpp_api#/sdapi%2Fv1/post_sdapi_v1_txt2img
For image recognition/vision, KoboldCpp supports various vision mmproj for multiple architectures, currently the best one is MiniCPM v2.6, you can get it here https://huggingface.co/koboldcpp/mmproj, load that along with the main GGUF model. If you want to access it over the API, use the images field in https://lite.koboldai.net/koboldcpp_api#/api%2Fv1/post_api_v1_generate
For speech generation, KoboldCpp supports OuteTTS. Please follow the instructions listed https://github.com/LostRuins/koboldcpp/releases/tag/v1.82.4 and you can also use the api at https://lite.koboldai.net/koboldcpp_api#/api%2Fextra/post_api_extra_tts
For speech recognition, KoboldCpp supports Whisper. Models can be found at https://huggingface.co/koboldcpp/whisper/tree/main and API is https://lite.koboldai.net/koboldcpp_api#/api%2Fextra/post_api_extra_transcribe

Additionally, it also provides openai, xtts and various other APIs like ComfyUI compatibility. Generally you can access all basic features from the Web KoboldAI lite interface, just check various settings and observe the API calls used.

You can find all model links on the wiki. Please refer to it for more info.
https://github.com/LostRuins/koboldcpp/wiki#getting-an-ai-model-file

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to use as replacement of chatgpt with 4o #1366

{{title}}

Replies: 1 comment

{{title}}

Select a reply

How to use as replacement of chatgpt with 4o #1366

superchargez Feb 12, 2025

Replies: 1 comment

LostRuins Feb 12, 2025 Maintainer

superchargez
Feb 12, 2025

LostRuins
Feb 12, 2025
Maintainer