How to configure ollama model in 0.12 llama-server HTTP api #2374

reddiedev · 2024-06-07T12:10:02Z

reddiedev
Jun 7, 2024

I was able to successfully connect to my locall ollama server, and completions in VSCode are working great

Current config in .tabby/config.toml

[model.completion.http]
kind = "ollama/completion"
api_endpoint = "http://localhost:11434"

just can't find any reference on how to set which ollama model will be used for completion. Is there a way to set it up and confirm which model is being used at any given time?

wsxiaoys · 2024-06-08T11:23:34Z

wsxiaoys
Jun 8, 2024
Maintainer

Hi, the completion prompt is quite specific to the model and is not usually documented in detail. You can refer to https://github.com/tabbyml/registry-tabby/blob/main/models.json to see a list of models that come with FIM support. Also, please let us know which model you would like to use in Ollama for this purpose.

0 replies

blob42 · 2024-06-16T14:08:38Z

blob42
Jun 16, 2024

@reddiedev @wsxiaoys What flags are used to start the server ? I am using docker and a similar config to yours and it seams my config file is totally ignored. I did not add the model flag so tabby is trying to download an empty model and keeps crashing.

1 reply

reddiedev Jun 16, 2024
Author

I got it working by doing the following:

Pulling codellama:13b(one of the models supported in https://github.com/tabbyml/registry-tabby/blob/main/models.json) and removing all other models in my ollama instance.
Configuring .tabby/config.toml as shown above

[model.completion.http]
kind = "ollama/completion"
api_endpoint = "http://localhost:11434"

Make sure you have removed all previous tabby containers. I am honestly not sure if the following command is correct, but it worked when I was trying it

docker run -it --gpus all \
  -p 8080:8080 -v $HOME/.tabby:/data \
  tabbyml/tabby serve  --device cuda

*notice I just removed the --model flag from the default run command
4. There were no error logs when I checked docker logs <container_name>, and inference was working well in VSCode.

However, I eventually moved to the old system as I preferred running a different ollama model for chat and just stopping/restarting a tabby container when I need it (since my GPU can't run both at the same time)

I suggest you try looking at the docker container logs (if you haven't yet), or maybe you misconfigured the ip/port on the ollama api endpoint, or maybe you were using an unsupported model

blob42 · 2024-06-16T14:59:35Z

blob42
Jun 16, 2024

Thanks for the notes. I tried the same and the debug logs show a recurring error because the model is empty. I don't understand with your configuration how does tabby know that it should use the codellama model.

…

On June 16, 2024 4:19:25 PM GMT+02:00, reddiedev ***@***.***> wrote: I got it working by doing the following: 1. Pulling `codellama:13b`(one of the models supported in https://github.com/tabbyml/registry-tabby/blob/main/models.json) and removing **all** other models in my ollama instance. 2. Configuring `.tabby/config.toml` as shown above ```toml [model.completion.http] kind = "ollama/completion" api_endpoint = "http://localhost:11434" ``` 3. Make sure you have removed all previous tabby containers. I am honestly not sure if the following command is correct, but it worked when I was trying it ``` docker run -it --gpus all \ -p 8080:8080 -v $HOME/.tabby:/data \ tabbyml/tabby serve --device cuda ``` **notice I just removed the --model flag from the default run command* 4. There were no error logs when I checked `docker logs <container_name>`, and inference was working well in VSCode. However, I eventually moved to the old system as I preferred running a different ollama model for chat and just stopping/restarting a tabby container when I need it (since my GPU can't run both at the same time) I suggest you try looking at the docker container logs (if you haven't yet), or maybe you misconfigured the ip/port on the ollama api endpoint, or maybe you were using an unsupported model -- Reply to this email directly or view it on GitHub: #2374 (reply in thread) You are receiving this because you commented. Message ID: ***@***.***>

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to configure ollama model in 0.12 llama-server HTTP api #2374

{{title}}

Replies: 3 comments 1 reply

{{title}}

{{title}}

{{title}}

{{title}}

Select a reply

How to configure ollama model in 0.12 llama-server HTTP api #2374

reddiedev Jun 7, 2024

Replies: 3 comments · 1 reply

wsxiaoys Jun 8, 2024 Maintainer

blob42 Jun 16, 2024

reddiedev Jun 16, 2024 Author

blob42 Jun 16, 2024

reddiedev
Jun 7, 2024

Replies: 3 comments 1 reply

wsxiaoys
Jun 8, 2024
Maintainer

blob42
Jun 16, 2024

reddiedev Jun 16, 2024
Author

blob42
Jun 16, 2024