Replies: 3 comments 1 reply
-
Hi, the completion prompt is quite specific to the model and is not usually documented in detail. You can refer to https://github.com/tabbyml/registry-tabby/blob/main/models.json for the list of models that ship with FIM support. Also, please let us know which model you would like to use with Ollama for this purpose.
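For context, "FIM support" means the registry entry carries a fill-in-the-middle prompt template with prefix/suffix placeholders. For CodeLlama, for example, the template is roughly the following (shown only as an illustration; check the registry entry for the exact string):
```
<PRE> {prefix} <SUF>{suffix} <MID>
```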
-
@reddiedev @wsxiaoys What flags are used to start the server? I am using Docker and a similar config to yours, and it seems my config file is totally ignored. I did not add the model flag, so Tabby is trying to download an empty model and keeps crashing.
-
Thanks for the notes.
I tried the same, and the debug logs show a recurring error because the model is empty.
I don't understand how, with your configuration, Tabby knows that it should use the codellama model.
On June 16, 2024 4:19:25 PM GMT+02:00, reddiedev wrote:
I got it working by doing the following:
1. Pulling `codellama:13b` (one of the models supported in https://github.com/tabbyml/registry-tabby/blob/main/models.json) and removing **all** other models from my Ollama instance.
2. Configuring `.tabby/config.toml` as shown above
```toml
[model.completion.http]
kind = "ollama/completion"
api_endpoint = "http://localhost:11434"
```
3. Make sure you have removed all previous Tabby containers. I am honestly not sure the following command is correct, but it worked when I tried it:
```
docker run -it --gpus all \
-p 8080:8080 -v $HOME/.tabby:/data \
tabbyml/tabby serve --device cuda
```
*Note: I just removed the `--model` flag from the default run command.*
4. There were no error logs when I checked `docker logs <container_name>`, and inference was working well in VSCode.
However, I eventually moved back to the old setup, as I prefer running a different Ollama model for chat and just stopping/restarting the Tabby container when I need it (since my GPU can't run both at the same time).
I suggest you look at the Docker container logs (if you haven't yet); otherwise you may have misconfigured the IP/port of the Ollama API endpoint, or you may be using an unsupported model.
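For instance, assuming Ollama is on its default port, a quick way to double-check both the endpoint and which models are actually installed is:
```
# list the models installed in the local Ollama instance
ollama list

# or query the Ollama HTTP API directly (the same endpoint Tabby talks to)
curl http://localhost:11434/api/tags
```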
-
I was able to successfully connect to my local Ollama server, and completions in VSCode are working great.
Current config is in `.tabby/config.toml`.
I just can't find any reference on how to set which Ollama model will be used for completion. Is there a way to set it and confirm which model is being used at any given time?
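My guess (untested) is that something like the snippet below would pin the model explicitly, assuming the `[model.completion.http]` section accepts a `model_name` field as the newer docs suggest; I'd still like confirmation that this is the right way and that the chosen model shows up somewhere in the logs:
```toml
[model.completion.http]
kind = "ollama/completion"
model_name = "codellama:13b"            # which Ollama model to use (assumed field)
api_endpoint = "http://localhost:11434"
```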