-
Tabby does accept a local directory as a parameter for `--model`. The naming convention, such as `q8_0.v2.gguf`, is mostly a legacy issue. In fact, you can place a Q5_K_M quantized file into the `ggml` directory under that legacy name and it will load fine.
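For example, something like this should work (the source file name below is just a placeholder for whatever gguf you downloaded; the target paths match the layout described later in this thread):

```sh
# Tabby keys off the legacy file name, not the quantization inside it,
# so a Q5_K_M gguf can simply be copied in under the expected name.
mkdir -p /absolute/path/to/coder/ggml
cp deepseek-coder-6.7b.Q5_K_M.gguf /absolute/path/to/coder/ggml/q8_0.v2.gguf
tabby serve --model /absolute/path/to/coder
```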
-
I found this discussion and would like to also ask whether there is a way to limit the context size, so that one could run a 6.7B q5 model at a 6000-token context. That would let me run a slightly more capable model on my 8GB GPU. Otherwise, support for partially offloading layers to the GPU would of course also be useful.
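To make the numbers concrete, here is my back-of-the-envelope estimate (assuming a deepseek-coder-6.7B-like shape of 32 layers and a 4096 hidden size with no grouped-query attention, plus an fp16 KV cache; the figures are rough):

```
Q5_K_M weights for a 6.7B model  ≈ 4.8 GB
KV cache per token               ≈ 2 (K+V) × 32 layers × 4096 dims × 2 bytes ≈ 0.5 MB
KV cache for 6000 tokens         ≈ 6000 × 0.5 MB ≈ 3.0 GB
Total                            ≈ 7.8 GB
```

That just about fits in 8 GB, but only if the context is capped; at a larger default context the KV cache alone blows the budget.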
-
My server can't connect to the internet, so I download the gguf first and then transfer it to my server.
My folder looks like this: [screenshot of the model folder]
In `tabby.json` I just copied the content for deepseek-coder-6.7B. My command to launch tabby is `tabby serve --model /absolute/path/to/coder ...`. I found it interesting that tabby always tries to find the `q8_0.v2.gguf` file under the `/absolute/path/to/coder/ggml` folder.
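A sketch of my folder, reconstructed from the paths mentioned above (treat the exact file set as illustrative):

```
/absolute/path/to/coder/
├── tabby.json        # content copied from the deepseek-coder-6.7B registry entry
└── ggml/
    └── q8_0.v2.gguf  # actually a Q5_K_M gguf, renamed to the expected name
```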
Questions:

1. Is the only useful thing in `tabby.json` the `prompt_template`/`chat_template` (if I already have the gguf file)?
2. Should I add a `models.json` (if I just want to use the local gguf)?

By the way, I can launch tabby when I rename the gguf file to `q8_0.v2.gguf`. I am just worried about this, since my gguf is `Q5_K_M` instead of `q8`.
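In case it helps others, the only part of my `tabby.json` that seems to matter for a local gguf is the template section. A minimal sketch (the `prompt_template` value is the DeepSeek Coder FIM template from the registry entry I copied; I have left out `chat_template`, and any fields beyond these are assumptions on my part):

```json
{
  "prompt_template": "<｜fim▁begin｜>{prefix}<｜fim▁hole｜>{suffix}<｜fim▁end｜>"
}
```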