LLaVA with GGUF files (and --mmproj argument)? #8341
Unanswered
benmayersohn asked this question in Q&A
Hi everyone! I'm coming from `llama.cpp`, which currently doesn't support serving multi-modal models (the feature was removed and will hopefully return soon). I'm used to the GGUF format, which vLLM has only recently started to support.

In `llama.cpp` you would do something like this:
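(The exact snippet depends on the `llama.cpp` version; here's a sketch using the `llava-cli` example binary, with a placeholder image and prompt:)

```sh
# ${LLAVA_MODEL}: the LLaVA language model (GGUF file)
# ${MMPROJ}:      the multi-modal projector (a separate GGUF file)
./llava-cli -m ${LLAVA_MODEL} --mmproj ${MMPROJ} \
    --image photo.jpg -p "Describe this image."
```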
where

- `${MMPROJ}` is a multi-modal projector that aligns the vision and text data (per this overview)
- `${LLAVA_MODEL}` is the LLaVA model itself

and the two are distinct GGUF files. Does anyone know what the equivalent command/arguments would be when serving via `python -m vllm.entrypoints.openai.api_server`? Thanks!
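For reference, this is the kind of invocation I'd use for a plain text-only GGUF model in vLLM (a sketch; the GGUF file name and tokenizer repo below are just placeholders), and what I can't figure out is where the separate mmproj file would go:

```sh
# Serve a single-file GGUF model; the tokenizer is pulled from the
# original Hugging Face repo, since GGUF files don't ship one vLLM can use.
python -m vllm.entrypoints.openai.api_server \
    --model ./llava-v1.5-7b.Q4_K_M.gguf \
    --tokenizer llava-hf/llava-1.5-7b-hf
```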