Hi everyone,
I'd like to benchmark vLLM performance (tokens/s, TTFT), but so far I only have results for Ollama and its Q4 models.
To make the comparison fair, I'd like to reuse the same models from the Ollama repo (Llama3.1, Gemma2, Mistral), but I don't know how to export them and/or make them compatible with vLLM.
Do you know a way of importing Ollama models into vLLM (e.g., reusing the blobs directly)?
If not, would AWQ quantization give me the same result (model) as the Ollama Q4 models?
Any other solution?
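
In case it helps, here is a minimal sketch of what I'm imagining, assuming vLLM's experimental GGUF loader can read an Ollama blob file directly. The blob digest and tokenizer repo below are placeholders/assumptions on my side, not something I've verified (the actual blob path should appear in the FROM line of `ollama show --modelfile llama3.1`):

```python
# Minimal sketch (untested assumption): load an Ollama GGUF blob with vLLM's
# experimental GGUF support. The blob digest and tokenizer repo are placeholders.
import os
from vllm import LLM, SamplingParams

# Ollama stores weights as content-addressed GGUF blobs under ~/.ollama/models/blobs.
gguf_path = os.path.expanduser("~/.ollama/models/blobs/sha256-<digest>")  # placeholder digest

llm = LLM(
    model=gguf_path,
    # A GGUF file carries no Hugging Face tokenizer, so it must be named explicitly.
    tokenizer="meta-llama/Llama-3.1-8B-Instruct",  # assumed matching tokenizer
)

outputs = llm.generate(["Hello, world"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```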
Regards