Hi everyone,
I'd like to benchmark vLLM performance (tokens/s, TTFT), but so far I only have results for Ollama and its Q4 models.
To make the comparison fair, I'd like to reuse the same models from the Ollama repo (Llama3.1, Gemma2, Mistral), but I don't know how to export them and/or make them compatible with vLLM.
Do you know a way of importing Ollama models into vLLM (e.g., reusing the blobs directly)?
If not, would AWQ quantization give me the same result (model) as the Ollama Q4 models?
Any other solution?
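
In case it helps, here is a minimal sketch of what I'm imagining, assuming vLLM's experimental GGUF loader can read an Ollama blob file directly. The blob digest and tokenizer repo below are placeholders/assumptions on my side, not something I've verified (the actual blob path should appear in the FROM line of `ollama show --modelfile llama3.1`):

```python
# Minimal sketch (untested assumption): load an Ollama GGUF blob with vLLM's
# experimental GGUF support. The blob digest and tokenizer repo are placeholders.
import os
from vllm import LLM, SamplingParams

# Ollama stores weights as content-addressed GGUF blobs under ~/.ollama/models/blobs.
gguf_path = os.path.expanduser("~/.ollama/models/blobs/sha256-<digest>")  # placeholder digest

llm = LLM(
    model=gguf_path,
    # A GGUF file carries no Hugging Face tokenizer, so it must be named explicitly.
    tokenizer="meta-llama/Llama-3.1-8B-Instruct",  # assumed matching tokenizer
)

outputs = llm.generate(["Hello, world"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```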
Regards