ollama does not use GPU #111

Open
Blake110 opened this issue Jan 15, 2025 · 6 comments

Comments

@Blake110

Blake110 commented Jan 15, 2025

I tried to run web-ui with a remote Ollama server and the qwen2.5 model. It can connect to the server, but it only runs the model in CPU mode.
[screenshot: ollama1]

I tested open-webui against the same remote Ollama server and it runs fine with the GPU. Running the model on the remote server from the CLI also uses the GPU without any issues.

I re-pulled the model, restarted the Ollama server, and restarted the remote machine, but the result is still the same.

The browser web-ui works fine with Groq.

@MeshkatShB
Contributor

Hi there. When you want to use Ollama, you need to make sure you have enough vRAM for it. The model file itself (e.g. qwen2.5) may be 4.7 GB, but it needs more vRAM than that once it is loaded.

How much vRAM do you have available?

I have an 8 GB NVIDIA 3070 Ti GPU; qwen2:1.5b loads, but because of the model size relative to the available vRAM it is still far, far slower. So there is a tradeoff between your GPU's vRAM and the capability of the model.
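
If you are not sure whether the model ended up on the GPU at all, you can ask the Ollama server directly. Below is a rough sketch (assuming a reachable server and its /api/ps endpoint, which reports size and size_vram for each loaded model):

    import requests

    # Ask the Ollama server which models are loaded and how much of each
    # sits in GPU memory (size vs. size_vram). Adjust the address if the
    # server is remote.
    OLLAMA_URL = "http://localhost:11434"

    resp = requests.get(f"{OLLAMA_URL}/api/ps", timeout=10)
    resp.raise_for_status()

    for m in resp.json().get("models", []):
        size = m.get("size", 0)            # total bytes used by the model
        size_vram = m.get("size_vram", 0)  # bytes resident in GPU memory
        pct = 100 * size_vram / size if size else 0
        print(f"{m['name']}: {pct:.0f}% in vRAM "
              f"({size_vram / 2**30:.1f} of {size / 2**30:.1f} GiB)")

If size_vram is well below size (or zero), the model has spilled to system RAM and will run mostly on the CPU.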

@coolrazor007

It is hard to tell, but I'm seeing similar slowness when using Ollama as well. I use Ollama all the time for LLM prompting and it is quite quick (a response always takes less than 30 seconds) on my RTX 3090 with llama3.1:8b or other models once they are loaded. Yet no matter what model I use with web-ui, a task takes upwards of 30 MINUTES and often never finishes at all. Is this normal behavior? How many LLM calls is it making?

From what I can tell, though, Ollama is using the GPU, which makes sense because an external Ollama isn't controlled by web-ui; web-ui is just making API calls to it. Not to hijack this thread, but I get the feeling we are experiencing the same thing.

@MeshkatShB
Contributor

I tried it on an RTX 3090 desktop (24 GB vRAM, 32 GB RAM) and it ran smoothly (without Docker). It was using around ~70% GPU and 30% CPU (checked with ollama ps).

@coolrazor007

I'm running Ollama and Webui in Docker, so that's a difference. Dumb question, but Webui doesn't need a GPU itself, right? It just calls the remote API for everything LLM-related?

@MeshkatShB
Contributor

MeshkatShB commented Jan 20, 2025

Yeah, it's just an API call. For Ollama, web-ui uses your Ollama serving address at localhost:11434, or a remote server if you set the base_url parameter. There are a few modifications you can make to push Ollama onto your GPU:

  • Set num_gpu=1 to explicitly tell the model that you want to use the GPU,
  • Set keep_alive="10m" to keep the model in memory for 10 minutes, and
  • Set num_thread=0 to suppress the CPU load.

Below is the code snippet to do so:

    elif provider == "ollama":
        return ChatOllama(
            model=kwargs.get("model_name", "qwen2.5:7b"),
            temperature=kwargs.get("temperature", 0.0),
            num_ctx=128000,  # context window size
            base_url=kwargs.get("base_url", "http://localhost:11434"),  # remote Ollama server if set
            num_thread=0,      # suppress the CPU load
            num_gpu=1,         # explicitly request the GPU
            keep_alive="10m",  # keep the model loaded for 10 minutes
        )

You can add the mentioned options to the corresponding ollama return call in the source code.
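
For reference, a call site could look like the sketch below. The function name and import path are assumptions based on the snippet above (a provider factory that forwards **kwargs), so adjust them to the actual source:

    from src.utils.utils import get_llm_model  # assumed path/name of the factory shown above

    # Point web-ui at a remote Ollama server and pick the model; the GPU-related
    # options (num_gpu, num_thread, keep_alive) are set inside the ollama branch.
    llm = get_llm_model(
        provider="ollama",
        model_name="qwen2.5:7b",
        temperature=0.0,
        base_url="http://<remote-ollama-host>:11434",
    )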

@EvilFreelancer

The reason is here: https://github.com/langchain-ai/langchain/blob/master/libs/community/langchain_community/llms/ollama.py#L160-L173

It looks like a bug in ChatOllama in LangChain: too many parameters are passed to the Ollama API.

I commented all of them out (except temperature) and now it seems to use my GPU.
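
Something along these lines is roughly what is left after commenting the extra parameters out (a minimal sketch, assuming langchain_ollama's ChatOllama; the langchain_community variant accepts the same arguments):

    from langchain_ollama import ChatOllama  # or: from langchain_community.chat_models import ChatOllama

    # Only the essentials are passed; everything else stays at the server's
    # defaults, so Ollama itself decides how many layers to offload to the GPU.
    llm = ChatOllama(
        model="qwen2.5:7b",
        base_url="http://localhost:11434",
        temperature=0.0,
    )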
