GPT-4o/Vision models cannot use GPU due to CLIP changes #4815
Confirmed LocalAI 2.24.2 loads CLIP with CUDA:
Still don't get a response, though.
Edit: should have read the docs. Need to use a 500x500 image. That works, with GPU, on 2.24.2. I'll try with the correct image size on 2.25.0 and see how the latency is with CLIP on CPU (if it works at all).
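If anyone else needs it, a quick way to get a 500x500 test image before sending it. This is a minimal sketch; it assumes Pillow is installed and the file names are just placeholders:

```python
# make_test_image.py -- minimal sketch; assumes Pillow is installed and
# "cat.jpg" is whatever test photo you have lying around.
from PIL import Image

img = Image.open("cat.jpg").convert("RGB")  # drop alpha so JPEG save works
img = img.resize((500, 500))                # the size the docs ask for
img.save("cat_500.jpg", quality=90)         # send this one to the vision model
```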
LocalAI version:
v2.25.0 (07655c0c2e0e5fe2bca86339a12237b69d258636)
Environment, CPU architecture, OS, and Version:
Linux ai-server 5.10.102.1-dxgrknl #1 SMP Sat Apr 23 13:33:19 +07 2022 x86_64 x86_64 x86_64 GNU/Linux
It's a VM with 2x vCPU, GPU-np partitioning on an RTX 3090. (Somehow managed to get that working...)
Describe the bug
The latest version of CLIP (clip.cpp in llama.cpp) has its GPU support commented out. Thus, at least for me, the vision models all lose the connection to the stream before the response can be completed, since CLIP inference on the CPU is intensive and takes a while.
To Reproduce
Pull the latest GPU docker image, open the visual chat experience, and try to send any image to the GPT-4o model. It will hang while everything gets processed by the CPU, and after 30-60s the connection to the stream is lost. The response (if CLIP on CPU ever completes it) won't get displayed.
I confirmed this works (mostly) fine when sending text only. (Separately, we should add stopwords to the default config for this model; I'll open an issue for that.)
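The same behavior should be reproducible outside the web UI via the OpenAI-compatible endpoint. A minimal sketch, assuming LocalAI is listening on localhost:8080, the model is named gpt-4o, the image file exists, and the openai Python package is installed (all of these are assumptions, adjust to your deployment):

```python
# repro_vision.py -- minimal sketch; endpoint, port, model name, and image
# file are assumptions, adjust them to your setup.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

with open("cat_500.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",  # the LocalAI model name used in the visual chat
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is in this image?"},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"},
            },
        ],
    }],
    timeout=120,  # generous, since CLIP on the CPU can take a while
)
print(response.choices[0].message.content)
```

On an affected build this request should sit on the CPU for a long time (or time out), while on a build with CLIP on CUDA it returns quickly.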
Expected behavior
It responds telling me I've sent a picture of a cute cat. But really, CLIP should use the GPU.
Logs
Example of the LLM running on GPU but CLIP on CPU:
Additional context
There's an issue complaining about it here: ggml-org/llama.cpp#11322 (comment)
It looks like @ggerganov removed that support here: ggml-org/llama.cpp#10896
He points to some issues, I guess where some models weren't working properly, e.g. here.
Apparently they are still working on vision; here's a discussion.
Since LocalAI pulls llama.cpp as a git submodule defaulting to master, it automatically picked up those changes from Dec 19th onwards. Thus the newly built images, e.g. v2.25.0, do not actually support GPU for vision with llama.cpp. It looks like 2.24.2 should be unaffected; I'll see if I can get it working there.
Workarounds (for LocalAI users):
A) Rebuild with CLIP GPU support re-enabled
B) Downgrade to LocalAI v2.24.2 (released Dec 10th, 2024)
I have not tried either of these yet, but I will, and will update this thread. Downgrading is probably the easiest temporary solution.
Resolutions for LocalAI, in no specific order:
BTW -- thanks much for the work on this project! I was able to spin up LocalAI for my Home Assistant in a day!