On my aging 3400G, the whole desktop GUI (Linux, either X11+XFCE or Wayland+KDE) tends to freeze completely during the more intensive llama.cpp / stable-diffusion.cpp GPU computations (on Vulkan). From low to high impact:
- low-ish batch sizes (<= 256-512, depending on the model) with no layer offloading run fine, even while hitting >95% GPU usage;
- higher batch sizes cause some GUI stuttering and tend to slow down prompt processing;
- full offloading makes the GUI unusable, freezing for 2-3 seconds at a time, mainly during prompt processing;
- stable-diffusion.cpp inference also causes 2-3 second freezes;
- the stable-diffusion.cpp VAE phase completely freezes the interface for its whole run (20-120 s, depending on image size).
Also, these 'choking' events sometimes trigger driver bugs, causing full system lock-ups.
So, I'm looking for ways to throttle GPU usage during inference. What I tried so far:
- operating system utilities: no luck on that front. There seems to be no GPU support in cgroups, and the utilities that do limit GPU usage seem to always focus on fps or vsync;
- creating the queues with `VK_QUEUE_GLOBAL_PRIORITY_LOW_EXT`: doesn't seem to have any effect, but I'm not sure I'm implementing it correctly (I have zero graphics programming experience); see the first sketch after this list;
- sprinkling `ctx->device->device.waitIdle()` + sleep before the `ggml_vk_build_graph` calls: kind of works as a proof of concept, but of course is no real solution; the second sketch below shows the idea.
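For the low-priority queue, this is roughly what I mean (a minimal sketch using the plain Vulkan C structs, not my actual patch; `queue_family_index` is a placeholder for whatever compute queue family the backend picks, and `VK_EXT_global_priority` has to be in the enabled device extensions):

```cpp
#include <vulkan/vulkan.h>

// Sketch only: request low global priority for the compute queue at device
// creation time. Requires the VK_EXT_global_priority device extension.
static VkDeviceQueueCreateInfo make_low_priority_queue_info(
        uint32_t queue_family_index,
        const float * queue_priority,                              // points to e.g. 1.0f
        VkDeviceQueueGlobalPriorityCreateInfoEXT * priority_info)  // must outlive device creation
{
    priority_info->sType          = VK_STRUCTURE_TYPE_DEVICE_QUEUE_GLOBAL_PRIORITY_CREATE_INFO_EXT;
    priority_info->pNext          = nullptr;
    priority_info->globalPriority = VK_QUEUE_GLOBAL_PRIORITY_LOW_EXT;

    VkDeviceQueueCreateInfo queue_info = {};
    queue_info.sType            = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
    queue_info.pNext            = &*priority_info;  // chain the global priority request
    queue_info.queueFamilyIndex = queue_family_index;
    queue_info.queueCount       = 1;
    queue_info.pQueuePriorities = queue_priority;
    return queue_info;
    // The returned struct then goes into VkDeviceCreateInfo::pQueueCreateInfos,
    // with "VK_EXT_global_priority" added to ppEnabledExtensionNames.
}
```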
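And the waitIdle + sleep hack looks more or less like this (again just a proof-of-concept sketch; the 5 ms pause is an arbitrary value I picked, and `gpu_breather` is just a name for illustration):

```cpp
#include <chrono>
#include <thread>
#include <vulkan/vulkan.hpp>

// Proof-of-concept throttle, not a real fix: drain the GPU and sleep briefly
// before submitting the next chunk of work, so the desktop compositor gets a
// window to render. Called right before each ggml_vk_build_graph call, with
// the backend's vk::Device (ctx->device->device).
static void gpu_breather(vk::Device device,
                         std::chrono::milliseconds pause = std::chrono::milliseconds(5)) {
    device.waitIdle();                  // wait for all queued GPU work to finish
    std::this_thread::sleep_for(pause); // give the compositor a slice of GPU time
}
```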
Thoughts?