Replies: 2 comments 3 replies
-
I assume you have a 16xx-series card? The green-square issue is well known on those cards when running in half precision. Running in full precision does increase VRAM usage, and Stable Diffusion is VRAM-heavy to begin with, so it's not surprising that you're running into VRAM issues.

There are two optimized modes available in this repo that should help you, though it's worth noting that the repo you link to has a fairly recent optimization PR that has not been merged here yet. There is also a second optimization, merged into neither repository so far, that might have an even bigger effect. Which is to say: things will only get better for you as these optimizations are merged.

To enable an optimized mode, open the "relauncher.py" file found under the scripts folder and change either optimized or optimized-turbo from False to True. Those values are case-sensitive, and you shouldn't enable both at once. The optimized mode uses the least VRAM but sacrifices a lot of speed for it; the optimized-turbo mode uses more VRAM but has a fairly minor speed penalty in comparison. You will likely have to use plain optimized for now, but once the PRs mentioned above are merged you should be able to switch over to optimized-turbo. You might even be able to run without either mode active.
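As a sketch, the relevant flags in relauncher.py look roughly like this (the variable names here are assumed from this thread's description; check your copy of the file for the exact spelling):

```python
# Assumed fragment of relauncher.py; the real file may spell these differently.
optimized = True          # lowest VRAM usage, but a large speed penalty
optimized_turbo = False   # more VRAM, only a minor speed penalty

# The two modes are mutually exclusive; a simple sanity check:
if optimized and optimized_turbo:
    raise ValueError("Enable only one of optimized / optimized_turbo")
```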
-
I had the same problem and was able to partially solve it. I'm still not completely sure why it works, but if you move the model to the CPU (model.cpu()) after creating the sampler but before calling the process_images function, it will not throw an OOM error. Note that you need to move it back afterwards (model.cuda()). This does not work in turbo mode (I'm getting inconsistent-device errors there and am trying to figure out why), but it does allow me to run on a 4 GB device (of the 16xx series, in full precision, optimized mode). After merging neonsecret's fork (#262) I'm able to generate a batch size of 3 at 512x512 on my poor 4 GB card, which is really cool.
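A minimal sketch of the move-to-CPU workaround described above, using a tiny nn.Linear as a stand-in for the actual diffusion model (the sampler creation and process_images call belong to the repo and are only indicated in comments):

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 8)  # stand-in for the real diffusion model

# ... create the sampler here ...

# Workaround: park the model weights on the CPU before generation.
model.cpu()
assert next(model.parameters()).device.type == "cpu"

# ... call process_images(...) here; per the thread, this avoids the
#     OOM on 4 GB cards in full-precision optimized mode ...

# Move the model back to the GPU afterwards (guarded so this sketch
# also runs on machines without CUDA).
if torch.cuda.is_available():
    model.cuda()
```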
-
I have a 4GB VRAM Nvidia card, and I am running this with the "optimized=True" option.
When running without the --precision full --no-half arguments there are no memory problems, but the resulting image is a green square. Adding "--precision full --no-half" to additional_arguments = "" in relauncher.py causes a CUDA out-of-memory error to appear immediately when trying to generate an image from a prompt. The error appears even when trying to generate a 64x64 image.
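For reference, the change described above amounts to this line in relauncher.py (the surrounding context of the file is assumed):

```python
# In relauncher.py, the previously empty additional_arguments string
# is filled with the full-precision flags:
additional_arguments = "--precision full --no-half"
```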
Any ideas on how to solve this? I would really like to use this branch and explore the integration with the upscaling tools.
Interestingly, with the Optimized Stable Diffusion repo (https://github.com/basujindal/stable-diffusion) I can generate images using the optimized script with the "--precision full" argument without any memory problems (there doesn't seem to be a "--no-half" argument in that version).