SD XL
You can simply download these two files from Huggingface and place them into your normal checkpoint directory, though we recommend a subfolder.
To facilitate easy use of SD-XL and swapping between refiners, backends, and pipelines, we recommend selecting the
following items in your Settings Tab, on the User Interface page:
Once you select them, hit Apply settings, and then Restart server.
When the server returns to being active and your browser page reloads, the Quicksettings
at the top of your screen should look like this (assuming you were using SDXL):
There are now 3 methods of memory optimization with the Diffusers backend, and consequently SDXL: Model Shuffle, Medvram, and Lowvram.
Choose one based on your GPU, VRAM, and how large you want your batches to be.
Note: VAE Tiling can be enabled to save additional VRAM if necessary, but it is recommended to use VAE Slicing if you do not have abundant VRAM.
Enable attention slicing should generally not be used, as the performance impact is significant. If you have it enabled, disable it.
"Model Shuffle" is a memory optimization feature that dynamically moves different parts of the model between the GPU and CPU to
efficiently utilize VRAM. This is enabled when the following 3 options are Enabled in the Diffusers settings page:
- Move the base model to CPU when using the refiner.
- Move the refiner model to CPU when not in use.
- Move the UNet to CPU during VAE decoding.
To use Model Shuffling, make sure neither --medvram nor --lowvram is active, then use the following settings:
The important parts are the 3 Move checkboxes.
Note that activating either CPU model offload or Sequential CPU offload deactivates Model Shuffling, and its settings are ignored.
VRAM Usage: "Model Shuffle" will work in 8 GB of VRAM.
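The rules above can be summarized in a small sketch. The function and parameter names here are hypothetical and chosen only for illustration; they are not SD.Next's actual setting keys, and the logic encodes nothing beyond what this section states.

```python
def model_shuffle_active(move_base: bool,
                         move_refiner: bool,
                         move_unet: bool,
                         model_offload: bool = False,
                         sequential_offload: bool = False) -> bool:
    """Return True when Model Shuffle would be in effect.

    Assumption (taken from the text): all three "Move" options must be
    enabled, and either CPU offload mode overrides Model Shuffle.
    """
    if model_offload or sequential_offload:
        # Either offload mode deactivates Model Shuffle.
        return False
    return move_base and move_refiner and move_unet


# All three Move checkboxes enabled, no offload mode active.
print(model_shuffle_active(True, True, True))
# Same checkboxes, but Model CPU Offload is also enabled.
print(model_shuffle_active(True, True, True, model_offload=True))
```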
If you have a GPU with 6GB VRAM or require larger batches of SD-XL images without VRAM constraints, you can use the --medvram command line argument.
This option significantly reduces VRAM requirements at the expense of inference speed.
It cannot be used together with --lowvram / Sequential CPU offload.
Note: Until some upstream fixes go in, this will not work with DML or MAC.
Alternatively, you can enable the Enable model CPU offload checkbox in the Settings tab on the Diffusers settings page:
- Model CPU Offload (same as --medvram)
- VAE slicing (recommended)
- Attention slicing is NOT recommended.
VRAM Usage: "Model CPU Offload" can work in 6 GB of VRAM.
Note: --medvram supersedes the Model Shuffle options (e.g., Move base model, refiner model, UNet) and cannot be used together with --lowvram / Sequential CPU offload.
If your GPU has as little as 2GB of VRAM, start your SD.Next session with --lowvram as a command line argument to vastly reduce VRAM requirements at the cost of even more inference speed. This is essentially the Enable Sequential CPU offload setting.
Note: VAE slicing, VAE tiling, and Attention slicing are all enabled by --lowvram regardless of the checkboxes.
On a GPU with more VRAM, generations with this setting will take even longer, but you will be able to do ridiculously large batches of SD-XL images, up to and including 24 on a 12GB GPU.
Note: Until some upstream fixes go in, this will not work with SDXL LoRAs and SD 1.5.
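As a rough starting point, the VRAM figures quoted in this section (8 GB for Model Shuffle, 6 GB for --medvram, 2 GB for --lowvram) can be turned into a simple lookup. This is only a sketch: the helper names and return strings are hypothetical, and the right choice also depends on batch size and the compatibility notes above.

```python
def suggest_memory_mode(vram_gb: float) -> str:
    """Map available VRAM to the lightest optimization quoted in this guide.

    Thresholds come from the figures in the text; they are guidance,
    not hard limits enforced by SD.Next.
    """
    if vram_gb >= 8:
        return "model-shuffle"
    if vram_gb >= 6:
        return "--medvram"
    if vram_gb >= 2:
        return "--lowvram"
    return "below-quoted-minimum"


def check_flags(medvram: bool, lowvram: bool) -> None:
    """Reject the combination the text describes as mutually exclusive."""
    if medvram and lowvram:
        raise ValueError("--medvram and --lowvram cannot be used together")


for gb in (12, 6, 4):
    print(gb, "GB ->", suggest_memory_mode(gb))
```

Note that a higher-VRAM GPU may still prefer --lowvram when very large batches matter more than speed.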
We look forward to seeing how large your batches can get; do let us know on the Discord server. We HIGHLY RECOMMEND that
you continue down this guide and configure your SD.Next with the Fixed FP16 VAE!
It is currently recommended to use a Fixed FP16 VAE rather than the ones built into the SD-XL base and refiner for
significant reductions in VRAM (from 6GB of VRAM to <1GB VRAM) and a doubling of VAE processing speed.
Below are the instructions for installation and use:
- Download Fixed FP16 VAE to your VAE folder.
- In your Settings tab, go to Diffusers settings, set VAE Upcasting to False, and hit Apply.
- Select your VAE and simply Reload Checkpoint to reload the model, or hit Restart server.
You should be good to go. Enjoy the huge performance boost!
- To use SD-XL, SD.Next first needs to be in Diffusers mode, not Original; select it from the Backend radio buttons.
- Then select Stable Diffusion XL from the Pipeline dropdown.
- Next select the sd_xl_base_1.0.safetensors file from the Checkpoint dropdown.
- (optional) Finally select the sd_xl_refiner_1.0.safetensors file from the Refiner dropdown.
To use the refiner, first load it; it can then be enabled using the Second pass option in the UI.
Note that using the refiner is not necessary, as the base model can produce very good results on its own.
The refiner can be used in two modes: as in the traditional workflow, or with early handover from base to refiner.
In either case, the refiner uses a number of steps calculated from secondary steps, rather than using that value as-is.
If denoise start is set to 0 or 1, then the traditional workflow is used:
- Base model runs from 0 -> 100% using the number of steps specified
- Refiner model runs from 0 -> 100% using the calculated number of steps
The number of refiner steps is roughly based on secondary steps x denoising strength.
However, in this mode the refiner may not produce a much better result and will likely only smooth the image, as the base model has already reached 100% and there is insufficient remaining noise for the refiner to do anything else.
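To make the traditional-workflow estimate concrete, here is a minimal sketch of the "secondary steps x denoising strength" relationship. The function name is hypothetical, and the rounding and the floor of one step are assumptions for illustration; the text only says the real value is "roughly based on" this product, so SD.Next's internal calculation may differ.

```python
def traditional_refiner_steps(secondary_steps: int,
                              denoising_strength: float) -> int:
    """Approximate refiner step count in the traditional workflow.

    Assumption: simple rounding of the product, with at least one step.
    The actual internal calculation is not specified in this guide.
    """
    if not 0.0 <= denoising_strength <= 1.0:
        raise ValueError("denoising_strength must be between 0 and 1")
    return max(1, round(secondary_steps * denoising_strength))


# 20 secondary steps at a denoising strength of 0.5
print(traditional_refiner_steps(20, 0.5))  # 10
```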
If refiner start is set to any other value, then handover mode is used:
- Base model runs from 0% -> denoise_start%, and the actual number of steps used will be lower than the specified number of steps.
  The exact number is calculated internally and roughly correlates to denoise start x steps, but not exactly.
- Refiner model runs from denoise_start% -> 100%, and the actual number of steps used will be lower than the specified number of secondary steps.
  The exact number is calculated internally and roughly correlates to denoise start x secondary steps, but not exactly.
In this mode, using a different number of steps for the primary and secondary pass is allowed, but may produce unexpected results, as base and refiner operations will not be perfectly aligned.
Note on steps vs timesteps: In all workflows (even with the original backend and SD 1.5 models), steps do not refer directly to the operations executed internally. Steps are used to calculate the actual values at which operations will be executed. For example, steps=6 roughly means execute denoising at 0% -> 20% -> 40% -> 60% -> 80% -> 100%. For that reason, specifying steps above 99 is meaningless.
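The steps=6 example can be reproduced with a short sketch that spreads the requested number of steps evenly across the 0% to 100% range. The function name is hypothetical and the even spacing is a simplification; real schedulers may place their timesteps differently.

```python
from typing import List


def step_positions(steps: int) -> List[float]:
    """Return evenly spaced positions (in percent) for a given step count.

    Simplified illustration only: actual schedulers compute their own
    timestep values and need not be evenly spaced.
    """
    if steps < 2:
        raise ValueError("need at least 2 steps to span 0% to 100%")
    return [100.0 * i / (steps - 1) for i in range(steps)]


print(step_positions(6))  # [0.0, 20.0, 40.0, 60.0, 80.0, 100.0]
```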
© SD.Next