A rusty robot holding a fire torch in its hand, generated by Stable Diffusion XL using Rust and candle.
The stable-diffusion example is a conversion of diffusers-rs using candle
rather than libtorch. This implementation supports Stable Diffusion v1.5, v2.1,
as well as Stable Diffusion XL 1.0.
The weights are automatically downloaded for you from the HuggingFace
Hub on the first run. Various command line flags let you use local files
instead; run with --help to learn about them.
cargo run --example stable-diffusion --release --features=cuda,cudnn \
-- --prompt "a cosmonaut on a horse (hd, realistic, high-def)"
The final image is named sd_final.png by default.
The default scheduler is the Denoising Diffusion Implicit Model scheduler (DDIM). The
original paper and some code can be found in the associated repo.
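To give an idea of what the scheduler does at each step, here is a minimal standalone sketch of one deterministic DDIM update (eta = 0) on a flat buffer of latent values. It is not the code used by this example: the function name `ddim_step` and its parameters are hypothetical, and `alpha_t` and `alpha_prev` are assumed to be the cumulative alpha products at the current and previous timesteps.

```rust
/// One deterministic DDIM update (eta = 0) on a flat latent buffer.
/// Hypothetical helper for illustration, not part of candle.
/// `alpha_t` / `alpha_prev`: cumulative alpha products at the current and
/// previous timestep; `eps`: the noise predicted by the model for `x_t`.
fn ddim_step(x_t: &[f32], eps: &[f32], alpha_t: f32, alpha_prev: f32) -> Vec<f32> {
    assert_eq!(x_t.len(), eps.len());
    let sqrt_a_t = alpha_t.sqrt();
    let sqrt_one_minus_a_t = (1.0 - alpha_t).sqrt();
    let sqrt_a_prev = alpha_prev.sqrt();
    let sqrt_one_minus_a_prev = (1.0 - alpha_prev).sqrt();
    x_t.iter()
        .zip(eps)
        .map(|(&x, &e)| {
            // Estimate of the clean sample implied by the noise prediction.
            let x0 = (x - sqrt_one_minus_a_t * e) / sqrt_a_t;
            // Move the estimate to the previous (less noisy) timestep.
            sqrt_a_prev * x0 + sqrt_one_minus_a_prev * e
        })
        .collect()
}

fn main() {
    let x0 = [0.5f32, -1.0, 2.0];
    let eps = [0.1f32, 0.2, -0.3];
    let (alpha_t, alpha_prev) = (0.5f32, 0.8f32);
    // Noise x0 up to timestep t, then take one DDIM step back.
    let x_t: Vec<f32> = x0
        .iter()
        .zip(&eps)
        .map(|(&x, &e)| alpha_t.sqrt() * x + (1.0 - alpha_t).sqrt() * e)
        .collect();
    let x_prev = ddim_step(&x_t, &eps, alpha_t, alpha_prev);
    println!("{:?}", x_prev);
}
```

In the real pipeline the noise prediction comes from the UNet, and the step is repeated once per diffusion step (see the --n-steps flag).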
--prompt: the prompt to be used to generate the image.
--uncond-prompt: the optional unconditional prompt.
--sd-version: the Stable Diffusion version to use, can be v1-5, v2-1, or xl.
--cpu: use the cpu rather than the gpu (much slower).
--height, --width: set the height and width for the generated image.
--n-steps: the number of steps to be used in the diffusion process.
--num-samples: the number of samples to generate.
--final-image: the filename for the generated image(s).
Using flash attention makes image generation a lot faster and uses less memory.
The downside is a long compilation time. You can set the
CANDLE_FLASH_ATTN_BUILD_DIR environment variable to something like
/home/user/.candle to ensure that the compilation artifacts are properly
cached.
Enabling flash-attention requires both a feature flag, --features flash-attn,
and the command line flag --use-flash-attn.
Note that flash-attention-v2 is only compatible with Ampere, Ada, or Hopper GPUs (e.g., A100/H100, RTX 3090/4090).
...
This requires a GPU with more than 8GB of memory. As a fallback, the CPU version
can be used with the --cpu flag, but it is much slower.
Alternatively, reducing the height and width with the --height and --width
flags is likely to reduce memory usage significantly.
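As a rough illustration of why image size matters so much, the sketch below estimates how the latent size and a full self-attention score matrix grow with the image dimensions. It assumes the usual 8x downscaling of the Stable Diffusion VAE and counts entries for a single attention head over all latent positions. The helper names are hypothetical, and actual memory usage also depends on the model version, the data type, and whether flash attention is enabled.

```rust
/// Downscaling factor of the Stable Diffusion VAE (assumed: 8 pixels per latent cell).
const VAE_SCALE: usize = 8;

/// Spatial size of the latent for a given image size (hypothetical helper).
fn latent_dims(height: usize, width: usize) -> (usize, usize) {
    (height / VAE_SCALE, width / VAE_SCALE)
}

/// Number of entries in one self-attention score matrix computed over
/// every latent position (tokens x tokens), for a single head.
fn attention_scores(height: usize, width: usize) -> usize {
    let (h, w) = latent_dims(height, width);
    let tokens = h * w;
    tokens * tokens
}

fn main() {
    for (h, w) in [(512, 512), (384, 384), (256, 256)] {
        let (lh, lw) = latent_dims(h, w);
        println!(
            "{h}x{w}: latent {lh}x{lw}, attention entries {}",
            attention_scores(h, w)
        );
    }
}
```

Under these assumptions, halving both the height and the width divides the number of latent positions by 4 and the size of a full attention score matrix by 16.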