DirectML
SD.Next includes support for PyTorch-DirectML.
Add `--use-directml` to the command-line arguments.
For details, go to Installation.
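As a quick sanity check before launching with `--use-directml`, the sketch below lists the adapters that the `torch-directml` package can see. It assumes `torch-directml` is installed in the active environment; the helper name `list_directml_devices` is hypothetical and not part of SD.Next.

```python
def list_directml_devices():
    """Return DirectML adapter names, [] if none are usable, or None if torch-directml is missing."""
    try:
        # torch-directml is an optional dependency; treat its absence as "unknown".
        import torch_directml
    except ImportError:
        return None
    if not torch_directml.is_available():
        return []
    # Enumerate every adapter DirectML reports and collect its display name.
    return [torch_directml.device_name(i) for i in range(torch_directml.device_count())]


if __name__ == "__main__":
    devices = list_directml_devices()
    if devices is None:
        print("torch-directml is not installed in this environment")
    elif not devices:
        print("torch-directml is installed, but no DirectML device is available")
    else:
        for index, name in enumerate(devices):
            print(f"{index}: {name}")
```

If the list comes back empty or the package is missing, the Installation page is the place to start.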
Performance is considerably worse than with ROCm.
If you are comfortable with Linux, we recommend using ROCm instead.
PyTorch-DirectML does not access graphics memory by indexing. Because PyTorch-DirectML's tensor implementation extends OpaqueTensorImpl, we cannot access the actual storage of a tensor.
Olive is an easy-to-use hardware-aware model optimization tool that composes industry-leading techniques across model compression, optimization, and compilation. (from pypi)
Currently, SDXL is not supported.
This feature is EXPERIMENTAL. If you run this, your existing installation may be broken. Run it in a new installation or in a new virtual environment.
Switch to the `olive` branch.
You don't need to modify your commandline arguments.
Go to the System tab → Diffusers Settings and set Diffusers pipeline to ONNX Stable Diffusion (Olive).
Guide on YouTube:
Model optimization occurs automatically before generation.
Target models can be in .safetensors, .ckpt, or Diffusers format. Optimization takes 5-10 minutes, depending on your system.
The optimized models are automatically cached and used later to create images of the same size (height and width).
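Because the cache is reused only for images of the same height and width, it can help to think of each optimized model as being keyed by model and size. The sketch below illustrates that idea only; the folder naming scheme and the helper names `cache_dir_for` and `needs_optimization` are assumptions made for illustration, not SD.Next's actual cache layout.

```python
from pathlib import Path


def cache_dir_for(cache_root: Path, model_id: str, height: int, width: int) -> Path:
    """Build a hypothetical cache folder for one (model, height, width) combination."""
    # Replace "/" so a Hugging Face style id becomes a single folder name.
    safe_name = model_id.replace("/", "--")
    return cache_root / f"{safe_name}-{height}x{width}"


def needs_optimization(cache_root: Path, model_id: str, height: int, width: int) -> bool:
    """Return True when no cached entry exists for this exact model and image size."""
    return not cache_dir_for(cache_root, model_id, height, width).is_dir()
```

Under this scheme, generating at 512x512 and later at 768x512 with the same model yields two separate cache entries, which is why changing the image size triggers another optimization run.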
If your system does not have enough memory to optimize a model, or you would rather not spend the time optimizing it yourself, you can download an already optimized model from Huggingface.
Go to the Models → Huggingface tab and download an optimized model.
An optimized version of runwayml/stable-diffusion-v1-5 is available.
Guide on YouTube:
Property | Value |
---|---|
Prompt | a castle, best quality |
Negative Prompt | worst quality |
Sampler | Euler |
Sampling Steps | 20 |
Device | RX 7900 XTX 24GB |
Version | olive-ai(0.3.3) onnxruntime-directml(1.16.1) ROCm(5.6) torch(olive: 1.13.1, rocm: 2.1.0) |
Model | runwayml/stable-diffusion-v1-5 (ROCm), lshqqytiger/stable-diffusion-v1-5-olive (Olive) |
Precision | fp16 |
Token Merging | Olive(0, not supported) ROCm(0.5) |
Olive | ROCm |
---|---|
- Generation is faster than with PyTorch-DirectML.
- Uses less graphics memory than PyTorch-DirectML.
- Uses graphics memory more efficiently than PyTorch-DirectML.
- Optimization is required for every model and every image size.
- Some features are unavailable.
Run this command and try again:
(venv) $ pip uninstall onnxruntime onnxruntime-directml -y
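To see which of the two packages named in the command above are present, before or after running it, a small check with the standard library's `importlib.metadata` can help. This is a minimal sketch; the helper name `installed_versions` is hypothetical, and it should be run with the Python interpreter of the same virtual environment.

```python
from importlib import metadata

# The two distributions targeted by the uninstall command above.
ONNXRUNTIME_PACKAGES = ("onnxruntime", "onnxruntime-directml")


def installed_versions(names=ONNXRUNTIME_PACKAGES):
    """Map each installed distribution in `names` to its version string."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed in this environment; leave it out of the result.
            pass
    return found


if __name__ == "__main__":
    versions = installed_versions()
    if versions:
        for name, version in versions.items():
            print(f"{name}=={version}")
    else:
        print("no onnxruntime packages are installed")
```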
© SD.Next