This extension integrates AnimateDiff into AUTOMATIC1111 Stable Diffusion WebUI. Once it is enabled, you can generate GIFs in exactly the same way as you generate images.

This extension implements AnimateDiff differently from the original repository: it does not require you to clone the whole SD1.5 repository, and it applies (probably) the least modification to `ldm`, so you do not need to reload your model weights if you don't want to.
Batch size on the WebUI is replaced internally by the GIF frame number: one full GIF is generated per batch. If you want to generate multiple GIFs at once, change the batch number instead.

Batch number is NOT the same as batch size. In the A1111 WebUI, batch number sits above batch size. Batch number is the number of sequential generations, while batch size is the number of parallel generations. Increasing batch number is generally safe, but increasing batch size (which this extension repurposes as the video frame number) increases VRAM usage. You do not need to change batch size at all when using this extension.
You might also be interested in another extension I created: Segment Anything for Stable Diffusion WebUI.
- Install this extension via link.
- Download motion modules and put the model weights under `stable-diffusion-webui/extensions/sd-webui-animatediff/model/`. If you want to save the model weights in a different directory, go to `Settings/AnimateDiff`. See the model zoo for a list of available motion modules.
- Enable `Pad prompt/negative prompt to be same length` and `Batch cond/uncond` in `Settings`, then click `Apply settings`. You must do this to prevent generating two separate, unrelated GIFs.
- Go to txt2img if you want to try txt2gif and img2img if you want to try img2gif.
- Choose an SD1.5 checkpoint, write prompts, set configurations such as image width/height. If you want to generate multiple GIFs at once, please change batch number, instead of batch size.
- Enable the AnimateDiff extension, set up each parameter, then click `Generate`.
  - Number of frames — The model is trained with 16 frames, so it gives the best results when the number of frames is set to `16`. Choose a value in [1, 24] for V1 motion modules and [1, 32] for V2 motion modules.
  - Frames per second — How many frames (images) are shown each second. If 16 frames are generated at 8 frames per second, your GIF's duration is 2 seconds.
  - Display loop number — How many times the GIF is played. A value of `0` means the GIF never stops playing.
  - Save — Format of the output. Choose at least one of "GIF" | "MP4" | "PNG". Check "TXT" if you want infotext, which will be saved in the same directory as the output GIF.
  - Reverse — Append reversed frames to your output.
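The relationship between the frame-count and FPS parameters and the resulting GIF duration can be sketched as follows (the helper name is illustrative, not part of the extension):

```python
def gif_duration(num_frames: int, fps: int) -> float:
    """Duration in seconds of a GIF with num_frames frames shown at fps frames/sec."""
    return num_frames / fps

# The example from the parameter list: 16 frames at 8 fps -> 2.0 seconds
print(gif_duration(16, 8))
```

Increasing the number of frames lengthens the GIF (and raises VRAM usage), while increasing FPS shortens it without changing the frame count.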
- You should see the output GIF in the output gallery. You can access the GIF output at `stable-diffusion-webui/outputs/{txt2img or img2img}-images/AnimateDiff`. You can also access the image frames at `stable-diffusion-webui/outputs/{txt2img or img2img}-images/{date}`.
You need to go to img2img and submit an init frame via the A1111 panel. You can optionally submit a last frame via the extension panel (an experimental feature; not thoroughly tested, so it may not work).

By default, your `init_latent` will be changed to

```
init_alpha = (1 - frame_number ^ latent_power / latent_scale)
init_latent = init_latent * init_alpha + random_tensor * (1 - init_alpha)
```

If you upload a last frame, your `init_latent` will be changed in a similar way. Read this code to understand how it works.
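The blending formula above can be sketched in plain Python. This is an illustrative reading of the formula, assuming `frame_number` is the 0-based frame index and that `latent_power=1` and `latent_scale=32` are the extension's defaults:

```python
def init_alpha(frame_number: int, latent_power: float = 1.0, latent_scale: float = 32.0) -> float:
    """How much of the init latent is kept for a given frame; the rest is random noise."""
    alpha = 1 - frame_number ** latent_power / latent_scale
    return max(alpha, 0.0)  # clamp: late frames become pure random noise

# With defaults, frame 0 keeps the init latent entirely (alpha = 1.0)
# and alpha decays linearly: frame 16 -> 0.5, frame 32 and beyond -> 0.0.
schedule = [init_alpha(i) for i in range(32)]
```

Each frame's latent is then `init_latent * alpha + random_tensor * (1 - alpha)`, so early frames stay close to the init image while later frames are increasingly free.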
Just like how you use ControlNet. Here is a sample payload. You will get a list of generated frames; you will have to view the GIF in your file system, as mentioned at #WebUI item 4.
```python
'alwayson_scripts': {
    'AnimateDiff': {
        'args': [{
            'enable': True,         # enable AnimateDiff
            'video_length': 16,     # video frame number, 0-24 for v1 and 0-32 for v2
            'format': 'MP4',        # 'GIF' | 'MP4' | 'PNG' | 'TXT'
            'loop_number': 0,       # 0 = infinite loop
            'fps': 8,               # frames per second
            'model': 'mm_sd_v15_v2.ckpt',  # motion module name
            'reverse': [],          # 0 | 1 | 2 - 0: Add Reverse Frame, 1: Remove head, 2: Remove tail
            # parameters below are for img2gif only
            'latent_power': 1,
            'latent_scale': 32,
            'last_frame': None,
            'latent_power_last': 1,
            'latent_scale_last': 32
        }]
    }
},
```
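A minimal sketch of sending such a payload to the standard WebUI txt2img API endpoint, assuming the WebUI is running locally with `--api` enabled (the prompt and the subset of AnimateDiff arguments shown are examples):

```python
import json
import urllib.request

# Example txt2img payload carrying the AnimateDiff alwayson_scripts arguments.
payload = {
    "prompt": "a cat walking on grass",
    "steps": 20,
    "alwayson_scripts": {
        "AnimateDiff": {
            "args": [{
                "enable": True,
                "video_length": 16,
                "format": "MP4",
                "fps": 8,
                "model": "mm_sd_v15_v2.ckpt",
            }]
        }
    },
}

req = urllib.request.Request(
    "http://127.0.0.1:7860/sdapi/v1/txt2img",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# response = urllib.request.urlopen(req)          # uncomment with the WebUI running
# frames = json.loads(response.read())["images"]  # list of base64-encoded frames
```

The response's `images` field contains the individual frames; as noted above, the assembled GIF/MP4 must be retrieved from the output directory on disk.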
- `mm_sd_v14.ckpt` & `mm_sd_v15.ckpt` & `mm_sd_v15_v2.ckpt` by @guoyww: Google Drive | HuggingFace | CivitAI | Baidu NetDisk
- `mm-Stabilized_high.pth` & `mm-Stabilized_mid.pth` by @manshoety: HuggingFace
- `temporaldiff-v1-animatediff.ckpt` by @CiaraRowles: HuggingFace
- `2023/07/20` v1.1.0: fix GIF duration, add loop number, remove auto-download, remove xformers, remove instructions on gradio UI, refactor README, add sponsor QR code.
- `2023/07/24` v1.2.0: fix incorrect insertion of motion modules, add option to change the path for saving motion modules in `Settings/AnimateDiff`, fix loading different motion modules.
- `2023/09/04` v1.3.0: support any community models with the same architecture; fix grey problem via #63 (credit to @TDS4874 and @opparco).
- `2023/09/11` v1.4.0: support official v2 motion module (different architecture: GroupNorm not hacked, UNet middle layer has motion module).
- `2023/09/14` v1.4.1: always change `beta`, `alpha_cumprod` and `alpha_cumprod_prev` to resolve grey problem in other samplers.
- `2023/09/16` v1.5.0: randomize init latent to support better img2gif, credit to this forked repo; add other output formats and infotext output, credit to @zappityzap; add appending reversed frames; refactor code to ease maintenance.
- `2023/09/19` v1.5.1: support xformers, sdp, sub-quadratic attention optimization; VRAM usage decreases to 5.60GB with the default setting. See FAQ 1st item for more information.
- `2023/09/22` v1.5.2: option to disable xformers at `Settings/AnimateDiff` due to a bug in xformers, API support, option to enable GIF palette optimization at `Settings/AnimateDiff` (credit to @rkfg), gifsicle optimization moved to `Settings/AnimateDiff`.
- `2023/09/25` v1.6.0: motion LoRA supported. Download and use them like any other LoRA (example: download a motion LoRA to `stable-diffusion-webui/models/Lora` and add `<lora:v2_lora_PanDown:0.8>` to your positive prompt).
Infinite V2V, ControlNet and Prompt Travel are currently work in progress in #121. Stay tuned; they should be released within a week.
- Q: How much VRAM do I need?

  A: Actual VRAM usage depends on your image size and video frame number. You can try reducing the image size or the video frame number to reduce VRAM usage. Here is some data tested on Ubuntu 22.04, NVIDIA 4090, torch 2.0.1+cu117, H=W=512, frame=16 (default setting):

  | Optimization    | VRAM usage |
  |-----------------|------------|
  | No optimization | 12.13GB    |
  | xformers/sdp    | 5.60GB     |
  | sub-quadratic   | 10.39GB    |

- Q: Can I use SDXL to generate GIFs?

  A: You will have to wait for someone to train SDXL-specific motion modules, which will have a different model architecture. This extension essentially injects multiple motion modules into the SD1.5 UNet. It does not work for other variants of SD, such as SD2.1 or SDXL.
- Q: Can I use this extension to do GIF2GIF? Can I apply ControlNet to this extension? Can I override the limitation of 24/32 frames per generation?

  A: Not at this time, but all of these will be supported via AnimateDiff CLI Prompt Travel support in the near future. This is a huge amount of work and life is busy, so expect to wait a long time for this update.
| AnimateDiff | Extension v1.2.0 | Extension v1.3.0 | img2img |
|---|---|---|---|

Note that I did not modify random tensor generation when producing the v1.3.0 samples.
| No LoRA | PanDown | PanLeft |
|---|---|---|
You can sponsor me via WeChat, AliPay or PayPal.

| AliPay | PayPal |
|---|---|