
Control

Vladimir Mandic edited this page Dec 25, 2023 · 34 revisions


Native control module for SD.Next with the Diffusers backend
Can be used for controlled generation as well as for image (img2img) and text (txt2img) workflows

Install

Final release

No extra steps will be required for the final release

Pre-release

  • Make sure you're using the latest SD.Next from the dev branch

    git checkout dev
    webui --upgrade

  • Enable extra logging as described in the Logging section below
  • You must run with the Diffusers backend (backend=diffusers)
  • Report any issues in the dedicated channel on the SD.Next Discord server:
    https://discord.com/channels/1101998836328697867/1186383781066719322
    Do not create GitHub issues for pre-release versions

Additional steps

  • ControlNet-XS support requires diffusers 0.25.dev, installed from the latest main branch

    venv\Scripts\activate (on Windows)
    source venv/bin/activate (on Linux)
    pip uninstall diffusers
    pip install git+https://github.com/huggingface/diffusers
    exit

    After changing the diffusers version, start SD.Next with the --experimental flag (webui --experimental);
    otherwise it will automatically reinstall the latest known supported version

  • DWPose: Requires OpenMMLab framework
    pip install openmim
    mim install mmengine mmcv mmpose mmdet

  • MediaPipe: Requires MediaPipe framework
    pip install mediapipe
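Before relying on ControlNet-XS, it can help to confirm which diffusers version the active venv actually has. The sketch below is a hypothetical helper, not part of SD.Next. It reads the installed version with the standard `importlib.metadata` module and assumes that "0.25.dev" means any 0.25.x build, including dev builds, or newer is sufficient.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# ASSUMPTION: ControlNet-XS needs diffusers >= 0.25 (dev builds of 0.25 count).
MIN_DIFFUSERS = (0, 25)


def parse_major_minor(ver: str) -> tuple:
    """Extract (major, minor) from a version string such as '0.25.0.dev0'."""
    match = re.match(r"(\d+)\.(\d+)", ver)
    if match is None:
        raise ValueError(f"unrecognized version string: {ver!r}")
    return int(match.group(1)), int(match.group(2))


def supports_controlnet_xs(ver: str) -> bool:
    """True if the given diffusers version meets the assumed minimum."""
    return parse_major_minor(ver) >= MIN_DIFFUSERS


def installed_diffusers_version() -> Optional[str]:
    """Installed diffusers version in the current environment, or None if absent."""
    try:
        return version("diffusers")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    ver = installed_diffusers_version()
    if ver is None:
        print("diffusers is not installed in this environment")
    else:
        print(f"diffusers {ver}: ControlNet-XS supported = {supports_controlnet_xs(ver)}")
```

Run it with the venv activated so that it inspects the same environment SD.Next uses.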

Supported Control Models

  • lllyasviel ControlNet for SD 1.5 and SD-XL models
    Includes ControlNets as well as Reference-only mode and any compatible 3rd party models
    Original ControlNets are 1.4GB each for SD15 and a massive 4.9GB each for SDXL
  • VisLearn ControlNet XS for SD-XL models
    Lightweight ControlNet models for SDXL at only 165MB each, with near-identical results
  • TencentARC T2I-Adapter for SD 1.5 and SD-XL models
    T2I-Adapters provide similar functionality at a much lower resource cost, at only 300MB each
  • Kohya Control-LLLite for SD-XL models
    Control-LLLite models for SDXL provide lightweight image control at only 46MB each
  • TencentAILab IP-Adapter for SD 1.5 and SD-XL models
    IP-Adapters provide strong style-transfer functionality at a much lower resource cost: below 100MB for SD15 and 700MB for SDXL
    IP-Adapters can be combined with ControlNet for more stable results, especially when doing batch/video processing
  • CiaraRowles TemporalNet for SD 1.5 models
    ControlNet model designed to enhance temporal consistency and reduce flickering for batch/video processing

All built-in models are downloaded on first use and stored in /models/controlnet, /models/adapter, /models/xs, /models/lite and /models/processor
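The folder layout above can be summarized as a mapping from model family to folder. The sketch below is illustrative only. The family-to-folder pairing is inferred from the folder names, the `models` root is an assumption that should be adjusted to your SD.Next install, and `list_local_safetensors` is a hypothetical helper, not SD.Next's own model scanner.

```python
from pathlib import Path

# ASSUMPTION: family -> folder pairing inferred from the folder names listed above.
MODEL_FOLDERS = {
    "ControlNet": "controlnet",
    "T2I-Adapter": "adapter",
    "ControlNet-XS": "xs",
    "Control-LLLite": "lite",
    "Processor": "processor",
}


def list_local_safetensors(models_root: Path, family: str) -> list:
    """Hypothetical helper: sorted .safetensors file names in a family's folder."""
    folder = models_root / MODEL_FOLDERS[family]
    if not folder.is_dir():
        return []  # folder not created yet, e.g. before first download
    return sorted(p.name for p in folder.glob("*.safetensors"))
```

For example, `list_local_safetensors(Path("models"), "ControlNet")` would show which manually downloaded ControlNet files are present.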

Listed below are all models that are supported out-of-the-box:

ControlNet

  • SD15:
    Canny, Depth, IP2P, LineArt, LineArt Anime, MLSD, NormalBae, OpenPose,
    Scribble, Segment, Shuffle, SoftEdge, TemporalNet, HED, Tile
  • SDXL:
    Canny Small XL, Canny Mid XL, Canny XL, Depth Zoe XL, Depth Mid XL

Note: Only models compatible with the currently loaded base model are listed
Additional ControlNet models in safetensors format can be downloaded manually and placed into the corresponding folder: /models/controlnet

ControlNet XS

  • SDXL:
    Canny, Depth

ControlNet LLLite

  • SDXL:
    Canny, Canny anime, Depth anime, Blur anime, Pose anime, Replicate anime

Note: Control-LLLite uses an unofficial implementation and is considered experimental
Additional Control-LLLite models in safetensors format can be downloaded manually and placed into the corresponding folder: /models/lite

T2I-Adapter

Built-in SD15 adapters and their Hugging Face model IDs:

    Segment: TencentARC/t2iadapter_seg_sd14v1
    Zoe Depth: TencentARC/t2iadapter_zoedepth_sd15v1
    OpenPose: TencentARC/t2iadapter_openpose_sd14v1
    KeyPose: TencentARC/t2iadapter_keypose_sd14v1
    Color: TencentARC/t2iadapter_color_sd14v1
    Depth v1: TencentARC/t2iadapter_depth_sd14v1
    Depth v2: TencentARC/t2iadapter_depth_sd15v2
    Canny v1: TencentARC/t2iadapter_canny_sd14v1
    Canny v2: TencentARC/t2iadapter_canny_sd15v2
    Sketch v1: TencentARC/t2iadapter_sketch_sd14v1
    Sketch v2: TencentARC/t2iadapter_sketch_sd15v2

  • SD15:
    Segment, Zoe Depth, OpenPose, KeyPose, Color, Depth v1, Depth v2, Canny v1, Canny v2, Sketch v1, Sketch v2
  • SDXL:
    Canny XL, Depth Zoe XL, Depth Midas XL, LineArt XL, OpenPose XL, Sketch XL

Note: Only models compatible with currently loaded base model are listed

Processors

  • Pose style: OpenPose, DWPose, MediaPipe Face
  • Outline style: Canny, Edge, LineArt Realistic, LineArt Anime, HED, PidiNet
  • Depth style: Midas Depth Hybrid, Zoe Depth, Leres Depth, Normal Bae
  • Segmentation style: SegmentAnything
  • Other: MLSD, Shuffle

Note: Processor sizes range from nothing for built-in processors to anywhere between 200MB and 4.2GB (ZoeDepth-Large)

Reference

Reference mode is its own pipeline, so it cannot have multiple units or processors

Workflows

Inputs & Outputs

  • Image -> Image
  • Batch: list of images -> Gallery and/or Video
  • Folder: folder with images -> Gallery and/or Video
  • Video -> Gallery and/or Video

Notes:

  • Input/Output/Preview panels can be minimized by clicking on them
  • For video output, make sure to set video options

Unit

  • A unit consists of: input + processor + control
  • A pipeline consists of any number of configured units
    If units use control modules, all control modules inside the pipeline must be of the same type,
    e.g. all ControlNet, all ControlNet-XS, all T2I-Adapter or all Reference
  • Each unit can use the primary input or its own override input
  • A unit can have no processor, in which case control runs directly on the input
    Use this with predefined input templates
  • A unit can have no control, in which case it runs the processor only
  • Any combination of input, processor and control is possible
    For example, two enabled units with processors only will produce a compound processed image, but without control
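The unit rules above can be modeled as a small data structure plus one validation step. The sketch below is hypothetical. `Unit`, its fields and `validate_pipeline` are illustrative names, not SD.Next's internal classes. It enforces the "same control type" rule for enabled units and also applies the Reference-mode limit described in the Reference section: a single unit with no processor.

```python
from dataclasses import dataclass
from typing import Optional

# Control families named on this page.
CONTROL_TYPES = {"ControlNet", "ControlNet-XS", "T2I-Adapter", "Reference"}


@dataclass
class Unit:
    """Hypothetical model of a unit: input + processor + control."""
    processor: Optional[str] = None       # e.g. "Canny"; None = control runs on the input directly
    control_type: Optional[str] = None    # one of CONTROL_TYPES; None = processor only
    override_input: Optional[str] = None  # unit-specific input; None = use the primary input
    enabled: bool = True


def validate_pipeline(units: list) -> Optional[str]:
    """Return the pipeline's shared control type, or None if no enabled unit uses control.

    Raises ValueError when enabled units mix control types, or when Reference
    mode is combined with multiple units or a processor.
    """
    active = [u for u in units if u.enabled]
    types = {u.control_type for u in active if u.control_type is not None}
    unknown = types - CONTROL_TYPES
    if unknown:
        raise ValueError(f"unknown control type(s): {sorted(unknown)}")
    if len(types) > 1:
        raise ValueError(f"control modules must all be the same type, got {sorted(types)}")
    control = next(iter(types), None)
    if control == "Reference" and (len(active) > 1 or any(u.processor for u in active)):
        raise ValueError("Reference mode supports a single unit without a processor")
    return control
```

Disabled units are ignored, so a unit can keep a different control type as long as it stays switched off.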

What-if?

  • If no input is provided, the pipeline runs in txt2img mode
    It can be freely used instead of standard txt2img
  • If none of the units has a control model or adapter, the pipeline runs in img2img mode using the input image
    It can be freely used instead of standard img2img
  • If a processor is enabled but no ControlNet or adapter is loaded,
    the pipeline runs in img2img mode using the processed input
  • If multiple processors are enabled but no ControlNet or adapter is loaded,
    the pipeline runs in img2img mode on the blended processed image
  • Output resolution defaults to the input resolution;
    use resize settings to force any other resolution
  • Resize can run before processing (on the input image) or after it (on the output image)
  • With video input, the pipeline runs on each frame unless skip frames is set
    Video output is a standard list of images (gallery) and can optionally be encoded into a video file
    The video file can be interpolated using RIFE for smoother playback
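The rules above can be condensed into a few functions. This is a hypothetical sketch of the documented behavior, not SD.Next code. The mode labels are illustrative strings. The skip-frames semantics are an ASSUMPTION: `skip_frames=N` is taken to mean that N frames are skipped after each processed frame.

```python
from typing import Optional


def select_mode(has_input: bool, processors_enabled: int, controls_loaded: int) -> str:
    """Hypothetical summary of the mode rules listed above."""
    if not has_input:
        return "txt2img"
    if controls_loaded == 0:
        if processors_enabled == 0:
            return "img2img"            # img2img on the raw input image
        if processors_enabled == 1:
            return "img2img:processed"  # img2img on the processed input
        return "img2img:blended"        # img2img on the blended processed images
    return "control"                    # control models/adapters are applied


def output_size(input_size: tuple, resize_to: Optional[tuple] = None) -> tuple:
    """Output resolution follows the input unless a resize target is set."""
    return resize_to if resize_to is not None else input_size


def frames_to_process(total_frames: int, skip_frames: int = 0) -> list:
    """ASSUMPTION: skip_frames=N skips N frames after each processed frame."""
    return list(range(0, total_frames, skip_frames + 1))
```

For example, with this reading, a 10-frame video with `skip_frames=2` would process frames 0, 3, 6 and 9.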

Overrides

  • Control can be based on the main input, or each individual unit can have its own override input
  • By default, control runs in control+txt2img mode
  • If an init image is provided, it runs in control+img2img mode
    The init image can be the same as the control image or a separate one
  • IP-Adapter can be applied to any workflow
  • IP-Adapter can use the same input as control or a separate one
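The override rules above amount to "use the specific image if one is given, otherwise fall back to the main input". The sketch below is hypothetical. `Inputs` and `resolve` are illustrative names, and images are represented by file paths only for simplicity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Inputs:
    """Hypothetical bundle of images for one run (paths used for illustration)."""
    control: Optional[str] = None                   # main control input
    unit_overrides: Optional[dict] = None           # unit index -> override input
    init_image: Optional[str] = None                # optional init image (img2img)
    ip_adapter_image: Optional[str] = None          # optional separate IP-Adapter input


def resolve(inputs: Inputs, unit_count: int) -> dict:
    """Pick each unit's image, the generation mode and the IP-Adapter image."""
    overrides = inputs.unit_overrides or {}
    # Each unit uses its override input if set, otherwise the main input.
    unit_images = [overrides.get(i, inputs.control) for i in range(unit_count)]
    # An init image switches control from txt2img to img2img.
    mode = "control+img2img" if inputs.init_image is not None else "control+txt2img"
    # IP-Adapter reuses the control input unless a separate image is given.
    ip_image = inputs.ip_adapter_image if inputs.ip_adapter_image is not None else inputs.control
    return {"mode": mode, "unit_images": unit_images, "ip_adapter_image": ip_image}
```

To use the control image as the init image as well, pass the same path for both fields.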

Logging

To enable extra logging for troubleshooting purposes,
set environment variables before running SD.Next

  • Linux:

    export SD_CONTROL_DEBUG=true
    export SD_PROCESS_DEBUG=true
    ./webui.sh --debug

  • Windows:

    set SD_CONTROL_DEBUG=true
    set SD_PROCESS_DEBUG=true
    webui.bat --debug

Note: Starting with debug logging enabled also enables Test mode in the Control module
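If SD.Next is launched from a script, the same environment variables can be set programmatically instead of with export or set. The sketch below is hypothetical. `debug_launch` only builds the command and environment and does not start anything. It assumes the launcher scripts are run from the SD.Next directory.

```python
import os
import sys

# Environment variables from the Logging section above.
DEBUG_VARS = {"SD_CONTROL_DEBUG": "true", "SD_PROCESS_DEBUG": "true"}


def debug_launch(platform: str = sys.platform) -> tuple:
    """Build (command, environment) for a debug run of SD.Next (not executed here)."""
    env = {**os.environ, **DEBUG_VARS}  # inherit the current env, add debug flags
    launcher = "webui.bat" if platform.startswith("win") else "./webui.sh"
    return [launcher, "--debug"], env
```

The returned pair could then be passed to `subprocess.run(cmd, env=env)` from the SD.Next directory.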

Requirements

Control itself does not have any additional requirements and any used models are downloaded automatically
However, some processors require additional packages to be installed

Note: It's recommended to activate the venv before installing requirements

  • DWPose: Requires OpenMMLab framework
    pip install openmim
    mim install mmengine mmcv mmpose mmdet
  • MediaPipe: Requires MediaPipe framework
    pip install mediapipe
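To confirm that the optional processor packages are importable from the active venv, a quick check with the standard `importlib.util.find_spec` is enough. The sketch below is a hypothetical helper. The import names (`mmengine`, `mmcv`, `mmpose`, `mmdet`, `mediapipe`) match the packages installed above.

```python
import importlib.util

# Import names of the optional packages each processor needs.
OPTIONAL_PROCESSOR_DEPS = {
    "DWPose": ["mmengine", "mmcv", "mmpose", "mmdet"],
    "MediaPipe": ["mediapipe"],
}


def missing_dependencies(deps: dict = OPTIONAL_PROCESSOR_DEPS) -> dict:
    """Return, per processor, the modules that cannot be found in this environment."""
    missing = {}
    for processor, modules in deps.items():
        # find_spec returns None for a top-level module that is not installed
        absent = [m for m in modules if importlib.util.find_spec(m) is None]
        if absent:
            missing[processor] = absent
    return missing


if __name__ == "__main__":
    print(missing_dependencies() or "all optional processor packages found")
```

An empty result means the DWPose and MediaPipe processors should have their packages available.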

Limitations / TODO

Todo

  • Bind restore button and override controls
  • API is missing
  • Some metadata is not included in output images (key metadata is included)

Future
