
Tracking issue for SD ecosystem feature parity #69

Open · 14 of 56 tasks
Keavon opened this issue Jun 5, 2023 · 6 comments

Keavon commented Jun 5, 2023

The intention of this issue is to provide a comprehensive outline of all the core features and capabilities that other distributions of Stable Diffusion (primarily A1111) provide. It's a big list, but the items are far from equally high priority. Some items in this outline will be turned into GitHub issues for discussing and tracking progress on implementation. Please comment on this issue to suggest additions, clarifications, and sub-features, and I'll aim to keep the outline up to date.

Generation methods

  • Txt2img
  • Img2img
    • In/outpainting
      • Choice of starting with existing image, smeared surrounding colors, latent noise, and latent nothing
    • Denoising strength (this is already implemented?)
  • Depth2Img (via txt2img and img2img)
  • Regional prompts/latent couple/two shot diffusion (a unique prompt per grid area, like the left half and right half of the image)

Generation parameters

  • Viewing the image generation progress as it runs (this is very high priority for Graphite)
  • Negative prompts
  • CFG scale (is this already implemented?)
  • Non-square multiple-of-64 resolutions
    • Widths and heights as multiples of 8 instead of 64
  • Infinite prompt token length
  • Multiple prompts (like space ship (sci-fi) vs. space AND ship (sailing ship in space))
  • Prompt token weighting (like (beautiful:1.5) tree (with autumn leaves:0.8); a parsing sketch follows this list)
  • Seed resize (pin a seed and its resolution, then generate at a different resolution or aspect ratio and keep mostly the same image)
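
As a rough illustration of the prompt weighting syntax above, here is a minimal parsing sketch. The function name and defaults are mine, and A1111's real parser additionally handles nesting, `[de-emphasis]` brackets, and escaped parentheses; this only covers flat `(text:weight)` groups.

```rust
/// Minimal sketch: split an A1111-style prompt into (text, weight) segments.
/// Only flat `(text:weight)` groups are handled; nesting, `[de-emphasis]`,
/// and escaped parentheses are left out for brevity.
fn parse_weighted_prompt(prompt: &str) -> Vec<(String, f32)> {
    let mut segments: Vec<(String, f32)> = Vec::new();
    let mut plain = String::new();
    let mut chars = prompt.chars();
    while let Some(c) = chars.next() {
        if c == '(' {
            // Flush any unweighted text collected so far (default weight 1.0).
            if !plain.trim().is_empty() {
                segments.push((plain.trim().to_string(), 1.0));
            }
            plain.clear();
            // Grab everything up to the matching ')'.
            let inner: String = chars.by_ref().take_while(|&ch| ch != ')').collect();
            match inner.rsplit_once(':') {
                // Explicit weight, e.g. "beautiful:1.5".
                Some((text, weight)) => {
                    let weight: f32 = weight.trim().parse().unwrap_or(1.0);
                    segments.push((text.trim().to_string(), weight));
                }
                // Bare parentheses mean a default 1.1x emphasis in A1111.
                None => segments.push((inner.trim().to_string(), 1.1)),
            }
        } else {
            plain.push(c);
        }
    }
    if !plain.trim().is_empty() {
        segments.push((plain.trim().to_string(), 1.0));
    }
    segments
}
```

For the example above this yields [("beautiful", 1.5), ("tree", 1.0), ("with autumn leaves", 0.8)]; the weights are then typically applied by scaling the corresponding token embeddings before the UNet sees them.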

Model support

  • Stable Diffusion model formats
    • SD 1.4 (is this already implemented?)
    • SD 1.5
    • SD 2.0 (is this already implemented?)
    • SD 2.1
    • SDXL
  • Inpainting-specific models
    • "Inpainting conditioning mask strength" parameter
  • Instruct-Pix2Pix
  • Custom checkpoints/models

Stylization

  • LoRA
  • Hypernetworks
  • Textual Inversion
  • Dreambooth
  • Swappable VAEs (is this already implemented?)

ControlNet

Some features are described at https://github.com/Mikubill/sd-webui-controlnet but I don't currently have time to make a list of them. Help with such a list would be appreciated.

Optimizations

VRAM reduction strategies: things like xformers and reduced floating-point precision? I don't understand this area well enough to really get it. There are also methods that remove certain parts of the pipeline from VRAM after that stage has completed, which trades time for lower VRAM requirements. I'll need help creating a list for this.
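
To illustrate the time-for-VRAM trade-off mentioned above, here is a rough sketch in plain Rust. The model structs are tiny stand-ins, not types from this crate: each stage's weights are dropped as soon as its output is computed, so only one model is resident at a time, at the cost of reloading weights for every image.

```rust
// Stand-in models, just so the scoping pattern is visible; not this crate's API.
struct ClipTextEncoder { _weights: Vec<f32> }
struct UNet { _weights: Vec<f32> }
struct VaeDecoder { _weights: Vec<f32> }

impl ClipTextEncoder {
    fn load() -> Self { Self { _weights: vec![0.0; 1_000] } }
    fn encode(&self, _prompt: &str) -> Vec<f32> { vec![0.0; 77 * 768] }
}
impl UNet {
    fn load() -> Self { Self { _weights: vec![0.0; 1_000] } }
    fn sample(&self, embeddings: &[f32], _steps: usize) -> Vec<f32> { embeddings.to_vec() }
}
impl VaeDecoder {
    fn load() -> Self { Self { _weights: vec![0.0; 1_000] } }
    fn decode(&self, latents: &[f32]) -> Vec<f32> { latents.to_vec() }
}

fn generate_low_vram(prompt: &str) -> Vec<f32> {
    // Stage 1: text encoding. The encoder is dropped before the UNet loads,
    // so the two are never resident at the same time.
    let embeddings = {
        let clip = ClipTextEncoder::load();
        clip.encode(prompt)
    }; // `clip` freed here

    // Stage 2: denoising loop, with only the UNet in memory.
    let latents = {
        let unet = UNet::load();
        unet.sample(&embeddings, 30)
    }; // `unet` freed here

    // Stage 3: VAE decode.
    VaeDecoder::load().decode(&latents)
}
```

Memory-efficient attention (xformers-style) and fp16/bf16 weights are complementary to this kind of offloading; they reduce peak usage within a single stage rather than between stages.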

Upscaling

Some upscalers are entirely separate models and are thus likely out of scope. Others, I think, are part of the SD pipeline: some are scripts, while others are actual models that need to be implemented in the pipeline itself. The latter should probably be included here, but I need help creating a list.

Sampling methods

  • Euler a
  • Euler
  • LMS
  • Heun
  • DPM2
  • DPM2 a
  • DPM++ 2S a
  • DPM++ 2M
  • DPM++ SDE
  • DPM fast
  • DPM adaptive
  • LMS Karras
  • DPM2 Karras
  • DPM2 a Karras
  • DPM++ 2S a Karras
  • DPM++ 2M Karras
  • DPM++ SDE Karras
  • DDIM
  • PLMS
  • UniPC
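
For context on what these samplers have in common, here is a minimal sketch of a plain Euler step in the k-diffusion formulation, operating on flat f32 latents purely for illustration; `denoise` stands in for the UNet call and is not a function from this crate. The Karras variants change how the sigma schedule is spaced, and the ancestral ("a") samplers add fresh noise back after each step.

```rust
// Minimal sketch of the plain Euler sampler: walk the noise schedule from high
// sigma to low, taking one explicit Euler step per entry. `denoise` is a
// stand-in for the model call that predicts the fully denoised latents.
fn euler_sample(
    mut latents: Vec<f32>,
    sigmas: &[f32],                         // decreasing noise levels, ending near 0
    denoise: impl Fn(&[f32], f32) -> Vec<f32>,
) -> Vec<f32> {
    for i in 0..sigmas.len().saturating_sub(1) {
        let (sigma, sigma_next) = (sigmas[i], sigmas[i + 1]);
        let denoised = denoise(&latents, sigma);
        for (x, x0) in latents.iter_mut().zip(&denoised) {
            let d = (*x - x0) / sigma;       // derivative estimate
            *x += d * (sigma_next - sigma);  // Euler step toward the next noise level
        }
    }
    latents
}
```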

Other models

  • Upscaling (ESRGAN, etc.)
  • CLIP interrogator
  • Restore faces (GFPGAN, CodeFormer)
  • (probably more?)

Did I miss something? Probably! Hopefully the community can help me keep this list updated so it's as comprehensive as possible. Thanks ❤️.

Ideally these capabilities would be modular, allowing for composability and opting in and out of specific features at will for any desired image generation pipeline; a sketch of one possible shape for this follows below. In our use case with Graphite, we want to put different options into nodes within a node graph so they are user-configurable. (I should also mention that keeping the MIT/Apache 2.0 license is important for Graphite, since our project is also Apache 2.0, so I'd humbly request that some care be taken not to copy from copyleft code, which would force this library to change its license. Thanks 😃)
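
Purely as an illustration of that modularity (not an API proposal for this crate), each capability could be a stage that transforms a shared pipeline state, letting a host such as a node graph compose only the stages the user opts into:

```rust
// Illustrative only: every feature (ControlNet, LoRA, upscaling, ...) becomes a
// stage over a shared state, and a host composes whichever stages it wants.
struct PipelineState {
    prompt: String,
    latents: Vec<f32>,
}

trait Stage {
    fn run(&self, state: &mut PipelineState);
}

fn run_pipeline(stages: &[Box<dyn Stage>], mut state: PipelineState) -> PipelineState {
    for stage in stages {
        stage.run(&mut state);
    }
    state
}
```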

Njasa2k commented Jun 10, 2023

> Some features are described at https://github.com/Mikubill/sd-webui-controlnet but I don't currently have time to make a list of them. Help with such a list would be appreciated.

Control Type

  • Canny
  • Depth
  • Normal
  • OpenPose
  • MLSD
  • Lineart
  • SoftEdge
  • Scribble
  • Segmentation
  • Shuffle
  • Tile
  • Inpaint
  • IP2P
  • Reference
  • T2IA

Preprocessors

  • invert (from white bg & black line)
  • canny
  • mediapipe_face
  • mlsd
  • shuffle
  • threshold
  • normal
    • normal_bae
    • normal_midas
  • depth
    • depth_leres
    • depth_leres++
    • depth_midas
    • depth_zoe
  • inpaint
    • inpaint_global_harmonious
    • inpaint_only
    • inpaint_only+lama
  • lineart
    • lineart_anime
    • lineart_anime_denoise
    • lineart_coarse
    • lineart_realistic
    • lineart_standard (from white bg & black line)
  • openpose
    • openpose_face
    • openpose_faceonly
    • openpose_full
    • openpose_hand
  • reference
    • reference_adain
    • reference_adain+attn
    • reference_only
  • scribble
    • scribble_hed
    • scribble_pidinet
    • scribble_xdog
  • seg
    • seg_ofade20k
    • seg_ofcoco
    • seg_ufade20k
  • softedge
    • softedge_hed
    • softedge_hedsafe
    • softedge_pidinet
    • softedge_pidisafe
  • t2ia
    • t2ia_color_grid
    • t2ia_sketch_pidi
    • t2ia_style_clipvision
  • tile
    • tile_colorfix
    • tile_colorfix+sharp
    • tile_resample

katopz commented Jun 16, 2023

Nice! FYI, ControlNet canny is supported here.

LaurentMazare (Owner) commented:

> [ ] Viewing the image generation progress as it runs (this is very high priority for Graphite)

@Keavon do you mean that you would want the intermediary images to be available, or something else? For the intermediary images, this should already be doable (and available in the command line examples); see for example this snippet.
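
The general shape is just a per-step callback inside the sampling loop; a framework-agnostic sketch is below (the names here are illustrative, not taken from this crate — the snippet mentioned above shows the actual mechanism):

```rust
// Illustrative sketch: the denoising loop hands the current latents to a user
// callback after every step, so a UI can decode and display previews.
fn sample_with_preview(
    steps: usize,
    denoise_step: impl Fn(&mut Vec<f32>, usize),  // stand-in for one UNet/scheduler step
    mut on_step: impl FnMut(usize, &[f32]),       // user callback, e.g. decode + show a preview
) -> Vec<f32> {
    let mut latents = vec![0.0_f32; 4 * 64 * 64];  // placeholder latent tensor
    for step in 0..steps {
        denoise_step(&mut latents, step);
        on_step(step, &latents);
    }
    latents
}
```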

Keavon (Author) commented Jun 17, 2023

Yes, viewing the intermediary images while waiting for the final image to be completed. Good to know that's already supported, thanks! Feel free to check off any others that are in my list and already supported, too. Thank you!

LaurentMazare (Owner) commented:

Just to mention that I didn't have the time to do much on all these features. One thing I have been working on, though, is a new ML framework written in Rust called candle.
As an example, it includes Stable Diffusion 1.5 and 2.1, but only txt2img at the moment; if there is some interest I can add more there and make a full crate out of this example. The main upside compared to this crate is that there is no dependency on libtorch anymore, so deployment is a lot faster and it could run on wasm, etc. (The main downside is that it might not be as optimized as the libtorch version yet, but we're working on it.)

Keavon (Author) commented Aug 28, 2023

I was looking at both Candle and Burn (for which @Gadersd has recently ported both SD 1.4 in Burn and SDXL in Burn), and it definitely looks like one of those frameworks is the path forward in the Rust ecosystem (although I'm curious what their differences are).

I'd really love to help organize interested contributors into a team building a robust, production-ready, pure-Rust Stable Diffusion distro that aims to be as fully-featured as AUTOMATIC1111. I wonder if you have any thoughts or suggestions about that, @LaurentMazare. Likewise @Gadersd was interested in the idea but I should reach out again about next steps.
