
Exploiting the Signal-Leak Bias in Diffusion Models

arXiv Project Page Proceedings Open In Colab

5-min demo

Overview

This repository contains the official implementation of our paper "Exploiting the Signal-Leak Bias in Diffusion Models", presented at WACV 2024 🔥

🔎 Research Highlights

  • In the training of most diffusion models, data are never completely noised: some signal always leaks into the noisiest training samples, creating a discrepancy between the training and inference processes (see the short check after this list).
  • As a consequence of this signal leak, the low-frequency / large-scale content of generated images largely follows that of the initial latents the generation process starts from, producing greyish images or images that do not match the desired style.
  • We propose to exploit this signal-leak bias at inference time to gain more control over the generated images.
  • We model the distribution of the signal leak present during training, and include a matching signal leak in the initial latents at inference time.
  • ✨✨ No training required! ✨✨
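
To see where the leak comes from: diffusion models are trained on noised samples sqrt(ᾱ_t)·x + sqrt(1−ᾱ_t)·ε, and with common schedules ᾱ_t never reaches 0, so even the noisiest training samples retain a fraction of the image x. The following minimal check (a sketch assuming the scheduler configuration shipped with Stable Diffusion 2.1 in 🤗 Diffusers) prints that remaining fraction:

# Minimal check that the forward process never fully destroys the signal.
# Assumes the scheduler configuration of Stable Diffusion 2.1.
from diffusers import DDPMScheduler

scheduler = DDPMScheduler.from_pretrained(
    "stabilityai/stable-diffusion-2-1", subfolder="scheduler"
)
last_t = scheduler.config.num_train_timesteps - 1  # last training timestep (999)
signal_fraction = scheduler.alphas_cumprod[last_t] ** 0.5
print(f"Fraction of signal remaining at t={last_t}: {signal_fraction:.4f}")
# Non-zero output: the "fully noised" training samples still contain signal.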

Martin Nicolas Everaert 1, Athanasios Fitsios 1,2, Marco Bocchio 2, Sami Arpa 2, Sabine Süsstrunk 1, Radhakrishna Achanta 1

1School of Computer and Communication Sciences, EPFL, Switzerland ; 2Largo.ai, Lausanne, Switzerland

Abstract: There is a bias in the inference pipeline of most diffusion models. This bias arises from a signal leak whose distribution deviates from the noise distribution, creating a discrepancy between training and inference processes. We demonstrate that this signal-leak bias is particularly significant when models are tuned to a specific style, causing sub-optimal style matching. Recent research tries to avoid the signal leakage during training. We instead show how we can exploit this signal-leak bias in existing diffusion models to allow more control over the generated images. This enables us to generate images with more varied brightness, and images that better match a desired style or color. By modeling the distribution of the signal leak in the spatial frequency and pixel domains, and including a signal leak in the initial latent, we generate images that better match expected results without any additional training.

Getting Started

If you're unsure where to execute the following commands, you will likely find it more convenient to open this README in Google Colab. Simply click the following badge to launch a Colab environment:

Open In Colab

Code and development environment

Clone this repository:

git clone https://github.com/IVRL/signal-leak-bias
cd signal-leak-bias/src

Our code mainly builds on the 🤗 Diffusers library. Run the following commands to install the dependencies:

pip install diffusers==0.25.1
pip install accelerate==0.26.1
pip install transformers==4.26.1

Computing statistics of the signal leak

The provided script signal_leak.py computes the statistics of the signal leak. It can be used, for example, as follows:

python signal_leak.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-2-1"  \
  --data_dir="path/to/natural/images" \
  --output_dir="examples/C" \
  --resolution=768 \
  --n_components=3 \
  --statistic_type="dct+pixel" \
  --center_crop

Inference

Once the statistics have been computed, you can use them to sample a signal leak at inference time too, for instance as follows:

from signal_leak import sample_from_stats

signal_leak = sample_from_stats(path="examples/C")

Images can be generated with the sampled signal leak in the initial latents, for instance as follows:

from diffusers import StableDiffusionPipeline
import torch

pipeline = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1").to("cuda")
num_inference_steps = 50

# Get the timestep T of the first reverse diffusion iteration
pipeline.scheduler.set_timesteps(num_inference_steps, device="cuda")
first_inference_timestep = pipeline.scheduler.timesteps[0].item()

# Calculate the expected amount of signal and noise at timestep T
sqrt_alpha_prod = pipeline.scheduler.alphas_cumprod[first_inference_timestep] ** 0.5
sqrt_one_minus_alpha_prod = (1 - pipeline.scheduler.alphas_cumprod[first_inference_timestep]) ** 0.5

# Generate the initial latents, with signal leak
latents = torch.randn([1, 4, 96, 96])  # original initial latents, without signal leak (96 = 768/8, the latent resolution of SD 2.1 at 768x768)
latents = sqrt_alpha_prod * signal_leak + sqrt_one_minus_alpha_prod * latents

# Generate images
image = pipeline(
    prompt = "An astronaut riding a horse",
    num_inference_steps = num_inference_steps,
    latents = latents,
).images[0]
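
Note that the mixing step above is exactly the forward noising process used during training. Assuming signal_leak is a torch tensor of matching shape, the same initial latents can equivalently be obtained with the scheduler's add_noise() helper:

# Equivalent formulation using the scheduler's add_noise() helper, which applies
# the same sqrt(alpha_cumprod) * signal + sqrt(1 - alpha_cumprod) * noise mixing
noise = torch.randn([1, 4, 96, 96])
timestep = pipeline.scheduler.timesteps[:1]  # timestep T as a 1-element tensor
latents = pipeline.scheduler.add_noise(signal_leak, noise, timestep)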

Examples

Improving style-tuned models

Models tuned on specific styles often produce results that do not match these styles well (see the second column of the next two tables). We argue that this is caused by a discrepancy between training (which contains a signal leak whose distribution differs from the standard multivariate Gaussian) and inference (no signal leak). We fix this discrepancy by modelling the signal leak present during training and including a signal leak (see third column) at inference time too. We use a "pixel" model, i.e. we estimate the mean and variance of each pixel (spatial element) of the latent encodings.
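
For intuition, this "pixel" model can be sketched as follows (a minimal illustration only; fit_pixel_stats and sample_pixel_leak are hypothetical names, and the actual implementation is in signal_leak.py). Given the VAE-encoded latents of the style images, we fit a per-element Gaussian and sample signal leaks from it:

import torch

def fit_pixel_stats(latents):
    # latents: [N, 4, h, w] tensor of VAE-encoded style images
    mean = latents.mean(dim=0)  # per-element mean, [4, h, w]
    std = latents.std(dim=0)    # per-element std, [4, h, w]
    return mean, std

def sample_pixel_leak(mean, std):
    # Draw one signal leak from the fitted per-pixel Gaussian
    return (mean + std * torch.randn_like(std)).unsqueeze(0)  # [1, 4, h, w]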

In the following two examples, we show how to fix two such models:

Example 1

# Clone the repository with the images
git clone https://huggingface.co/sd-dreambooth-library/nasa-space-v2-768

# Compute statistics of the signal leak from the images
python signal_leak.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-2-1" \
  --data_dir="nasa-space-v2-768/concept_images" \
  --output_dir="examples/A1/" \
  --resolution=768 \
  --statistic_type="pixel" \
  --center_crop

# We do not need the original images anymore
rm -rf nasa-space-v2-768 

# Generate image with our signal-leak bias
python -m examples.A1.generate

Model: sd-dreambooth-library/nasa-space-v2-768, with guidance_scale = 1

Prompt: "A very dark picture of the sky, Nasa style"

[Image grid — columns: initial latents | generated image (original) | + signal leak | generated image (ours)]

Example 2

git clone https://huggingface.co/sd-concepts-library/line-art

python signal_leak.py \
  --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" \
  --data_dir="line-art/concept_images" \
  --output_dir="examples/A2/" \
  --resolution=512 \
  --statistic_type="pixel" \
  --center_crop
  
rm -rf line-art
    
python -m examples.A2.generate

Model: CompVis/stable-diffusion-v1-4 + sd-concepts-library/line-art

Prompt: "An astronaut riding a horse in the style of <line-art>"

[Image grid — columns: initial latents | generated image (original) | + signal leak | generated image (ours)]

Training-free style adaptation of Stable Diffusion

The same approach as in the previous examples can be applied directly to the base diffusion model, instead of a model finetuned on a style. That is, we include a signal leak at inference time to bias the image generation towards the desired style.

Without our approach (see the second column of the next two tables), the prompt alone is not sufficient to generate pictures in the desired style. Complementing it with a signal leak of the style (third column) generates images (last column) that better match the desired output.

Example 1

python -m examples.B1.generate

Model: stabilityai/stable-diffusion-2-1, with guidance_scale = 1

Prompt: "A very dark picture of the sky, taken by the Nasa."

[Image grid — columns: initial latents | generated image (original) | + signal leak | generated image (ours)]

Example 2

python -m examples.B2.generate

Model: CompVis/stable-diffusion-v1-4

Prompt: "An astronaut riding a horse, in the style of line art, pastel colors."

[Image grid — columns: initial latents | generated image (original) | + signal leak | generated image (ours)]

More diverse generated images

In the previous examples, the signal leak is modelled with a "pixel" model, realigning the training and inference distributions for stylized images. For natural images, the discrepancy between the training and inference distributions mostly lies in the frequency components: noised images during training still retain the low-frequency content (large-scale patterns, main colors) of the original images, while the initial latents at inference always have average low-frequency content (e.g. a greyish mean color). Compared to the examples above, we therefore additionally model the low-frequency content of the signal leak, using a small set of natural images.
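
Conceptually, the low-frequency part of this "dct+pixel" model can be sketched as follows (an illustration assuming SciPy; low_freq_stats and sample_low_freq_leak are hypothetical names, see signal_leak.py for the actual modelling). We take a 2D DCT of each latent channel, keep only the first --n_components coefficients per axis, fit Gaussian statistics on them, and invert the DCT to sample a spatial signal leak:

import numpy as np
import torch
from scipy.fft import dctn, idctn

def low_freq_stats(latents, n=3):
    # latents: [N, 4, h, w] array of VAE-encoded natural images
    coeffs = dctn(latents, axes=(2, 3), norm="ortho")[:, :, :n, :n]
    return coeffs.mean(axis=0), coeffs.std(axis=0)  # each of shape [4, n, n]

def sample_low_freq_leak(mean, std, h, w):
    # Sample low-frequency coefficients, embed them in an otherwise-zero
    # spectrum, and invert the DCT to obtain the spatial signal leak
    n = mean.shape[-1]
    spectrum = np.zeros((1, 4, h, w))
    spectrum[0, :, :n, :n] = mean + std * np.random.randn(*mean.shape)
    return torch.from_numpy(idctn(spectrum, axes=(2, 3), norm="ortho")).float()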

In the next examples, we use this set of 128 images from COCO:

wget https://github.com/ultralytics/yolov5/releases/download/v1.0/coco128.zip
unzip coco128.zip
rm coco128.zip
mv coco128/images/train2017 coco128_images
rm -r coco128

python signal_leak.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-2-1" \
  --data_dir="coco128_images" \
  --output_dir="examples/C/" \
  --resolution=768 \
  --n_components=3 \
  --statistic_type="dct+pixel" \
  --center_crop
  
rm -r coco128_images

python -m examples.C.generate

Model: stabilityai/stable-diffusion-2-1

Prompt: "An astronaut riding a horse"

[Image grid — columns: initial latents | generated image (original) | + signal leak | generated image (ours)]

Control on the average color

In the previous example, the signal leak provided at inference time is sampled randomly from the statistics of the signal leak present at training time. Instead, its low-frequency components can also be set manually, providing control over the low-frequency content of the generated image, as we show in the following example.
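
Conceptually, this manual control amounts to something like the following sketch (illustrative only; the actual implementation is in examples/D/generate.py). With --n_components=1, only the DC coefficient of each latent channel is modelled, and setting it a few standard deviations away from its mean shifts the average color of the generated image:

import numpy as np
import torch
from scipy.fft import idctn

h, w = 96, 96  # latent size for 768x768 images (768 / 8)
# `mean` and `std` stand for the fitted per-channel statistics of the DC
# coefficient (placeholder values here; in practice, load the computed stats)
mean, std = np.zeros(4), np.ones(4)

offsets = np.array([2.0, 0.0, 0.0, 0.0])  # e.g. shift latent channel 0 by +2 std

spectrum = np.zeros((1, 4, h, w))
spectrum[0, :, 0, 0] = mean + offsets * std  # DC term only (n_components=1)
signal_leak = torch.from_numpy(idctn(spectrum, axes=(2, 3), norm="ortho")).float()
# `signal_leak` can then be mixed into the initial latents as shown above

The commands below compute the statistics with --n_components=1 and run the provided example: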

wget https://github.com/ultralytics/yolov5/releases/download/v1.0/coco128.zip
unzip coco128.zip
rm coco128.zip
mv coco128/images/train2017 coco128_images
rm -r coco128

python signal_leak.py \
  --pretrained_model_name_or_path="stabilityai/stable-diffusion-2-1" \
  --data_dir="coco128_images" \
  --output_dir="examples/D/" \
  --resolution=768 \
  --n_components=1 \
  --statistic_type="dct+pixel" \
  --center_crop
  
rm -r coco128_images
  
python -m examples.D.generate

Model: stabilityai/stable-diffusion-2-1

Prompt: "An astronaut riding a horse"

[Image grid — rows: latent channels 0–3; columns: low-frequency component set to −2, −1, 0, +1, +2]

License

The implementation here is provided solely as part of the research publication "Exploiting the Signal-Leak Bias in Diffusion Models", and only for academic, non-commercial usage. Details can be found in the LICENSE file. If this license is not suitable for your business or project, please contact Largo.ai ([email protected]) and EPFL-TTO ([email protected]) for a full commercial license.

Citation

Please cite the paper as follows:

@InProceedings{Everaert_2024_WACV,
    author    = {Everaert, Martin Nicolas and Fitsios, Athanasios and Bocchio, Marco and Arpa, Sami and Süsstrunk, Sabine and Achanta, Radhakrishna},
    title     = {{E}xploiting the {S}ignal-{L}eak {B}ias in {D}iffusion {M}odels},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2024},
    pages     = {4025-4034}
}
