initial commit
kliyer-ai committed Dec 4, 2024
0 parents commit eefe7e2
Showing 94 changed files with 7,392 additions and 0 deletions.
65 changes: 65 additions & 0 deletions README.md
@@ -0,0 +1,65 @@
<h2 align="center">🧹CleanDIFT: Diffusion Features without Noise</h2>
<div align="center">
<a href="https://nickstracke.dev/" target="_blank">Nick Stracke</a><sup>*</sup> ·
<a href="https://stefan-baumann.eu/" target="_blank">Stefan A. Baumann</a><sup>*</sup> ·
<a href="https://bsky.app/profile/koljabauer.bsky.social" target="_blank">Kolja Bauer</a><sup>*</sup> ·
<a href="https://ffundel.de/" target="_blank">Frank Fundel</a> ·
<a href="https://ommer-lab.com/people/ommer/" target="_blank">Björn Ommer</a>
</div>
<p align="center">
<b>CompVis Group @ LMU Munich</b> <br/>
<sup>*</sup> Equal Contribution
</p>

[![Project Page](https://img.shields.io/badge/Project-Page-blue)](https://compvis.github.io/CleanDIFT/)
[![Paper](https://img.shields.io/badge/arXiv-PDF-b31b1b)](https://compvis.github.io/CleanDIFT/static/pdfs/cleandift.pdf)
[![Weights](https://img.shields.io/badge/HuggingFace-Weights-orange)](https://huggingface.co/CompVis/cleandift)



This repository contains the official implementation of the paper "CleanDIFT: Diffusion Features without Noise".

We propose CleanDIFT, a novel method to extract noise-free, timestep-independent features by enabling diffusion models to work directly with clean input images. Our approach is efficient, training on a single GPU in just 30 minutes.

![teaser](./docs/static/images/teaser_fig.png)


## 🚀 Usage
### Setup
Clone the repository and install the requirements with `pip install -r requirements.txt`.

### Training

To train a feature extractor yourself, run `python train.py`. The training script expects your data in `./data` as a single-level directory of images named `filename.jpg`, each with a corresponding JSON file `filename.json` that contains the key `caption`.
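
As a minimal sketch of the data layout described above, the helper below scans a directory for `.jpg` files and checks that each one has a matching `.json` file with a `caption` key. The function name `find_caption_pairs` is hypothetical and not part of this repository; the training script may load the data differently.

```python
import json
from pathlib import Path


def find_caption_pairs(data_dir="./data"):
    """Return (pairs, problems) for a single-level directory of .jpg/.json files."""
    pairs, problems = [], []
    for img_path in sorted(Path(data_dir).glob("*.jpg")):
        json_path = img_path.with_suffix(".json")
        if not json_path.is_file():
            problems.append(f"{img_path.name}: missing {json_path.name}")
            continue
        try:
            meta = json.loads(json_path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            problems.append(f"{json_path.name}: invalid JSON")
            continue
        # The caption file must be a JSON object with a "caption" key.
        if not isinstance(meta, dict) or "caption" not in meta:
            problems.append(f"{json_path.name}: no 'caption' key")
            continue
        pairs.append((img_path, meta["caption"]))
    return pairs, problems
```

Calling `find_caption_pairs("./data")` before training gives a quick list of images that would lack a caption.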

### Feature Extraction

For feature extraction, please refer to one of the notebooks at [`notebooks`](https://github.com/CompVis/CleanDIFT/tree/main/notebooks). We demonstrate how to extract features and use them for semantic correspondence detection and depth prediction.

Our checkpoints are fully compatible with the `diffusers` library. If you already have a pipeline using SD 1.5 or SD 2.1 from `diffusers`, you can simply replace the U-Net state dict:

```python
from diffusers import UNet2DConditionModel
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

# Load the base SD 2.1 U-Net, then overwrite its weights with the CleanDIFT checkpoint.
unet = UNet2DConditionModel.from_pretrained("stabilityai/stable-diffusion-2-1", subfolder="unet")
ckpt_pth = hf_hub_download(repo_id="CompVis/cleandift", filename="cleandift_sd21_unet.safetensors")
state_dict = load_file(ckpt_pth)
unet.load_state_dict(state_dict, strict=True)
```


## 🎓 Citation

If you use this codebase or otherwise find our work valuable, please cite our paper:

```bibtex
@misc{stracke2024cleandift,
title={CleanDIFT: Diffusion Features without Noise},
author={Nick Stracke and Stefan Andreas Baumann and Kolja Bauer and Frank Fundel and Björn Ommer},
year={2024},
eprint={????},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
63 changes: 63 additions & 0 deletions configs/sd15_feature_extractor.yaml
@@ -0,0 +1,63 @@
seed: 42
max_val_steps: 10
val_freq: 100
checkpoint_freq: 100
checkpoint_dir: ./checkpoints
lr: 1e-5
max_steps: null

grad_accum_steps: 1

data:
  _target_: src.dataloader.DataModule
  dataset_dir: ./data
  batch_size: 8
  img_size: 512

model:
  _target_: src.sd_feature_extraction.StableFeatureAligner
  sd_version: sd15
  t_max: 999 # Max timestep used during training
  num_t_stratification_bins: 3
  train_unet: True
  learn_timestep: True
  use_text_condition: true

  ae:
    _target_: src.ae.AutoencoderKL
    repo: stable-diffusion-v1-5/stable-diffusion-v1-5
  mapping:
    _target_: src.utils.MappingSpec
    depth: 2
    width: 256
    d_ff: 768
    dropout: 0.0
  adapter_layer_class: src.sd_feature_extraction.FFNStack
  adapter_layer_params:
    depth: 3
    ffn_expansion: 1
    dim_cond: ${..mapping.width}
  feature_extractor_cls: src.sd_feature_extraction.SD15UNetFeatureExtractor
  feature_dims:
    mid: 1280
    us1: 1280
    us2: 1280
    us3: 1280
    us4: 1280
    us5: 1280
    us6: 1280
    us7: 640
    us8: 640
    us9: 640
    us10: 320


lr_scheduler:
  name: constant_with_warmup
  num_warmup_steps: 2000
  num_training_steps: null
  scheduler_specific_kwargs: {}

hydra:
  job:
    chdir: false
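
The `lr_scheduler` settings in this config select a constant schedule with warmup. As a rough sketch, assuming the usual definition of that schedule (the rate ramps linearly from 0 to `lr` over `num_warmup_steps` and stays constant afterwards), the effective learning rate per step could be computed as follows. The function name `lr_at_step` is hypothetical; the training script may construct its scheduler differently.

```python
def lr_at_step(step, base_lr=1e-5, num_warmup_steps=2000):
    """Learning rate at an optimizer step for a constant schedule with linear warmup."""
    if step < num_warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, num_warmup_steps)
    # Constant after warmup.
    return base_lr
```

With the values from this config, the rate reaches `1e-5` at step 2000 and is half of that at step 1000.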
19 changes: 19 additions & 0 deletions configs/sd21_depth_prober.yaml
@@ -0,0 +1,19 @@
# @package _global_

model:
  _target_: src.depth.DepthPred
  loss:
    _target_: src.depth.SigLoss
  model_config_path: ./configs/sd21_feature_extractor.yaml
  diffusion_image_size: 768
  channels: 1280
  base_model_timestep: 199
  use_base_model_features: false
  adapter_timestep: null
  interpolate_features: NONE

hydra:
  job:
    chdir: false


63 changes: 63 additions & 0 deletions configs/sd21_feature_extractor.yaml
@@ -0,0 +1,63 @@
seed: 42
max_val_steps: 100
val_freq: 100
checkpoint_freq: 200
checkpoint_dir: ./checkpoints
lr: 1e-5
max_steps: null

grad_accum_steps: 1

data:
  _target_: src.dataloader.DataModule
  dataset_dir: ./data
  batch_size: 8
  img_size: 768

model:
  _target_: src.sd_feature_extraction.StableFeatureAligner
  sd_version: sd21
  t_max: 999 # Max timestep used during training
  num_t_stratification_bins: 3
  train_unet: True
  learn_timestep: True
  use_text_condition: true

  ae:
    _target_: src.ae.AutoencoderKL
    repo: stabilityai/stable-diffusion-2-1
  mapping:
    _target_: src.utils.MappingSpec
    depth: 2
    width: 256
    d_ff: 768
    dropout: 0.0
  adapter_layer_class: src.sd_feature_extraction.FFNStack
  adapter_layer_params:
    depth: 3
    ffn_expansion: 1
    dim_cond: ${..mapping.width}
  feature_extractor_cls: src.sd_feature_extraction.SD21UNetFeatureExtractor
  feature_dims:
    mid: 1280
    us1: 1280
    us2: 1280
    us3: 1280
    us4: 1280
    us5: 1280
    us6: 1280
    us7: 640
    us8: 640
    us9: 640
    us10: 320


lr_scheduler:
  name: constant_with_warmup
  num_warmup_steps: 2000
  num_training_steps: null
  scheduler_specific_kwargs: {}

hydra:
  job:
    chdir: false
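
The config does not say how `t_max` and `num_t_stratification_bins` are used. One plausible reading, offered purely as an assumption, is that training timesteps are split into equal-width bins and batch elements cycle through the bins so that every noise range is represented in each batch. The sketch below illustrates that idea with the standard library only; `sample_stratified_timesteps` is a hypothetical name and this is not the repository's implementation.

```python
import random


def sample_stratified_timesteps(batch_size, t_max=999, num_bins=3, rng=None):
    """Draw one timestep per batch element, cycling through equal-width bins of [0, t_max]."""
    rng = rng or random.Random()
    # Integer bin width; any remainder at the top of the range is never sampled in this sketch.
    bin_width = (t_max + 1) // num_bins
    timesteps = []
    for i in range(batch_size):
        bin_index = i % num_bins  # element i is assigned to bin i mod num_bins
        low = bin_index * bin_width
        timesteps.append(low + rng.randrange(bin_width))
    return timesteps
```

With `batch_size: 8` and three bins, every batch would contain at least two timesteps from each third of the range.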