Commit eefe7e2 (0 parents): showing 94 changed files with 7,392 additions and 0 deletions.
<h2 align="center">🧹CleanDIFT: Diffusion Features without Noise</h2>
<div align="center">
<a href="https://nickstracke.dev/" target="_blank">Nick Stracke</a><sup>*</sup> ·
<a href="https://stefan-baumann.eu/" target="_blank">Stefan A. Baumann</a><sup>*</sup> ·
<a href="https://bsky.app/profile/koljabauer.bsky.social" target="_blank">Kolja Bauer</a><sup>*</sup> ·
<a href="https://ffundel.de/" target="_blank">Frank Fundel</a> ·
<a href="https://ommer-lab.com/people/ommer/" target="_blank">Björn Ommer</a>
</div>
<p align="center">
<b>CompVis Group @ LMU Munich</b> <br/>
<sup>*</sup> Equal Contribution
</p>

[![Project Page](https://img.shields.io/badge/Project-Page-blue)](https://compvis.github.io/CleanDIFT/)
[![Paper](https://img.shields.io/badge/arXiv-PDF-b31b1b)](https://compvis.github.io/CleanDIFT/static/pdfs/cleandift.pdf)
[![Weights](https://img.shields.io/badge/HuggingFace-Weights-orange)](https://huggingface.co/CompVis/cleandift)
This repository contains the official implementation of the paper "CleanDIFT: Diffusion Features without Noise".

We propose CleanDIFT, a novel method to extract noise-free, timestep-independent features by enabling diffusion models to work directly with clean input images. Our approach is efficient: it trains on a single GPU in just 30 minutes.

![teaser](./docs/static/images/teaser_fig.png)

## 🚀 Usage
### Setup
Clone the repo and install the requirements via `pip install -r requirements.txt`; then you're ready to go.

### Training

To train a feature extractor yourself, run `python train.py`. The training script expects your data in `./data` as a single-level directory of images named `filename.jpg`, each with a corresponding JSON file `filename.json` that contains the key `caption`, as in the sketch below.
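A minimal illustration of that layout (the file name `000001` and the caption text are placeholders, not part of the repository):

```python
# Hypothetical example of the expected ./data layout: one JPEG per sample plus
# a JSON sidecar holding a "caption" key. Names and caption text are made up.
import json
from pathlib import Path

from PIL import Image

data_dir = Path("./data")
data_dir.mkdir(exist_ok=True)

# ./data/000001.jpg
Image.new("RGB", (512, 512), color="gray").save(data_dir / "000001.jpg")

# ./data/000001.json -> {"caption": "..."}
(data_dir / "000001.json").write_text(json.dumps({"caption": "a gray placeholder image"}))
```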

### Feature Extraction

For feature extraction, please refer to one of the notebooks at [`notebooks`](https://github.com/CompVis/CleanDIFT/tree/main/notebooks). We demonstrate how to extract features and use them for semantic correspondence detection and depth prediction.

Our checkpoints are fully compatible with the `diffusers` library. If you already have a pipeline using SD 1.5 or SD 2.1 from `diffusers`, you can simply replace the U-Net state dict:
```python
from diffusers import UNet2DConditionModel
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

unet = UNet2DConditionModel.from_pretrained("stabilityai/stable-diffusion-2-1", subfolder="unet")
ckpt_pth = hf_hub_download(repo_id="CompVis/cleandift", filename="cleandift_sd21_unet.safetensors")
state_dict = load_file(ckpt_pth)
unet.load_state_dict(state_dict, strict=True)
```
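Once the CleanDIFT weights are loaded, intermediate U-Net activations can be read out in the usual PyTorch way, for example with a forward hook. The sketch below is only illustrative and is not the repository's feature extractor (see the notebooks for that); the hooked block, the fixed timestep, and the placeholder inputs are all assumptions:

```python
# Hedged sketch: read features from one U-Net up-block via a forward hook,
# reusing the `unet` loaded above. Block choice, timestep, and inputs are
# illustrative placeholders, not the repository's official extraction setup.
import torch
from diffusers import AutoencoderKL

features = {}

def save_features(module, inputs, output):
    features["up_block_1"] = output  # spatial feature map of the hooked block

hook = unet.up_blocks[1].register_forward_hook(save_features)

vae = AutoencoderKL.from_pretrained("stabilityai/stable-diffusion-2-1", subfolder="vae")
images = torch.zeros(1, 3, 768, 768)        # placeholder image batch, values in [-1, 1]
text_embeddings = torch.zeros(1, 77, 1024)  # placeholder SD 2.1 text embeddings

with torch.no_grad():
    latents = vae.encode(images).latent_dist.mean * vae.config.scaling_factor
    # CleanDIFT consumes clean (non-noised) latents; fixing t = 0 is a simplification here.
    t = torch.zeros(latents.shape[0], dtype=torch.long)
    unet(latents, t, encoder_hidden_states=text_embeddings)

hook.remove()
feat = features["up_block_1"]  # dense descriptors, e.g. for correspondence matching
```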

## 🎓 Citation

If you use this codebase or otherwise find our work valuable, please cite our paper:
```bibtex
@misc{stracke2024cleandift,
  title={CleanDIFT: Diffusion Features without Noise},
  author={Nick Stracke and Stefan Andreas Baumann and Kolja Bauer and Frank Fundel and Björn Ommer},
  year={2024},
  eprint={????},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
Training configuration for the SD 1.5 feature extractor (Hydra YAML):
seed: 42
max_val_steps: 10
val_freq: 100
checkpoint_freq: 100
checkpoint_dir: ./checkpoints
lr: 1e-5
max_steps: null

grad_accum_steps: 1

data:
  _target_: src.dataloader.DataModule
  dataset_dir: ./data
  batch_size: 8
  img_size: 512

model:
  _target_: src.sd_feature_extraction.StableFeatureAligner
  sd_version: sd15
  t_max: 999 # Max timestep used during training
  num_t_stratification_bins: 3
  train_unet: True
  learn_timestep: True
  use_text_condition: true

  ae:
    _target_: src.ae.AutoencoderKL
    repo: stable-diffusion-v1-5/stable-diffusion-v1-5
  mapping:
    _target_: src.utils.MappingSpec
    depth: 2
    width: 256
    d_ff: 768
    dropout: 0.0
  adapter_layer_class: src.sd_feature_extraction.FFNStack
  adapter_layer_params:
    depth: 3
    ffn_expansion: 1
    dim_cond: ${..mapping.width}
  feature_extractor_cls: src.sd_feature_extraction.SD15UNetFeatureExtractor
  feature_dims:
    mid: 1280
    us1: 1280
    us2: 1280
    us3: 1280
    us4: 1280
    us5: 1280
    us6: 1280
    us7: 640
    us8: 640
    us9: 640
    us10: 320

lr_scheduler:
  name: constant_with_warmup
  num_warmup_steps: 2000
  num_training_steps: null
  scheduler_specific_kwargs: {}

hydra:
  job:
    chdir: false
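The `_target_` keys follow Hydra's object-instantiation convention, so a training entry point would typically build its components roughly as sketched below. This is a hedged sketch, not the repository's actual `train.py`, and the config name `sd15_feature_extractor` is an assumption:

```python
# Hedged sketch of how a Hydra config with _target_ keys is typically consumed.
# The entry-point structure and the config name are assumptions; the repository's
# actual train.py may differ.
import hydra
from hydra.utils import instantiate
from omegaconf import DictConfig

@hydra.main(config_path="configs", config_name="sd15_feature_extractor", version_base=None)
def main(cfg: DictConfig) -> None:
    data = instantiate(cfg.data)    # -> src.dataloader.DataModule(dataset_dir="./data", ...)
    model = instantiate(cfg.model)  # -> src.sd_feature_extraction.StableFeatureAligner(...)
    print(f"lr={cfg.lr}, warmup steps={cfg.lr_scheduler.num_warmup_steps}")

if __name__ == "__main__":
    main()
```

Individual values can then be overridden from the command line in the usual Hydra style, e.g. `python train.py lr=2e-5 data.batch_size=4`.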
Experiment configuration for depth prediction, applied as a global Hydra override:
# @package _global_

model:
  _target_: src.depth.DepthPred
  loss:
    _target_: src.depth.SigLoss
  model_config_path: ./configs/sd21_feature_extractor.yaml
  diffusion_image_size: 768
  channels: 1280
  base_model_timestep: 199
  use_base_model_features: false
  adapter_timestep: null
  interpolate_features: NONE

hydra:
  job:
    chdir: false
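Because of the `# @package _global_` header, this file rewrites the top-level config when it is composed, which is how Hydra experiment overrides usually work. A hedged sketch of composing it programmatically; the config group name `experiment` and the base config name are assumptions about how the files are arranged:

```python
# Hedged sketch: compose the depth override on top of a base config using
# Hydra's compose API. The group name "experiment" and the base config name
# are assumptions about the repository layout.
from hydra import compose, initialize

with initialize(config_path="configs", version_base=None):
    cfg = compose(config_name="sd21_feature_extractor", overrides=["+experiment=depth"])
    print(cfg.model["_target_"])  # src.depth.DepthPred once the override is applied
```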
Training configuration for the SD 2.1 feature extractor:
seed: 42
max_val_steps: 100
val_freq: 100
checkpoint_freq: 200
checkpoint_dir: ./checkpoints
lr: 1e-5
max_steps: null

grad_accum_steps: 1

data:
  _target_: src.dataloader.DataModule
  dataset_dir: ./data
  batch_size: 8
  img_size: 768

model:
  _target_: src.sd_feature_extraction.StableFeatureAligner
  sd_version: sd21
  t_max: 999 # Max timestep used during training
  num_t_stratification_bins: 3
  train_unet: True
  learn_timestep: True
  use_text_condition: true

  ae:
    _target_: src.ae.AutoencoderKL
    repo: stabilityai/stable-diffusion-2-1
  mapping:
    _target_: src.utils.MappingSpec
    depth: 2
    width: 256
    d_ff: 768
    dropout: 0.0
  adapter_layer_class: src.sd_feature_extraction.FFNStack
  adapter_layer_params:
    depth: 3
    ffn_expansion: 1
    dim_cond: ${..mapping.width}
  feature_extractor_cls: src.sd_feature_extraction.SD21UNetFeatureExtractor
  feature_dims:
    mid: 1280
    us1: 1280
    us2: 1280
    us3: 1280
    us4: 1280
    us5: 1280
    us6: 1280
    us7: 640
    us8: 640
    us9: 640
    us10: 320

lr_scheduler:
  name: constant_with_warmup
  num_warmup_steps: 2000
  num_training_steps: null
  scheduler_specific_kwargs: {}

hydra:
  job:
    chdir: false