Commit

fix bugs reported by Celia
olivier-bernard-creatis committed Jan 10, 2024
1 parent 876d5e5 commit 2ec836d
Showing 1 changed file with 5 additions and 5 deletions.
10 changes: 5 additions & 5 deletions collections/_posts/2023-12-19-latent-diffusion-models.md
@@ -2,7 +2,7 @@
 layout: review
 title: "High-resolution image synthesis with latent diffusion models"
 tags: diffusion model, generative model
-author: "Celia Goujeat, Olivier Bernard"
+author: "Olivier Bernard"
 cite:
   authors: "Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer"
   title: "High-resolution image synthesis with latent diffusion models"
@@ -17,7 +17,7 @@ pdf: "https://arxiv.org/pdf/2112.10752.pdf"

 # Highlights

-* Diffusion models (DMs) are applied in the latent space of powerfull pretrained autoencoders
+* Diffusion models (DMs) are applied in the latent space of powerful pretrained autoencoders
 * Allows to reach a good compromise between complexity reduction and details preservation
 * Introduce cross-attention layers into the model architecture for general conditioning inputs such as text
@@ -131,9 +131,9 @@ $$\mathcal{L}_{LDM} := \mathbb{E}_{z \sim E(x), y, \epsilon \sim \mathcal{N}(0,1), t}\left[\|\epsilon - \epsilon_{\theta}(z_t,t,\tau_{\theta}(y))\|_2^2\right]$$
 <p style="text-align: center;font-style:italic">Figure 3. Analyzing the training of class-conditional LDMs with
 different downsampling factors f over 2M train steps on the ImageNet dataset.</p>

-* LDM-1 corresponds to DM without any latent representation.
+* LDM-1 corresponds to DM without any latent representation
 * LDM-4, LDM-8 and LDM-16 appear to be the most efficient
-* LDM-32 shows limitations due to to high downsampling effects
+* LDM-32 shows limitations due to high downsampling effects

 &nbsp;
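The LDM-f naming in the hunk above refers to the autoencoder's spatial downsampling factor f: LDM-1 works directly in pixel space, while larger f gives smaller latents. A quick illustrative check of the latent resolutions for a 256x256 input (assuming downsampling acts only on the two spatial dimensions):

```python
# Latent spatial size of an H x W input under downsampling factor f (LDM-f).
def latent_size(h, w, f):
    return h // f, w // f

for f in (1, 4, 8, 16, 32):
    print(f"LDM-{f}: {latent_size(256, 256, f)}")
# LDM-1: (256, 256) ... LDM-32: (8, 8)
```

The 8x8 latent of LDM-32 shows why very aggressive downsampling limits quality: too little spatial information is left for the diffusion model to work with.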
@@ -200,7 +200,7 @@ recent state-of-the-art methods for class-conditional image generation on ImageNet

 ## Semantic-map-to-image synthesis

-* use of images of landscapes paired with semantic maps
+* Use of images of landscapes paired with semantic maps
 * Downsampled versions of the semantic maps are simply concatenated with the latent image representation of a LDM-4 model with VQ-reg.
 * No cross-attention scheme is used here
 * The model is trained on an input resolution of 256x256 but the authors find that the model generalizes to larger resolutions and can generate images up to the megapixel regime
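The concatenation-based conditioning in these bullets can be sketched as follows (illustrative NumPy sketch, not the authors' code; the stride-based downsampling and all shapes are assumptions): the semantic map is downsampled to the latent resolution and stacked channel-wise with the latent, with no cross-attention involved.

```python
import numpy as np

def concat_condition(z, sem_map, f=4):
    # z:       (C, H/f, W/f) latent representation from the autoencoder
    # sem_map: (K, H, W)     one-hot semantic map with K classes
    sem_small = sem_map[:, ::f, ::f]  # naive stride-f downsampling to latent size
    return np.concatenate([z, sem_small], axis=0)  # channel-wise concatenation

z = np.zeros((3, 64, 64))      # latent of a 256x256 image with f=4
sem = np.zeros((5, 256, 256))  # 5-class semantic map at full resolution
x = concat_condition(z, sem)
print(x.shape)  # (8, 64, 64)
```

Since the conditioning enters as extra input channels rather than through attention, the model is fully convolutional in this setup, which is consistent with its ability to generalize to resolutions larger than the 256x256 training size.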
