Commit

diffusion video post
Lilian Weng committed Apr 15, 2024
1 parent 380d820 commit e8ca8c0
Showing 38 changed files with 1,259 additions and 93 deletions.
15 changes: 13 additions & 2 deletions archives/index.html
@@ -229,8 +229,19 @@
<h1>Archive</h1>
</header>
<div class="archive-year">
<h2 class="archive-year-header">2024<sup class="archive-count">&nbsp;&nbsp;1</sup>
<h2 class="archive-year-header">2024<sup class="archive-count">&nbsp;&nbsp;2</sup>
</h2>
+<div class="archive-month">
+<h3 class="archive-month-header">April<sup class="archive-count">&nbsp;&nbsp;1</sup></h3>
+<div class="archive-posts">
+<div class="archive-entry">
+<h3 class="archive-entry-title">Diffusion Models for Video Generation
+</h3>
+<div class="archive-meta">Date: April 12, 2024 | Estimated Reading Time: 20 min | Author: Lilian Weng</div>
+<a class="entry-link" aria-label="post link to Diffusion Models for Video Generation" href="https://lilianweng.github.io/posts/2024-04-12-diffusion-video/"></a>
+</div>
+</div>
+</div>
<div class="archive-month">
<h3 class="archive-month-header">February<sup class="archive-count">&nbsp;&nbsp;1</sup></h3>
<div class="archive-posts">
@@ -376,7 +387,7 @@ <h3 class="archive-month-header">July<sup class="archive-count">&nbsp;&nbsp;1</s
<div class="archive-entry">
<h3 class="archive-entry-title">What are Diffusion Models?
</h3>
-<div class="archive-meta">Date: July 11, 2021 | Estimated Reading Time: 31 min | Author: Lilian Weng</div>
+<div class="archive-meta">Date: July 11, 2021 | Estimated Reading Time: 32 min | Author: Lilian Weng</div>
<a class="entry-link" aria-label="post link to What are Diffusion Models?" href="https://lilianweng.github.io/posts/2021-07-11-diffusion-models/"></a>
</div>
</div>
27 changes: 14 additions & 13 deletions index.html
@@ -235,6 +235,19 @@ <h1>👋 Welcome to Lil&rsquo;Log</h1>
</footer>
</article>

+<article class="post-entry">
+<header class="entry-header">
+<h2>Diffusion Models for Video Generation
+</h2>
+</header>
+<section class="entry-content">
+<p>Diffusion models have demonstrated strong results on image synthesis in past years. Now the research community has started working on a harder task—using it for video generation. The task itself is a superset of the image case, since an image is a video of 1 frame, and it is much more challenging because
+It has extra requirements on temporal consistency across frames, which naturally demand more world knowledge to be encoded into the model....</p>
+</section>
+<footer class="entry-footer">Date: April 12, 2024 | Estimated Reading Time: 20 min | Author: Lilian Weng</footer>
+<a class="entry-link" aria-label="post link to Diffusion Models for Video Generation" href="https://lilianweng.github.io/posts/2024-04-12-diffusion-video/"></a>
+</article>

<article class="post-entry">
<header class="entry-header">
<h2>Thinking about High-Quality Human Data
@@ -400,7 +413,7 @@ <h2>What are Diffusion Models?
<p>[Updated on 2021-09-19: Highly recommend this blog post on score-based generative modeling by Yang Song (author of several key papers in the references)]. [Updated on 2022-08-27: Added classifier-free guidance, GLIDE, unCLIP and Imagen. [Updated on 2022-08-31: Added latent diffusion model. [Updated on 2024-04-13: Added progressive distillation, consistency models, and the Model Architecture section.
So far, I’ve written about three types of generative models, GAN, VAE, and Flow-based models. They have shown great success in generating high-quality samples, but each has some limitations of its own....</p>
</section>
-<footer class="entry-footer">Date: July 11, 2021 | Estimated Reading Time: 31 min | Author: Lilian Weng</footer>
+<footer class="entry-footer">Date: July 11, 2021 | Estimated Reading Time: 32 min | Author: Lilian Weng</footer>
<a class="entry-link" aria-label="post link to What are Diffusion Models?" href="https://lilianweng.github.io/posts/2021-07-11-diffusion-models/"></a>
</article>

@@ -482,18 +495,6 @@ <h2>Exploration Strategies in Deep Reinforcement Learning
<footer class="entry-footer">Date: June 7, 2020 | Estimated Reading Time: 36 min | Author: Lilian Weng</footer>
<a class="entry-link" aria-label="post link to Exploration Strategies in Deep Reinforcement Learning" href="https://lilianweng.github.io/posts/2020-06-07-exploration-drl/"></a>
</article>

-<article class="post-entry">
-<header class="entry-header">
-<h2>The Transformer Family
-</h2>
-</header>
-<section class="entry-content">
-<p>[Updated on 2023-01-27: After almost three years, I did a big refactoring update of this post to incorporate a bunch of new Transformer models since 2020. The enhanced version of this post is here: The Transformer Family Version 2.0. Please refer to that post on this topic.] It has been almost two years since my last post on attention. Recent progress on new and enhanced versions of Transformer motivates me to write another post on this specific topic, focusing on how the vanilla Transformer can be improved for longer-term attention span, less memory and computation consumption, RL task solving and more....</p>
-</section>
-<footer class="entry-footer">Date: April 7, 2020 | Estimated Reading Time: 25 min | Author: Lilian Weng</footer>
-<a class="entry-link" aria-label="post link to The Transformer Family" href="https://lilianweng.github.io/posts/2020-04-07-the-transformer-family/"></a>
-</article>
<footer class="page-footer">
<nav class="pagination">
<a class="next" href="https://lilianweng.github.io/page/2/"> »</a>
2 changes: 1 addition & 1 deletion index.json

Large diffs are not rendered by default.

12 changes: 11 additions & 1 deletion index.xml
@@ -6,7 +6,17 @@
<description>Recent content on Lil&#39;Log</description>
<generator>Hugo -- gohugo.io</generator>
<language>en-us</language>
-<lastBuildDate>Mon, 05 Feb 2024 00:00:00 +0000</lastBuildDate><atom:link href="https://lilianweng.github.io/index.xml" rel="self" type="application/rss+xml" />
+<lastBuildDate>Fri, 12 Apr 2024 00:00:00 +0000</lastBuildDate><atom:link href="https://lilianweng.github.io/index.xml" rel="self" type="application/rss+xml" />
+<item>
+<title>Diffusion Models for Video Generation</title>
+<link>https://lilianweng.github.io/posts/2024-04-12-diffusion-video/</link>
+<pubDate>Fri, 12 Apr 2024 00:00:00 +0000</pubDate>
+
+<guid>https://lilianweng.github.io/posts/2024-04-12-diffusion-video/</guid>
+<description>Diffusion models have demonstrated strong results on image synthesis in past years. Now the research community has started working on a harder task&amp;mdash;using it for video generation. The task itself is a superset of the image case, since an image is a video of 1 frame, and it is much more challenging because
+It has extra requirements on temporal consistency across frames, which naturally demand more world knowledge to be encoded into the model.</description>
+</item>

<item>
<title>Thinking about High-Quality Human Data</title>
<link>https://lilianweng.github.io/posts/2024-02-05-human-data-quality/</link>
25 changes: 12 additions & 13 deletions page/2/index.html
@@ -191,6 +191,18 @@
</header>
<main class="main">

+<article class="post-entry">
+<header class="entry-header">
+<h2>The Transformer Family
+</h2>
+</header>
+<section class="entry-content">
+<p>[Updated on 2023-01-27: After almost three years, I did a big refactoring update of this post to incorporate a bunch of new Transformer models since 2020. The enhanced version of this post is here: The Transformer Family Version 2.0. Please refer to that post on this topic.] It has been almost two years since my last post on attention. Recent progress on new and enhanced versions of Transformer motivates me to write another post on this specific topic, focusing on how the vanilla Transformer can be improved for longer-term attention span, less memory and computation consumption, RL task solving and more....</p>
+</section>
+<footer class="entry-footer">Date: April 7, 2020 | Estimated Reading Time: 25 min | Author: Lilian Weng</footer>
+<a class="entry-link" aria-label="post link to The Transformer Family" href="https://lilianweng.github.io/posts/2020-04-07-the-transformer-family/"></a>
+</article>

<article class="post-entry">
<header class="entry-header">
<h2>Curriculum for Reinforcement Learning
@@ -433,19 +445,6 @@ <h2>Object Detection for Dummies Part 1: Gradient Vector, HOG, and SS
<footer class="entry-footer">Date: October 29, 2017 | Estimated Reading Time: 15 min | Author: Lilian Weng</footer>
<a class="entry-link" aria-label="post link to Object Detection for Dummies Part 1: Gradient Vector, HOG, and SS" href="https://lilianweng.github.io/posts/2017-10-29-object-recognition-part-1/"></a>
</article>

-<article class="post-entry">
-<header class="entry-header">
-<h2>Learning Word Embedding
-</h2>
-</header>
-<section class="entry-content">
-<p>Human vocabulary comes in free text. In order to make a machine learning model understand and process the natural language, we need to transform the free-text words into numeric values. One of the simplest transformation approaches is to do a one-hot encoding in which each distinct word stands for one dimension of the resulting vector and a binary value indicates whether the word presents (1) or not (0).
-However, one-hot encoding is impractical computationally when dealing with the entire vocabulary, as the representation demands hundreds of thousands of dimensions....</p>
-</section>
-<footer class="entry-footer">Date: October 15, 2017 | Estimated Reading Time: 18 min | Author: Lilian Weng</footer>
-<a class="entry-link" aria-label="post link to Learning Word Embedding" href="https://lilianweng.github.io/posts/2017-10-15-word-embedding/"></a>
-</article>
<footer class="page-footer">
<nav class="pagination">
<a class="prev" href="https://lilianweng.github.io/">« </a>
13 changes: 13 additions & 0 deletions page/3/index.html
@@ -191,6 +191,19 @@
</header>
<main class="main">

+<article class="post-entry">
+<header class="entry-header">
+<h2>Learning Word Embedding
+</h2>
+</header>
+<section class="entry-content">
+<p>Human vocabulary comes in free text. In order to make a machine learning model understand and process the natural language, we need to transform the free-text words into numeric values. One of the simplest transformation approaches is to do a one-hot encoding in which each distinct word stands for one dimension of the resulting vector and a binary value indicates whether the word presents (1) or not (0).
+However, one-hot encoding is impractical computationally when dealing with the entire vocabulary, as the representation demands hundreds of thousands of dimensions....</p>
+</section>
+<footer class="entry-footer">Date: October 15, 2017 | Estimated Reading Time: 18 min | Author: Lilian Weng</footer>
+<a class="entry-link" aria-label="post link to Learning Word Embedding" href="https://lilianweng.github.io/posts/2017-10-15-word-embedding/"></a>
+</article>

<article class="post-entry">
<header class="entry-header">
<h2>Anatomize Deep Learning with Information Theory
19 changes: 10 additions & 9 deletions posts/2021-07-11-diffusion-models/index.html

Large diffs are not rendered by default.

5 changes: 5 additions & 0 deletions posts/2024-02-05-human-data-quality/index.html
@@ -541,6 +541,11 @@ <h1 id="citation">Citation<a hidden class="anchor" aria-hidden="true" href="#cit
<li><a href="https://lilianweng.github.io/tags/human-ai/">human-ai</a></li>
</ul>
<nav class="paginav">
+<a class="prev" href="https://lilianweng.github.io/posts/2024-04-12-diffusion-video/">
+<span class="title">« </span>
+<br>
+<span>Diffusion Models for Video Generation</span>
+</a>
<a class="next" href="https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/">
<span class="title"> »</span>
<br>
Binary file added posts/2024-04-12-diffusion-video/3D-U-net.png
Binary file added posts/2024-04-12-diffusion-video/gen-1.png
Binary file added posts/2024-04-12-diffusion-video/imagen-video.png
765 changes: 765 additions & 0 deletions posts/2024-04-12-diffusion-video/index.html

Large diffs are not rendered by default.

Binary file added posts/2024-04-12-diffusion-video/lumiere.png
Binary file added posts/2024-04-12-diffusion-video/make-a-video.png
Binary file added posts/2024-04-12-diffusion-video/sora.png
Binary file added posts/2024-04-12-diffusion-video/v-param.png
Binary file added posts/2024-04-12-diffusion-video/video-LDM.png
27 changes: 14 additions & 13 deletions posts/index.html
@@ -194,6 +194,19 @@
<h1>Posts</h1>
</header>

+<article class="post-entry">
+<header class="entry-header">
+<h2>Diffusion Models for Video Generation
+</h2>
+</header>
+<section class="entry-content">
+<p>Diffusion models have demonstrated strong results on image synthesis in past years. Now the research community has started working on a harder task—using it for video generation. The task itself is a superset of the image case, since an image is a video of 1 frame, and it is much more challenging because
+It has extra requirements on temporal consistency across frames, which naturally demand more world knowledge to be encoded into the model....</p>
+</section>
+<footer class="entry-footer">Date: April 12, 2024 | Estimated Reading Time: 20 min | Author: Lilian Weng</footer>
+<a class="entry-link" aria-label="post link to Diffusion Models for Video Generation" href="https://lilianweng.github.io/posts/2024-04-12-diffusion-video/"></a>
+</article>

<article class="post-entry">
<header class="entry-header">
<h2>Thinking about High-Quality Human Data
@@ -359,7 +372,7 @@ <h2>What are Diffusion Models?
<p>[Updated on 2021-09-19: Highly recommend this blog post on score-based generative modeling by Yang Song (author of several key papers in the references)]. [Updated on 2022-08-27: Added classifier-free guidance, GLIDE, unCLIP and Imagen. [Updated on 2022-08-31: Added latent diffusion model. [Updated on 2024-04-13: Added progressive distillation, consistency models, and the Model Architecture section.
So far, I’ve written about three types of generative models, GAN, VAE, and Flow-based models. They have shown great success in generating high-quality samples, but each has some limitations of its own....</p>
</section>
-<footer class="entry-footer">Date: July 11, 2021 | Estimated Reading Time: 31 min | Author: Lilian Weng</footer>
+<footer class="entry-footer">Date: July 11, 2021 | Estimated Reading Time: 32 min | Author: Lilian Weng</footer>
<a class="entry-link" aria-label="post link to What are Diffusion Models?" href="https://lilianweng.github.io/posts/2021-07-11-diffusion-models/"></a>
</article>

@@ -441,18 +454,6 @@ <h2>Exploration Strategies in Deep Reinforcement Learning
<footer class="entry-footer">Date: June 7, 2020 | Estimated Reading Time: 36 min | Author: Lilian Weng</footer>
<a class="entry-link" aria-label="post link to Exploration Strategies in Deep Reinforcement Learning" href="https://lilianweng.github.io/posts/2020-06-07-exploration-drl/"></a>
</article>

-<article class="post-entry">
-<header class="entry-header">
-<h2>The Transformer Family
-</h2>
-</header>
-<section class="entry-content">
-<p>[Updated on 2023-01-27: After almost three years, I did a big refactoring update of this post to incorporate a bunch of new Transformer models since 2020. The enhanced version of this post is here: The Transformer Family Version 2.0. Please refer to that post on this topic.] It has been almost two years since my last post on attention. Recent progress on new and enhanced versions of Transformer motivates me to write another post on this specific topic, focusing on how the vanilla Transformer can be improved for longer-term attention span, less memory and computation consumption, RL task solving and more....</p>
-</section>
-<footer class="entry-footer">Date: April 7, 2020 | Estimated Reading Time: 25 min | Author: Lilian Weng</footer>
-<a class="entry-link" aria-label="post link to The Transformer Family" href="https://lilianweng.github.io/posts/2020-04-07-the-transformer-family/"></a>
-</article>
<footer class="page-footer">
<nav class="pagination">
<a class="next" href="https://lilianweng.github.io/posts/page/2/"> »</a>
12 changes: 11 additions & 1 deletion posts/index.xml
@@ -6,7 +6,17 @@
<description>Recent content in Posts on Lil&#39;Log</description>
<generator>Hugo -- gohugo.io</generator>
<language>en-us</language>
-<lastBuildDate>Mon, 05 Feb 2024 00:00:00 +0000</lastBuildDate><atom:link href="https://lilianweng.github.io/posts/index.xml" rel="self" type="application/rss+xml" />
+<lastBuildDate>Fri, 12 Apr 2024 00:00:00 +0000</lastBuildDate><atom:link href="https://lilianweng.github.io/posts/index.xml" rel="self" type="application/rss+xml" />
+<item>
+<title>Diffusion Models for Video Generation</title>
+<link>https://lilianweng.github.io/posts/2024-04-12-diffusion-video/</link>
+<pubDate>Fri, 12 Apr 2024 00:00:00 +0000</pubDate>
+
+<guid>https://lilianweng.github.io/posts/2024-04-12-diffusion-video/</guid>
+<description>Diffusion models have demonstrated strong results on image synthesis in past years. Now the research community has started working on a harder task&amp;mdash;using it for video generation. The task itself is a superset of the image case, since an image is a video of 1 frame, and it is much more challenging because
+It has extra requirements on temporal consistency across frames, which naturally demand more world knowledge to be encoded into the model.</description>
+</item>

<item>
<title>Thinking about High-Quality Human Data</title>
<link>https://lilianweng.github.io/posts/2024-02-05-human-data-quality/</link>
25 changes: 12 additions & 13 deletions posts/page/2/index.html
@@ -194,6 +194,18 @@
<h1>Posts</h1>
</header>

+<article class="post-entry">
+<header class="entry-header">
+<h2>The Transformer Family
+</h2>
+</header>
+<section class="entry-content">
+<p>[Updated on 2023-01-27: After almost three years, I did a big refactoring update of this post to incorporate a bunch of new Transformer models since 2020. The enhanced version of this post is here: The Transformer Family Version 2.0. Please refer to that post on this topic.] It has been almost two years since my last post on attention. Recent progress on new and enhanced versions of Transformer motivates me to write another post on this specific topic, focusing on how the vanilla Transformer can be improved for longer-term attention span, less memory and computation consumption, RL task solving and more....</p>
+</section>
+<footer class="entry-footer">Date: April 7, 2020 | Estimated Reading Time: 25 min | Author: Lilian Weng</footer>
+<a class="entry-link" aria-label="post link to The Transformer Family" href="https://lilianweng.github.io/posts/2020-04-07-the-transformer-family/"></a>
+</article>

<article class="post-entry">
<header class="entry-header">
<h2>Curriculum for Reinforcement Learning
@@ -436,19 +448,6 @@ <h2>Object Detection for Dummies Part 1: Gradient Vector, HOG, and SS
<footer class="entry-footer">Date: October 29, 2017 | Estimated Reading Time: 15 min | Author: Lilian Weng</footer>
<a class="entry-link" aria-label="post link to Object Detection for Dummies Part 1: Gradient Vector, HOG, and SS" href="https://lilianweng.github.io/posts/2017-10-29-object-recognition-part-1/"></a>
</article>

-<article class="post-entry">
-<header class="entry-header">
-<h2>Learning Word Embedding
-</h2>
-</header>
-<section class="entry-content">
-<p>Human vocabulary comes in free text. In order to make a machine learning model understand and process the natural language, we need to transform the free-text words into numeric values. One of the simplest transformation approaches is to do a one-hot encoding in which each distinct word stands for one dimension of the resulting vector and a binary value indicates whether the word presents (1) or not (0).
-However, one-hot encoding is impractical computationally when dealing with the entire vocabulary, as the representation demands hundreds of thousands of dimensions....</p>
-</section>
-<footer class="entry-footer">Date: October 15, 2017 | Estimated Reading Time: 18 min | Author: Lilian Weng</footer>
-<a class="entry-link" aria-label="post link to Learning Word Embedding" href="https://lilianweng.github.io/posts/2017-10-15-word-embedding/"></a>
-</article>
<footer class="page-footer">
<nav class="pagination">
<a class="prev" href="https://lilianweng.github.io/posts/">« </a>