See RealNVP for more details on the multi-scale architecture.

Substep 1: Activation normalization (short for "actnorm")
It performs an affine transformation using a scale and bias parameter per channel, similar to batch normalization, but works for a mini-batch size of 1. The parameters are trainable but initialized so that the first mini-batch of data has mean 0 and standard deviation 1 after actnorm.
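As a concrete illustration, here is a minimal NumPy sketch of actnorm (class and variable names are my own, not from the Glow paper): a per-channel affine transform whose parameters are initialized from the statistics of the first mini-batch, and whose log-determinant contribution is $h \cdot w \cdot \sum_k \log \vert s_k \vert$ since the scale is applied at every spatial position.

```python
import numpy as np

class ActNorm:
    """Per-channel affine transform with data-dependent initialization (sketch)."""

    def __init__(self, c):
        self.log_scale = np.zeros(c)   # log of the per-channel scale s
        self.bias = np.zeros(c)        # per-channel bias b
        self.initialized = False

    def forward(self, x):
        # x: (batch, h, w, c)
        if not self.initialized:
            # Initialize so the first mini-batch has zero mean, unit std per channel.
            mean = x.mean(axis=(0, 1, 2))
            std = x.std(axis=(0, 1, 2))
            self.bias = -mean / std
            self.log_scale = -np.log(std)
            self.initialized = True
        y = x * np.exp(self.log_scale) + self.bias
        # The scale acts at every spatial position, hence the h * w factor.
        logdet = x.shape[1] * x.shape[2] * self.log_scale.sum()
        return y, logdet
```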
Substep 2: Invertible 1x1 conv
Between layers of the RealNVP flow, the ordering of channels is reversed so that all the data dimensions have a chance to be altered. A 1x1 convolution with an equal number of input and output channels is a generalization of any permutation of the channel ordering.
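For instance, reversing the channel order corresponds to choosing the weight matrix to be the corresponding permutation matrix; a tiny NumPy check (shapes and names assumed for illustration):

```python
import numpy as np

c = 3
W_reverse = np.eye(c)[::-1]        # permutation matrix that reverses channel order
x_ij = np.array([1.0, 2.0, 3.0])   # the channel vector at one spatial position
print(x_ij @ W_reverse)            # [3. 2. 1.] -- channels reversed
# det(W_reverse) = -1, so log|det| = 0: a pure permutation adds no volume change.
```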
Say we have an invertible 1x1 convolution of an input $h \times w \times c$ tensor $\mathbf{h}$ with a weight matrix $\mathbf{W}$ of size $c \times c$. The output is an $h \times w \times c$ tensor, labeled as $f(\mathbf{h}) = \texttt{conv2d}(\mathbf{h}; \mathbf{W})$. In order to apply the change of variable rule, we need to compute the Jacobian determinant $\vert \det\partial f / \partial\mathbf{h}\vert$.
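To make this concrete, here is a small NumPy sketch (variable names are mine): the 1x1 convolution reduces to multiplying the channel vector at every spatial entry by $\mathbf{W}$, so the full Jacobian log-determinant is $h \cdot w \cdot \log \vert \det \mathbf{W} \vert$ and the operation is inverted by applying $\mathbf{W}^{-1}$.

```python
import numpy as np

rng = np.random.default_rng(0)
h, w, c = 4, 4, 3
x = rng.standard_normal((h, w, c))   # input tensor of shape h x w x c
W = rng.standard_normal((c, c))      # c x c weight matrix (invertible w.h.p.)

# 1x1 conv: multiply the c-channel vector at each (i, j) by W.
y = np.einsum('ijk,kl->ijl', x, W)   # y_ij = x_ij W

# W appears once per spatial position, so the log-determinant is h * w * log|det W|.
logdet = h * w * np.log(abs(np.linalg.det(W)))

# Invertibility: applying W^{-1} recovers the input exactly.
x_back = np.einsum('ijk,kl->ijl', y, np.linalg.inv(W))
assert np.allclose(x, x_back)
```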
Both the input and output of the 1x1 convolution here can be viewed as a matrix of size $h \times w$. Each entry $\mathbf{x}_{ij}$ ($i=1,\dots,h$, $j=1,\dots,w$) in $\mathbf{h}$ is a vector of $c$ channels, and each such entry is multiplied by the weight matrix $\mathbf{W}$ to obtain the corresponding entry $\mathbf{y}_{ij}$ in the output matrix. The derivative of each entry is $\partial \mathbf{x}_{ij} \mathbf{W} / \partial\mathbf{x}_{ij} = \mathbf{W}$ and there are $h \times w$ such entries in total: