* add NF intro
* set up doc files
* add gitignore
* minor update to readme
* update home page
* update docs for each function
* update docs
* src
* update function docs
* update docs
* fix readme math rendering issue
* update docs
* update example doc
* update customize layer docs
* finish docs
* apply review suggestions to README.md, docs/src/index.md, docs/src/example.md, and docs/src/customized_layer.md
* minor edits
* minor edit to fix latex issue
* minor update

Co-authored-by: Cameron Pfiffer <[email protected]>
Co-authored-by: Xianda Sun <[email protected]>
1 parent: 8f4371d. Commit: 45101e0. Showing 14 changed files with 579 additions and 13 deletions.
.gitignore (new file)
@@ -0,0 +1,2 @@
build/
site/
@@ -0,0 +1,93 @@
## API

```@index
```

## Main Function

```@docs
NormalizingFlows.train_flow
```
The flow object can be constructed with the `transformed` function from the `Bijectors.jl` package.
For example, for Gaussian VI, we can construct the flow as follows:
```julia
using Distributions, Bijectors
T = Float32
q₀ = MvNormal(zeros(T, 2), ones(T, 2))
flow = Bijectors.transformed(q₀, Bijectors.Shift(zeros(T, 2)) ∘ Bijectors.Scale(ones(T, 2)))
```
To train the Gaussian VI targeting the distribution $p$ via ELBO maximization, we can run
```julia
using NormalizingFlows, Optimisers

sample_per_iter = 10
flow_trained, stats, _ = train_flow(
    elbo,
    flow,
    logp,   # log-density of the target distribution p (defined by the user)
    sample_per_iter;
    max_iters=2_000,
    optimiser=Optimisers.Adam(0.01 * one(T)),
)
```
## Variational Objectives
We have implemented two variational objectives, namely, the ELBO and the log-likelihood objective.
Users can also define their own objective functions and pass them to the [`train_flow`](@ref) function.
`train_flow` will optimize the flow parameters by maximizing `vo`.
The objective function should take the following general form:
```julia
vo(rng, flow, args...)
```
where `rng` is the random number generator, `flow` is the flow object, and `args...` are
additional arguments that users can pass to the objective function.
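
For illustration, here is a minimal sketch of a custom objective with this signature: a hand-rolled Monte Carlo ELBO estimator. It assumes `logp` is a user-supplied function returning the (unnormalized) log-density of the target; the package's own [`elbo`](@ref) is the tested implementation and should be preferred in practice.
```julia
using Random, Distributions, Bijectors

# Sketch of a custom variational objective: a naive Monte Carlo ELBO estimator.
# `flow` is assumed to be a Bijectors.TransformedDistribution with fields
# `dist` (the base distribution) and `transform` (the bijector);
# `logp` is the target log-density.
function my_elbo(rng::Random.AbstractRNG, flow, logp, n_samples::Int)
    vals = map(1:n_samples) do _
        z = rand(rng, flow.dist)                                         # sample from q₀
        x, logjac = Bijectors.with_logabsdet_jacobian(flow.transform, z) # push forward
        logp(x) + logjac - logpdf(flow.dist, z)                          # single-sample ELBO
    end
    return sum(vals) / n_samples
end
```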

#### Evidence Lower Bound (ELBO)
Maximizing the ELBO is equivalent to minimizing the reverse KL divergence between $q_\theta$ and $p$, i.e.,
```math
\begin{aligned}
&\min _{\theta} \mathbb{E}_{q_{\theta}}\left[\log q_{\theta}(Z)-\log p(Z)\right] \quad \text{(Reverse KL)}\\
&= \max _{\theta} \mathbb{E}_{q_0}\left[ \log p\left(T_N \circ \cdots \circ
T_1(Z_0)\right)-\log q_0(Z_0)+\sum_{n=1}^N \log J_n\left(T_n \circ \cdots \circ
T_1(Z_0)\right)\right] \quad \text{(ELBO)}
\end{aligned}
```
where $T_1, \ldots, T_N$ are the flow layers and $J_n$ denotes the absolute Jacobian determinant of the $n$-th layer.
Reverse KL minimization is typically used for **Bayesian computation**,
where one only has access to the log-(unnormalized) density of the target distribution $p$ (e.g., a Bayesian posterior distribution)
and hopes to generate approximate samples from it.
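
For concreteness, a target `logp` in this setting only needs to return the target's log-density up to an additive constant. A minimal hypothetical example:
```julia
# Hypothetical unnormalized target: a 2D Gaussian centered at (1, 1).
# The normalizing constant is dropped; it does not affect ELBO maximization.
logp(z::AbstractVector) = -0.5 * sum(abs2, z .- 1)
```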

```@docs
NormalizingFlows.elbo
```
#### Log-likelihood

Maximizing the log-likelihood is equivalent to minimizing the forward KL divergence between $q_\theta$ and $p$, i.e.,
```math
\begin{aligned}
& \min_{\theta} \mathbb{E}_{p}\left[\log p(Z)-\log q_{\theta}(Z)\right] \quad \text{(Forward KL)} \\
& = \max_{\theta} \mathbb{E}_{p}\left[\log q_{\theta}(Z)\right] \quad \text{(Expected log-likelihood)}
\end{aligned}
```
Forward KL minimization is typically used for **generative modeling**,
where one is given a set of samples from the target distribution $p$ (e.g., images)
and aims to learn the density or a generative process that outputs high-quality samples.
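
In this setting, the training samples reach the objective through `args...`. A sketch of the resulting call, assuming `data` is a matrix of samples from $p$ (one sample per column) and `flow` is constructed as above; the exact batching behavior is up to the objective:
```julia
using NormalizingFlows, Optimisers

data = randn(Float32, 2, 1000)  # placeholder for real samples from p
flow_trained, stats, _ = train_flow(
    NormalizingFlows.loglikelihood,
    flow,
    data;            # extra objective arguments are forwarded by train_flow
    max_iters=2_000,
    optimiser=Optimisers.Adam(1.0f-2),
)
```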

```@docs
NormalizingFlows.loglikelihood
```

## Training Loop

```@docs
NormalizingFlows.optimize
```

## Utility Functions for Taking Gradients
```@docs
NormalizingFlows.grad!
NormalizingFlows.value_and_gradient!
```
docs/src/customized_layer.md (new file)
@@ -0,0 +1,180 @@
# Defining Your Own Flow Layer

In practice, users might want to define their own normalizing flow layers.
As briefly noted in [What are normalizing flows?](@ref), the key is to define a
customized normalizing flow layer, including its transformation and inverse,
as well as the log-determinant of the Jacobian of the transformation.
`Bijectors.jl` offers a convenient interface to define a customized bijection.
We refer users to [the documentation of
`Bijectors.jl`](https://turinglang.org/Bijectors.jl/dev/transforms/#Implementing-a-transformation)
for more details.
`Flux.jl` is also a useful package, offering a convenient interface to define neural networks.

In this tutorial, we demonstrate how to define a customized normalizing flow
layer -- an `Affine Coupling Layer` (Dinh *et al.*, 2016) -- using `Bijectors.jl` and `Flux.jl`.

## Affine Coupling Flow

Given an input vector $\boldsymbol{x}$, the general *coupling transformation* splits it into two
parts: $\boldsymbol{x}_{I_1}$ and $\boldsymbol{x}_{I\setminus I_1}$. Only one
part (e.g., $\boldsymbol{x}_{I_1}$) undergoes a bijective transformation $f$, known as the *coupling law*,
based on the values of the other part (e.g., $\boldsymbol{x}_{I\setminus I_1}$), which remains unchanged.
```math
\begin{array}{llll}
c_{I_1}(\cdot ; f, \theta): & \mathbb{R}^d \rightarrow \mathbb{R}^d & c_{I_1}^{-1}(\cdot ; f, \theta): & \mathbb{R}^d \rightarrow \mathbb{R}^d \\
& \boldsymbol{x}_{I \backslash I_1} \mapsto \boldsymbol{x}_{I \backslash I_1} & & \boldsymbol{y}_{I \backslash I_1} \mapsto \boldsymbol{y}_{I \backslash I_1} \\
& \boldsymbol{x}_{I_1} \mapsto f\left(\boldsymbol{x}_{I_1} ; \theta\left(\boldsymbol{x}_{I\setminus I_1}\right)\right) & & \boldsymbol{y}_{I_1} \mapsto f^{-1}\left(\boldsymbol{y}_{I_1} ; \theta\left(\boldsymbol{y}_{I\setminus I_1}\right)\right)
\end{array}
```
Here $\theta$ can be an arbitrary function, e.g., a neural network.
As long as $f(\cdot; \theta(\boldsymbol{x}_{I\setminus I_1}))$ is invertible, $c_{I_1}$ is invertible, and the
Jacobian determinant of $c_{I_1}$ is easy to compute:
```math
\left|\det \nabla_{\boldsymbol{x}} c_{I_1}(\boldsymbol{x})\right| = \left|\det \nabla_{\boldsymbol{x}_{I_1}} f\left(\boldsymbol{x}_{I_1}; \theta\left(\boldsymbol{x}_{I\setminus I_1}\right)\right)\right|
```

The affine coupling layer is a special case of the coupling transformation, where the coupling law $f$ is an affine function:
```math
\begin{aligned}
\boldsymbol{x}_{I_1} &\mapsto \boldsymbol{x}_{I_1} \odot s\left(\boldsymbol{x}_{I\setminus I_1}\right) + t\left(\boldsymbol{x}_{I \setminus I_1}\right) \\
\boldsymbol{x}_{I \backslash I_1} &\mapsto \boldsymbol{x}_{I \backslash I_1}
\end{aligned}
```
Here, $s$ and $t$ are arbitrary functions (often neural networks) called the "scaling" and "translation" functions, respectively.
They produce vectors of the same dimension as $\boldsymbol{x}_{I_1}$.

## Implementing Affine Coupling Layer

We start by defining a simple 3-layer multi-layer perceptron (MLP) using `Flux.jl`,
which will be used to define the scaling function $s$ and the translation function $t$ in the affine coupling layer.
```@example afc
using Flux

function MLP_3layer(input_dim::Int, hdims::Int, output_dim::Int; activation=Flux.leakyrelu)
    return Chain(
        Flux.Dense(input_dim, hdims, activation),
        Flux.Dense(hdims, hdims, activation),
        Flux.Dense(hdims, output_dim),
    )
end
```

#### Construct the Object

Following the user interface of `Bijectors.jl`, we define a struct `AffineCoupling` as a subtype of `Bijectors.Bijector`.
The functions `partition` and `combine` are used to partition a vector into three disjoint subvectors and to recombine them,
and `PartitionMask` is used to store this partition rule.
These three utilities are all defined in `Bijectors.jl`; see the [documentation](https://github.com/TuringLang/Bijectors.jl/blob/49c138fddd3561c893592a75b211ff6ad949e859/src/bijectors/coupling.jl#L3) for more details.

```@example afc
using Functors
using Bijectors
using Bijectors: partition, combine, PartitionMask

struct AffineCoupling <: Bijectors.Bijector
    dim::Int
    mask::Bijectors.PartitionMask
    s::Flux.Chain
    t::Flux.Chain
end

# To apply functions to the parameters contained in AffineCoupling.s and AffineCoupling.t,
# and to rebuild the struct from those parameters, we use the functor interface of Functors.jl;
# see https://fluxml.ai/Flux.jl/stable/models/functors/#Functors.functor
@functor AffineCoupling (s, t)

function AffineCoupling(
    dim::Int,                 # dimension of the input
    hdims::Int,               # dimension of the hidden units for s and t
    mask_idx::AbstractVector, # indices of the dimensions to transform
)
    cdims = length(mask_idx)  # dimension of the part used to construct the coupling law
    s = MLP_3layer(cdims, hdims, cdims)
    t = MLP_3layer(cdims, hdims, cdims)
    mask = PartitionMask(dim, mask_idx)
    return AffineCoupling(dim, mask, s, t)
end
```
By default, we define $s$ and $t$ using the `MLP_3layer` function, which is a
3-layer MLP with leaky ReLU activations.

#### Implement the Forward and Inverse Transformations

```@example afc
function Bijectors.transform(af::AffineCoupling, x::AbstractVector)
    # partition the vector using `af.mask::PartitionMask`
    x₁, x₂, x₃ = partition(af.mask, x)
    y₁ = x₁ .* af.s(x₂) .+ af.t(x₂)
    return combine(af.mask, y₁, x₂, x₃)
end

function Bijectors.transform(iaf::Inverse{<:AffineCoupling}, y::AbstractVector)
    af = iaf.orig
    # partition the vector using `af.mask::PartitionMask`
    y_1, y_2, y_3 = partition(af.mask, y)
    # inverse transformation
    x_1 = (y_1 .- af.t(y_2)) ./ af.s(y_2)
    return combine(af.mask, x_1, y_2, y_3)
end
```
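
As a quick illustrative sanity check (the names `af_test`, `x_test` are ours, not part of the layer definition), the inverse should undo the forward transformation on a random input:
```@example afc
# Sanity check: the inverse should recover the input of the forward map.
af_test = AffineCoupling(4, 10, 1:2)
x_test = randn(Float32, 4)
y_test = Bijectors.transform(af_test, x_test)
x_back = Bijectors.transform(Bijectors.inverse(af_test), y_test)
isapprox(x_test, x_back; atol=1e-5)
```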

#### Implement the Log-determinant of the Jacobian
Notice that here we wrap the transformation and the log-determinant of the Jacobian into a single function, `with_logabsdet_jacobian`.

```@example afc
function Bijectors.with_logabsdet_jacobian(af::AffineCoupling, x::AbstractVector)
    x_1, x_2, x_3 = Bijectors.partition(af.mask, x)
    y_1 = af.s(x_2) .* x_1 .+ af.t(x_2)
    logjac = sum(log ∘ abs, af.s(x_2))
    return combine(af.mask, y_1, x_2, x_3), logjac
end

function Bijectors.with_logabsdet_jacobian(
    iaf::Inverse{<:AffineCoupling}, y::AbstractVector
)
    af = iaf.orig
    # partition the vector using `af.mask::PartitionMask`
    y_1, y_2, y_3 = partition(af.mask, y)
    # inverse transformation
    x_1 = (y_1 .- af.t(y_2)) ./ af.s(y_2)
    logjac = -sum(log ∘ abs, af.s(y_2))
    return combine(af.mask, x_1, y_2, y_3), logjac
end
```
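
The forward and inverse log-Jacobians should cancel; a brief illustrative check, reusing `af_test` and `x_test` from above:
```@example afc
# Illustrative check: log|det J| of the forward map plus that of the
# inverse map evaluated at the forward output should be (numerically) zero.
y_fwd, logjac_fwd = Bijectors.with_logabsdet_jacobian(af_test, x_test)
_, logjac_inv = Bijectors.with_logabsdet_jacobian(Bijectors.inverse(af_test), y_fwd)
isapprox(logjac_fwd + logjac_inv, 0; atol=1e-5)
```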
#### Construct Normalizing Flow

With all of the above implementations in place, we are ready to build a normalizing flow
from `AffineCoupling` layers by composing them and applying the composition to a base distribution $q_0$.

```@example afc
using Random, Distributions, LinearAlgebra

dim = 4
hdims = 10
Ls = [
    AffineCoupling(dim, hdims, 1:2),
    AffineCoupling(dim, hdims, 3:4),
    AffineCoupling(dim, hdims, 1:2),
    AffineCoupling(dim, hdims, 3:4),
]
ts = reduce(∘, Ls)
q₀ = MvNormal(zeros(Float32, dim), I)
flow = Bijectors.transformed(q₀, ts)
```
We can now sample from the flow:
```@example afc
x = rand(flow, 10)
```
And evaluate the density of the flow:
```@example afc
logpdf(flow, x[:, 1])
```

## Reference
Dinh, L., Sohl-Dickstein, J. and Bengio, S., 2016. *Density estimation using Real NVP.*
arXiv:1605.08803.