Documentation (#27)
* add NF intro

* set up doc files

* add gitignore

* minor update to readme

* update home page

* update docs for each function

* update docs

* src

* update function docs

* update docs

* fix readme math rendering issue

* update docs

* update example doc

* update customize layer docs

* finish docs

* finish docs

* apply review suggestions from Cameron Pfiffer and Xianda Sun to README.md, docs/src/index.md, docs/src/example.md, and docs/src/customized_layer.md

* minor ed

* minor ed to fix latex issue

* minor update

---------

Co-authored-by: Cameron Pfiffer <[email protected]>
Co-authored-by: Xianda Sun <[email protected]>
3 people authored Aug 23, 2023
1 parent 8f4371d commit 45101e0
Showing 14 changed files with 579 additions and 13 deletions.
86 changes: 86 additions & 0 deletions README.md
@@ -2,3 +2,89 @@

[![Dev](https://img.shields.io/badge/docs-dev-blue.svg)](https://turinglang.github.io/NormalizingFlows.jl/dev/)
[![Build Status](https://github.com/TuringLang/NormalizingFlows.jl/actions/workflows/CI.yml/badge.svg?branch=main)](https://github.com/TuringLang/NormalizingFlows.jl/actions/workflows/CI.yml?query=branch%3Amain)


A normalizing flow library for Julia.

This package provides a simple and flexible interface for variational inference (VI) with normalizing flows (NFs), for Bayesian computation or generative modeling.
The key focus is modularity and extensibility: users can easily
construct components (e.g., define customized flow layers) and combine them
(e.g., choose different VI objectives or gradient estimators)
to build variational approximations of general target distributions,
without being tied to a specific probabilistic programming framework or application.

See the [documentation](https://turinglang.org/NormalizingFlows.jl/dev/) for more.

## Installation
To install the package, run the following command in the Julia REPL:
```julia
] # enter Pkg mode
(@v1.9) pkg> add git@github.com:TuringLang/NormalizingFlows.jl.git
```
Then simply run the following command to use the package:
```julia
using NormalizingFlows
```

## Quick recap of normalizing flows
Normalizing flows transform a simple reference distribution $q_0$ (sometimes referred to as the base distribution) into
a complex distribution $q$ using invertible functions.

In more detail, given a base distribution, usually a standard Gaussian, i.e., $q_0 = \mathcal{N}(0, I)$,
we apply a series of parameterized invertible transformations (called flow layers), $T_{1, \theta_1}, \cdots, T_{N, \theta_N}$, yielding
```math
Z_N = T_{N, \theta_N} \circ \cdots \circ T_{1, \theta_1} (Z_0) , \quad Z_0 \sim q_0,\quad Z_N \sim q_{\theta},
```
where $\theta = (\theta_1, \dots, \theta_N)$ collects the parameters to be learned, and $q_{\theta}$ is the variational distribution (flow distribution). This describes the **sampling procedure** of a normalizing flow, which amounts to sending draws from $q_0$ through a forward pass of the flow layers.
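
For instance, here is a minimal sketch of this forward pass using two toy layers (`Scale` and `Shift` from `Bijectors.jl`, on which this package builds; the particular layer choice is purely illustrative):
```julia
using Distributions, Bijectors

q₀ = MvNormal(zeros(2), ones(2))    # reference distribution q₀
T₁ = Bijectors.Scale([2.0, 0.5])    # first flow layer
T₂ = Bijectors.Shift([1.0, -1.0])   # second flow layer

z₀ = rand(q₀)       # Z₀ ∼ q₀
z₂ = T₂(T₁(z₀))     # forward pass: Z₂ = T₂ ∘ T₁(Z₀), a draw from q_θ
```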

Since all the transformations are invertible (technically [diffeomorphic](https://en.wikipedia.org/wiki/Diffeomorphism)), we can evaluate the density of a normalizing flow distribution $q_{\theta}$ by the change of variables formula:
```math
q_\theta(x)=\frac{q_0\left(T_1^{-1} \circ \cdots \circ
T_N^{-1}(x)\right)}{\prod_{n=1}^N J_n\left(T_n^{-1} \circ \cdots \circ
T_N^{-1}(x)\right)}, \quad \text{where } J_n(x)=\left|\operatorname{det} \nabla_x
T_n(x)\right|.
```
Here we drop the parameter subscripts $\theta_n$, $n = 1, \dots, N$, for simplicity.
Density evaluation of a normalizing flow therefore requires computing the **inverse** and the
**Jacobian determinant** of each flow layer.
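
Continuing the sketch above, `Bijectors.jl` automates this bookkeeping: `transformed` bundles the reference distribution with the composed layers, and `logpdf` then evaluates the flow density via the change of variables formula:
```julia
flow = Bijectors.transformed(q₀, T₂ ∘ T₁)   # the flow distribution q_θ
logpdf(flow, z₂)   # inverse pass through the layers plus log-Jacobian terms
```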

Given the feasibility of i.i.d. sampling and density evaluation, normalizing flows can be trained by minimizing a statistical distance to the target distribution $p$. Typical choices are the forward and reverse Kullback-Leibler (KL) divergences, which lead to the following optimization problems:
```math
\begin{aligned}
\text{Reverse KL:}\quad
&\arg\min_{\theta} \mathbb{E}_{q_{\theta}}\left[\log q_{\theta}(Z)-\log p(Z)\right] \\
&= \arg\min_{\theta} \mathbb{E}_{q_0}\left[\log \frac{q_\theta(T_N\circ \cdots \circ T_1(Z_0))}{p(T_N\circ \cdots \circ T_1(Z_0))}\right] \\
&= \arg\max_{\theta} \mathbb{E}_{q_0}\left[ \log p\left(T_N \circ \cdots \circ T_1(Z_0)\right)-\log q_0(Z_0)+\sum_{n=1}^N \log J_n\left(Z_{n-1}\right)\right]
\end{aligned}
```
where $Z_{n-1} = T_{n-1} \circ \cdots \circ T_1(Z_0)$ denotes the output of the first $n-1$ layers ($Z_0$ itself when $n = 1$), and
```math
\begin{aligned}
\text{Forward KL:}\quad
&\arg\min_{\theta} \mathbb{E}_{p}\left[\log p(Z)-\log q_{\theta}(Z)\right] \\
&= \arg\max_{\theta} \mathbb{E}_{p}\left[\log q_\theta(Z)\right]
\end{aligned}
```
Both problems can be solved via standard stochastic optimization algorithms,
such as stochastic gradient descent (SGD) and its variants.
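
Schematically, such a loop looks like the following toy sketch (purely illustrative — this package's `train_flow` wraps this pattern for the objectives above):
```julia
using Optimisers

# toy problem: minimize E[(θ - Z)²] with Z ∼ N(0, 1), whose minimizer is θ* = 0
function toy_sgd(; iters=1_000)
    θ = [5.0]
    state = Optimisers.setup(Optimisers.ADAM(0.1), θ)
    for _ in 1:iters
        z = randn()
        g = 2 .* (θ .- z)    # single-sample stochastic gradient estimate
        state, θ = Optimisers.update(state, θ, g)
    end
    return θ    # ≈ [0.0] up to sampling noise
end
toy_sgd()
```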

Reverse KL minimization is typically used for **Bayesian computation**, where one
wants to approximate a posterior distribution $p$ that is only known up to a
normalizing constant.
In contrast, forward KL minimization is typically used for **generative modeling**, where one wants to approximate a complex distribution $p$ from which only samples (e.g., images) are available.

## Current status and TODOs

- [x] general interface development
- [x] documentation
- [ ] including more flow examples
- [ ] GPU compatibility
- [ ] benchmarking

## Related packages
- [Bijectors.jl](https://github.com/TuringLang/Bijectors.jl): a package for defining bijective transformations, which can be used to define customized flow layers.
- [Flux.jl](https://fluxml.ai/Flux.jl/stable/): a deep learning library, useful for building the neural networks inside flow layers.
- [Optimisers.jl](https://github.com/FluxML/Optimisers.jl): a collection of optimisation rules used for training.
- [AdvancedVI.jl](https://github.com/TuringLang/AdvancedVI.jl): a library of variational inference algorithms.


2 changes: 2 additions & 0 deletions docs/.gitignore
@@ -0,0 +1,2 @@
build/
site/
7 changes: 6 additions & 1 deletion docs/make.jl
@@ -10,7 +10,12 @@ makedocs(;
repo="https://github.com/TuringLang/NormalizingFlows.jl/blob/{commit}{path}#{line}",
sitename="NormalizingFlows.jl",
format=Documenter.HTML(),
pages=["Home" => "index.md"],
pages=[
"Home" => "index.md",
"API" => "api.md",
"Example" => "example.md",
"Customize your own flow layer" => "customized_layer.md",
],
)

deploydocs(; repo="github.com/TuringLang/NormalizingFlows.jl", devbranch="main")
93 changes: 93 additions & 0 deletions docs/src/api.md
@@ -0,0 +1,93 @@
## API

```@index
```


## Main Function

```@docs
NormalizingFlows.train_flow
```

The flow object can be constructed with the `transformed` function from the `Bijectors.jl` package.
For example, for a Gaussian VI approximation, we can construct the flow as follows:
```julia
using Distributions, Bijectors
T = Float32
q₀ = MvNormal(zeros(T, 2), ones(T, 2))
flow = Bijectors.transformed(q₀, Bijectors.Shift(zeros(T,2)) ∘ Bijectors.Scale(ones(T, 2)))
```
To train the Gaussian VI targeting the distribution $p$ via ELBO maximization, we can run
```julia
using NormalizingFlows, Optimisers

sample_per_iter = 10
flow_trained, stats, _ = train_flow(
    elbo,
    flow,
    logp,             # log-density of the target distribution p (assumed defined)
    sample_per_iter;
    max_iters=2_000,
    optimiser=Optimisers.ADAM(0.01 * one(T)),
)
```
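
Since `train_flow` returns the trained flow, approximate draws from the variational approximation can then be obtained with `rand` (assuming the returned flow supports the same `Bijectors.TransformedDistribution` interface as the input flow):
```julia
xs = rand(flow_trained, 100)   # 100 i.i.d. draws from the trained flow
```
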
## Variational Objectives
We provide two variational objectives: the ELBO and the log-likelihood.
Users can also define their own objective function and pass it to the [`train_flow`](@ref) function,
which will optimize the flow parameters by maximizing the given objective `vo`.
The objective function should take the following general form:
```julia
vo(rng, flow, args...)
```
where `rng` is the random number generator, `flow` is the flow object, and `args...` are the
additional arguments that users can pass to the objective function.
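
For example, a user-defined objective matching this form could look like the following sketch (a simple ELBO-style Monte Carlo estimator, purely illustrative; `logp` is assumed to be the target log-density, and a practical objective must also be differentiable with respect to the flow parameters, which the built-in objectives take care of):
```julia
using Random, Statistics, Distributions

# illustrative objective with the expected signature vo(rng, flow, args...)
function my_objective(rng::AbstractRNG, flow, logp, n_samples)
    xs = rand(rng, flow, n_samples)    # draws from q_θ, one per column
    return mean(logp(x) - logpdf(flow, x) for x in eachcol(xs))
end
```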

#### Evidence Lower Bound (ELBO)
Maximizing the ELBO is equivalent to minimizing the reverse KL divergence between $q_\theta$ and $p$, i.e.,
```math
\begin{aligned}
&\min_{\theta} \mathbb{E}_{q_{\theta}}\left[\log q_{\theta}(Z)-\log p(Z)\right] \quad \text{(Reverse KL)}\\
& = \max_{\theta} \mathbb{E}_{q_0}\left[ \log p\left(T_N \circ \cdots \circ
T_1(Z_0)\right)-\log q_0(Z_0)+\sum_{n=1}^N \log J_n\left(T_{n-1} \circ \cdots \circ
T_1(Z_0)\right)\right] \quad \text{(ELBO)}
\end{aligned}
```
Reverse KL minimization is typically used for **Bayesian computation**,
where one only has access to the log-(unnormalized) density of the target distribution $p$ (e.g., a Bayesian posterior),
and hopes to generate approximate samples from it.

```@docs
NormalizingFlows.elbo
```
#### Log-likelihood

Maximizing the log-likelihood is equivalent to minimizing the forward KL divergence between $q_\theta$ and $p$, i.e.,
```math
\begin{aligned}
& \min_{\theta} \mathbb{E}_{p}\left[\log p(Z)-\log q_{\theta}(Z)\right] \quad \text{(Forward KL)} \\
& = \max_{\theta} \mathbb{E}_{p}\left[\log q_{\theta}(Z)\right] \quad \text{(Expected log-likelihood)}
\end{aligned}
```
Forward KL minimization is typically used for **generative modeling**,
where one is given a set of samples from the target distribution $p$ (e.g., images)
and aims to learn the density or a generative process that produces high-quality samples.

```@docs
NormalizingFlows.loglikelihood
```
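
A sketch of how this objective would be used (the exact positional arguments should be checked against the docstring above; the assumption here is that a batch of observed samples takes the place of `logp` and `sample_per_iter`):
```julia
# `data` is assumed to be a d × N matrix of i.i.d. samples from the target p
flow_trained, stats, _ = train_flow(
    NormalizingFlows.loglikelihood,
    flow,
    data;
    max_iters=2_000,
    optimiser=Optimisers.ADAM(0.01 * one(T)),
)
```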


## Training Loop

```@docs
NormalizingFlows.optimize
```


## Utility Functions for Taking Gradient
```@docs
NormalizingFlows.grad!
NormalizingFlows.value_and_gradient!
```

Binary file added docs/src/banana.png
Binary file added docs/src/comparison.png
180 changes: 180 additions & 0 deletions docs/src/customized_layer.md
@@ -0,0 +1,180 @@
# Defining Your Own Flow Layer

In practice, users might want to define their own normalizing flow layers.
As briefly noted in [What are normalizing flows?](@ref), the key is to define a
customized normalizing flow layer, including its transformation and inverse,
as well as the log-determinant of the Jacobian of the transformation.
`Bijectors.jl` offers a convenient interface to define a customized bijection.
We refer users to [the documentation of
`Bijectors.jl`](https://turinglang.org/Bijectors.jl/dev/transforms/#Implementing-a-transformation)
for more details.
`Flux.jl` is also a useful package, offering a convenient interface to define neural networks.


In this tutorial, we demonstrate how to define a customized normalizing flow
layer -- an `Affine Coupling Layer` (Dinh *et al.*, 2016) -- using `Bijectors.jl` and `Flux.jl`.

## Affine Coupling Flow

Given an input vector $\boldsymbol{x}$, the general *coupling transformation* splits it into two
parts: $\boldsymbol{x}_{I_1}$ and $\boldsymbol{x}_{I\setminus I_1}$. Only one
part (e.g., $\boldsymbol{x}_{I_1}$) undergoes a bijective transformation $f$, called the *coupling law*,
which is parameterized by the values of the other part (e.g., $\boldsymbol{x}_{I\setminus I_1}$); that other part remains unchanged.
```math
\begin{array}{llll}
c_{I_1}(\cdot ; f, \theta): & \mathbb{R}^d \rightarrow \mathbb{R}^d & c_{I_1}^{-1}(\cdot ; f, \theta): & \mathbb{R}^d \rightarrow \mathbb{R}^d \\
& \boldsymbol{x}_{I \backslash I_1} \mapsto \boldsymbol{x}_{I \backslash I_1} & & \boldsymbol{y}_{I \backslash I_1} \mapsto \boldsymbol{y}_{I \backslash I_1} \\
& \boldsymbol{x}_{I_1} \mapsto f\left(\boldsymbol{x}_{I_1} ; \theta\left(\boldsymbol{x}_{I\setminus I_1}\right)\right) & & \boldsymbol{y}_{I_1} \mapsto f^{-1}\left(\boldsymbol{y}_{I_1} ; \theta\left(\boldsymbol{y}_{I\setminus I_1}\right)\right)
\end{array}
```
Here $\theta$ can be an arbitrary function, e.g., a neural network.
As long as $f(\cdot; \theta(\boldsymbol{x}_{I\setminus I_1}))$ is invertible, $c_{I_1}$ is invertible, and the
Jacobian determinant of $c_{I_1}$ is easy to compute:
```math
\left|\text{det} \nabla_x c_{I_1}(x)\right| = \left|\text{det} \nabla_{x_{I_1}} f(x_{I_1}; \theta(x_{I\setminus I_1}))\right|
```

The affine coupling layer is a special case of the coupling transformation, where the coupling law $f$ is an affine function:
```math
\begin{aligned}
\boldsymbol{x}_{I_1} &\mapsto \boldsymbol{x}_{I_1} \odot s\left(\boldsymbol{x}_{I\setminus I_1}\right) + t\left(\boldsymbol{x}_{I \setminus I_1}\right) \\
\boldsymbol{x}_{I \backslash I_1} &\mapsto \boldsymbol{x}_{I \backslash I_1}
\end{aligned}
```
Here, $s$ and $t$ are arbitrary functions (often neural networks) called the "scaling" and "translation" functions, respectively.
They produce vectors of the
same dimension as $\boldsymbol{x}_{I_1}$.


## Implementing Affine Coupling Layer

We start by defining a simple 3-layer multi-layer perceptron (MLP) using `Flux.jl`,
which will be used to define the scaling $s$ and translation functions $t$ in the affine coupling layer.
```@example afc
using Flux
function MLP_3layer(input_dim::Int, hdims::Int, output_dim::Int; activation=Flux.leakyrelu)
    return Chain(
        Flux.Dense(input_dim, hdims, activation),
        Flux.Dense(hdims, hdims, activation),
        Flux.Dense(hdims, output_dim),
    )
end
```

#### Construct the Object

Following the user interface of `Bijectors.jl`, we define a struct `AffineCoupling` as a subtype of `Bijectors.Bijector`.
The functions `partition` and `combine` are used to split a vector into 3 disjoint subvectors and to recombine them,
and `PartitionMask` stores the partition rule.
These three functions are
all defined in `Bijectors.jl`; see the [documentation](https://github.com/TuringLang/Bijectors.jl/blob/49c138fddd3561c893592a75b211ff6ad949e859/src/bijectors/coupling.jl#L3) for more details.

```@example afc
using Functors
using Bijectors
using Bijectors: partition, combine, PartitionMask
struct AffineCoupling <: Bijectors.Bijector
    dim::Int
    mask::Bijectors.PartitionMask
    s::Flux.Chain
    t::Flux.Chain
end

# To apply functions to the parameters contained in AffineCoupling.s and AffineCoupling.t,
# and to re-build the struct from those parameters, we use the functor interface of Functors.jl;
# see https://fluxml.ai/Flux.jl/stable/models/functors/#Functors.functor
@functor AffineCoupling (s, t)

function AffineCoupling(
    dim::Int,                 # dimension of the input
    hdims::Int,               # dimension of the hidden units for s and t
    mask_idx::AbstractVector, # indices of the dimensions to apply the transformation to
)
    cdims = length(mask_idx)  # dimension of the part used to construct the coupling law
    s = MLP_3layer(cdims, hdims, cdims)
    t = MLP_3layer(cdims, hdims, cdims)
    mask = PartitionMask(dim, mask_idx)
    return AffineCoupling(dim, mask, s, t)
end
```
By default, we define $s$ and $t$ using the `MLP_3layer` function, i.e., a
3-layer MLP with leaky ReLU activations.

#### Implement the Forward and Inverse Transformations


```@example afc
function Bijectors.transform(af::AffineCoupling, x::AbstractVector)
    # partition the vector using `af.mask::PartitionMask`
    x₁, x₂, x₃ = partition(af.mask, x)
    y₁ = x₁ .* af.s(x₂) .+ af.t(x₂)
    return combine(af.mask, y₁, x₂, x₃)
end

function Bijectors.transform(iaf::Inverse{<:AffineCoupling}, y::AbstractVector)
    af = iaf.orig
    # partition the vector using `af.mask::PartitionMask`
    y₁, y₂, y₃ = partition(af.mask, y)
    # inverse transformation
    x₁ = (y₁ .- af.t(y₂)) ./ af.s(y₂)
    return combine(af.mask, x₁, y₂, y₃)
end
```
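
As a quick sanity check (purely illustrative), the inverse should undo the forward transformation:
```@example afc
af_test = AffineCoupling(4, 10, 1:2)
x_test = randn(Float32, 4)
y_test = Bijectors.transform(af_test, x_test)
x_back = Bijectors.transform(inverse(af_test), y_test)
isapprox(x_test, x_back; atol=1f-4)
```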

#### Implement the Log-determinant of the Jacobian
Notice that here we wrap the transformation and the log-determinant of the Jacobian into a single function, `with_logabsdet_jacobian`.

```@example afc
function Bijectors.with_logabsdet_jacobian(af::AffineCoupling, x::AbstractVector)
    x₁, x₂, x₃ = Bijectors.partition(af.mask, x)
    y₁ = af.s(x₂) .* x₁ .+ af.t(x₂)
    logjac = sum(log ∘ abs, af.s(x₂))
    return combine(af.mask, y₁, x₂, x₃), logjac
end

function Bijectors.with_logabsdet_jacobian(
    iaf::Inverse{<:AffineCoupling}, y::AbstractVector
)
    af = iaf.orig
    # partition the vector using `af.mask::PartitionMask`
    y₁, y₂, y₃ = partition(af.mask, y)
    # inverse transformation
    x₁ = (y₁ .- af.t(y₂)) ./ af.s(y₂)
    logjac = -sum(log ∘ abs, af.s(y₂))
    return combine(af.mask, x₁, y₂, y₃), logjac
end
```
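
Similarly (again purely illustrative), the forward and inverse log-Jacobian terms should cancel:
```@example afc
y_fwd, ℓ_fwd = Bijectors.with_logabsdet_jacobian(af_test, x_test)
_, ℓ_inv = Bijectors.with_logabsdet_jacobian(inverse(af_test), y_fwd)
ℓ_fwd + ℓ_inv   # ≈ 0
```
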
#### Construct Normalizing Flow

Now, with all the above implementations, we are ready to build a normalizing flow with the `AffineCoupling` layers
by applying them to a base distribution $q_0$.

```@example afc
using Random, Distributions, LinearAlgebra
dim = 4
hdims = 10
Ls = [
    AffineCoupling(dim, hdims, 1:2),
    AffineCoupling(dim, hdims, 3:4),
    AffineCoupling(dim, hdims, 1:2),
    AffineCoupling(dim, hdims, 3:4),
]
ts = reduce(∘, Ls)
q₀ = MvNormal(zeros(Float32, dim), I)
flow = Bijectors.transformed(q₀, ts)
```
We can now sample from the flow:
```@example afc
x = rand(flow, 10)
```
And evaluate the density of the flow:
```@example afc
logpdf(flow, x[:,1])
```
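
Finally, this customized flow can be trained with the interface described in the API section. Here is a hedged sketch, assuming an illustrative target log-density `logp` (a correlated Gaussian chosen purely for demonstration):
```julia
using NormalizingFlows, Optimisers

# illustrative target: a correlated 4-d Gaussian
Σ = Float32[1.0 0.5 0.0 0.0; 0.5 1.0 0.0 0.0; 0.0 0.0 1.0 0.5; 0.0 0.0 0.5 1.0]
p = MvNormal(zeros(Float32, 4), Σ)
logp(x) = logpdf(p, x)

sample_per_iter = 10
flow_trained, stats, _ = train_flow(
    elbo, flow, logp, sample_per_iter;
    max_iters=1_000, optimiser=Optimisers.ADAM(0.01f0),
)
```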


## Reference

Dinh, L., Sohl-Dickstein, J., and Bengio, S. (2016). *Density estimation using Real NVP.*
arXiv:1605.08803.
Binary file added docs/src/elbo.png
