
Commit 45101e0

zuhengxu, cpfiffer, and sunxd3 authored

Documentation (#27)
* add NF intro
* set up doc files
* add gitignore
* minor update to readme
* update home page
* update docs for each funciton
* update docs
* src
* update function docs
* update docs
* fix readme math rendering issue
* update docs
* update example doc
* update customize layer docs
* finish docs
* finish docs
* Update README.md — Co-authored-by: Cameron Pfiffer <[email protected]>
* Update README.md — Co-authored-by: Cameron Pfiffer <[email protected]>
* Update README.md — Co-authored-by: Cameron Pfiffer <[email protected]>
* Update docs/src/index.md — Co-authored-by: Xianda Sun <[email protected]>
* Update README.md — Co-authored-by: Xianda Sun <[email protected]>
* Update docs/src/index.md — Co-authored-by: Xianda Sun <[email protected]>
* Update docs/src/customized_layer.md — Co-authored-by: Xianda Sun <[email protected]>
* Update docs/src/customized_layer.md — Co-authored-by: Xianda Sun <[email protected]>
* Update docs/src/customized_layer.md — Co-authored-by: Xianda Sun <[email protected]>
* Update docs/src/customized_layer.md — Co-authored-by: Xianda Sun <[email protected]>
* Update docs/src/customized_layer.md — Co-authored-by: Xianda Sun <[email protected]>
* Update docs/src/index.md — Co-authored-by: Xianda Sun <[email protected]>
* Update docs/src/index.md — Co-authored-by: Xianda Sun <[email protected]>
* Update docs/src/index.md — Co-authored-by: Xianda Sun <[email protected]>
* Update docs/src/example.md — Co-authored-by: Xianda Sun <[email protected]>
* Update docs/src/example.md — Co-authored-by: Xianda Sun <[email protected]>
* Update docs/src/example.md — Co-authored-by: Xianda Sun <[email protected]>
* Update docs/src/example.md — Co-authored-by: Xianda Sun <[email protected]>
* Update docs/src/example.md — Co-authored-by: Xianda Sun <[email protected]>
* Update docs/src/example.md — Co-authored-by: Xianda Sun <[email protected]>
* Update docs/src/example.md — Co-authored-by: Xianda Sun <[email protected]>
* Update docs/src/customized_layer.md — Co-authored-by: Xianda Sun <[email protected]>
* Update docs/src/customized_layer.md — Co-authored-by: Xianda Sun <[email protected]>
* Update docs/src/customized_layer.md — Co-authored-by: Xianda Sun <[email protected]>
* Update docs/src/customized_layer.md — Co-authored-by: Xianda Sun <[email protected]>
* minor ed
* minor ed to fix latex issue
* minor update

Co-authored-by: Cameron Pfiffer <[email protected]>
Co-authored-by: Xianda Sun <[email protected]>
1 parent 8f4371d commit 45101e0

File tree

14 files changed (+579, -13 lines changed)

README.md

Lines changed: 86 additions & 0 deletions
@@ -2,3 +2,89 @@

[![Dev](https://img.shields.io/badge/docs-dev-blue.svg)](https://turinglang.github.io/NormalizingFlows.jl/dev/)
[![Build Status](https://github.com/TuringLang/NormalizingFlows.jl/actions/workflows/CI.yml/badge.svg?branch=main)](https://github.com/TuringLang/NormalizingFlows.jl/actions/workflows/CI.yml?query=branch%3Amain)

A normalizing flow library for Julia.

The purpose of this package is to provide a simple and flexible interface for variational inference (VI) and normalizing flows (NF) for Bayesian computation or generative modeling.
The key focus is to ensure modularity and extensibility, so that users can easily
construct (e.g., define customized flow layers) and combine various components
(e.g., choose different VI objectives or gradient estimates)
for variational approximation of general target distributions,
without being tied to specific probabilistic programming frameworks or applications.

See the [documentation](https://turinglang.org/NormalizingFlows.jl/dev/) for more.

## Installation
To install the package, run the following command in the Julia REPL:
```julia
] # enter Pkg mode
(@v1.9) pkg> add git@github.com:TuringLang/NormalizingFlows.jl.git
```
Then simply run the following command to use the package:
```julia
using NormalizingFlows
```

## Quick recap of normalizing flows
Normalizing flows transform a simple reference distribution $q_0$ (sometimes known as the base distribution) into
a complex distribution $q$ using invertible functions.

In more detail, given the base distribution, usually a standard Gaussian distribution, i.e., $q_0 = \mathcal{N}(0, I)$,
we apply a series of parameterized invertible transformations (called flow layers), $T_{1, \theta_1}, \cdots, T_{N, \theta_N}$, yielding
```math
Z_N = T_{N, \theta_N} \circ \cdots \circ T_{1, \theta_1} (Z_0) , \quad Z_0 \sim q_0,\quad Z_N \sim q_{\theta},
```
where $\theta = (\theta_1, \dots, \theta_N)$ are the parameters to be learned, and $q_{\theta}$ is the variational distribution (flow distribution). This describes the **sampling procedure** of a normalizing flow: drawing from $q_\theta$ only requires sending draws from $q_0$ through a forward pass of the flow layers.
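
To make this concrete, here is a minimal sketch of the sampling procedure using `Bijectors.jl` (see the related packages below); the two layers and their parameter values are arbitrary choices for illustration:
```julia
using Distributions, Bijectors

q0 = MvNormal(zeros(2), ones(2))           # base distribution q_0
T1 = Bijectors.Shift([1.0, -1.0])          # flow layer T_1 with parameters θ_1
T2 = Bijectors.Scale([0.5, 2.0])           # flow layer T_2 with parameters θ_2
flow = Bijectors.transformed(q0, T2 ∘ T1)  # the flow distribution q_θ
z = rand(flow, 5)                          # sampling = forward pass through T_1, then T_2
```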

Since all the transformations are invertible (technically, [diffeomorphic](https://en.wikipedia.org/wiki/Diffeomorphism)), we can evaluate the density of a normalizing flow distribution $q_{\theta}$ by the change of variables formula:
```math
q_\theta(x)=\frac{q_0\left(T_1^{-1} \circ \cdots \circ T_N^{-1}(x)\right)}{\prod_{n=1}^N J_n\left(T_n^{-1} \circ \cdots \circ T_N^{-1}(x)\right)}, \quad J_n(x)=\left|\operatorname{det} \nabla_x T_n(x)\right|.
```
Here we drop the subscripts $\theta_n, n = 1, \dots, N$ for simplicity.
Density evaluation of a normalizing flow therefore requires computing the **inverse** and the
**Jacobian determinant** of each flow layer.
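
Continuing the sketch above, the density of `flow` can be evaluated with `logpdf`, which performs this inverse pass under the hood; the by-hand version below assumes the `TransformedDistribution` fields `dist` and `transform`:
```julia
using Bijectors: with_logabsdet_jacobian, inverse

x = rand(flow)    # a draw from q_θ
logpdf(flow, x)   # log q_θ(x) via the change of variables formula

# the same computation spelled out:
z0, logjac = with_logabsdet_jacobian(inverse(flow.transform), x)  # inverse pass + log |det Jacobian|
logpdf(flow.dist, z0) + logjac                                    # log q_0(z0) + log |det ∇ T^{-1}(x)|
```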

Given the feasibility of i.i.d. sampling and density evaluation, normalizing flows can be trained by minimizing a statistical distance to the target distribution $p$. Typical choices are the forward and reverse Kullback-Leibler (KL) divergences, which lead to the following optimization problems:
```math
\begin{aligned}
\text{Reverse KL:}\quad
&\argmin _{\theta} \mathbb{E}_{q_{\theta}}\left[\log q_{\theta}(Z)-\log p(Z)\right] \\
&= \argmin _{\theta} \mathbb{E}_{q_0}\left[\log \frac{q_\theta(T_N\circ \cdots \circ T_1(Z_0))}{p(T_N\circ \cdots \circ T_1(Z_0))}\right] \\
&= \argmax _{\theta} \mathbb{E}_{q_0}\left[ \log p\left(T_N \circ \cdots \circ T_1(Z_0)\right)-\log q_0(Z_0)+\sum_{n=1}^N \log J_n\left(T_n \circ \cdots \circ T_1(Z_0)\right)\right]
\end{aligned}
```
and
```math
\begin{aligned}
\text{Forward KL:}\quad
&\argmin _{\theta} \mathbb{E}_{p}\left[\log q_{\theta}(Z)-\log p(Z)\right] \\
&= \argmax _{\theta} \mathbb{E}_{p}\left[\log q_\theta(Z)\right]
\end{aligned}
```
Both problems can be solved via standard stochastic optimization algorithms,
such as stochastic gradient descent (SGD) and its variants.
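
For intuition, here is a hedged sketch (not the estimator implemented in this package) of a naive Monte Carlo estimate of the ELBO, the objective whose maximization corresponds to the reverse KL problem above; it reuses the `flow` from the earlier sketch:
```julia
using Statistics: mean
using Bijectors: with_logabsdet_jacobian

# `flow` is a Bijectors.TransformedDistribution; `logp` is the (possibly unnormalized) target log-density
function naive_elbo(flow, logp; n_samples=10)
    vals = map(1:n_samples) do _
        z0 = rand(flow.dist)                                     # Z_0 ~ q_0
        x, logjac = with_logabsdet_jacobian(flow.transform, z0)  # forward pass through the layers
        logp(x) - (logpdf(flow.dist, z0) - logjac)               # log p(x) - log q_θ(x)
    end
    return mean(vals)
end

naive_elbo(flow, x -> -sum(abs2, x) / 2)  # e.g., a standard-normal target (up to a constant)
```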

Reverse KL minimization is typically used for **Bayesian computation**, where one
wants to approximate a posterior distribution $p$ that is only known up to a
normalizing constant.
In contrast, forward KL minimization is typically used for **generative modeling**, where one has access to samples from a complex distribution $p$ (but not necessarily its density) and wants to learn a flow that approximates it.

## Current status and TODOs

- [x] general interface development
- [x] documentation
- [ ] including more flow examples
- [ ] GPU compatibility
- [ ] benchmarking

## Related packages
- [Bijectors.jl](https://github.com/TuringLang/Bijectors.jl): a package for defining bijective transformations, which can be used for defining customized flow layers.
- [Flux.jl](https://fluxml.ai/Flux.jl/stable/)
- [Optimisers.jl](https://github.com/FluxML/Optimisers.jl)
- [AdvancedVI.jl](https://github.com/TuringLang/AdvancedVI.jl)

docs/.gitignore

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
build/
site/

docs/make.jl

Lines changed: 6 additions & 1 deletion
@@ -10,7 +10,12 @@ makedocs(;
    repo="https://github.com/TuringLang/NormalizingFlows.jl/blob/{commit}{path}#{line}",
    sitename="NormalizingFlows.jl",
    format=Documenter.HTML(),
-   pages=["Home" => "index.md"],
+   pages=[
+       "Home" => "index.md",
+       "API" => "api.md",
+       "Example" => "example.md",
+       "Customize your own flow layer" => "customized_layer.md",
+   ],
)

deploydocs(; repo="github.com/TuringLang/NormalizingFlows.jl", devbranch="main")

docs/src/api.md

Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
## API

```@index
```

## Main Function

```@docs
NormalizingFlows.train_flow
```

The flow object can be constructed with the `transformed` function from the `Bijectors.jl` package.
For example, for Gaussian VI, we can construct the flow as follows:
```@julia
using Distributions, Bijectors
T = Float32
q₀ = MvNormal(zeros(T, 2), ones(T, 2))
flow = Bijectors.transformed(q₀, Bijectors.Shift(zeros(T,2)) ∘ Bijectors.Scale(ones(T, 2)))
```
To train the Gaussian VI targeting the distribution $p$ via ELBO maximization, we can run
```@julia
using NormalizingFlows

sample_per_iter = 10
flow_trained, stats, _ = train_flow(
    elbo,
    flow,
    logp,
    sample_per_iter;
    max_iters=2_000,
    optimiser=Optimisers.ADAM(0.01 * one(T)),
)
```
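Note that this snippet assumes that `Optimisers` is loaded and that a target log-density `logp` is in scope; a hypothetical stand-in could be:
```julia
using Optimisers
logp(x) = -sum(abs2, x) / 2  # stand-in target: an (unnormalized) standard normal in 2 dimensions
```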

## Variational Objectives
We have implemented two variational objectives, namely, the ELBO and the log-likelihood objective.
Users can also define their own objective function and pass it to the [`train_flow`](@ref) function.
`train_flow` will optimize the flow parameters by maximizing `vo`.
The objective function should take the following general form:
```julia
vo(rng, flow, args...)
```
where `rng` is the random number generator, `flow` is the flow object, and `args...` are the
additional arguments that users can pass to the objective function.
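
For instance, a custom objective following this signature might look like the hedged sketch below (not the package's built-in `elbo`); here `logp` and `n_samples` are the extra arguments, and the field names `flow.dist` and `flow.transform` refer to the `Bijectors.TransformedDistribution` layout:
```julia
using Random, Distributions, Statistics
using Bijectors: with_logabsdet_jacobian

function my_objective(rng::AbstractRNG, flow, logp, n_samples)
    vals = map(1:n_samples) do _
        z0 = rand(rng, flow.dist)                                # draw from the base distribution q0
        x, logjac = with_logabsdet_jacobian(flow.transform, z0)  # push the draw through the flow
        logp(x) - (logpdf(flow.dist, z0) - logjac)               # an ELBO-style integrand
    end
    return mean(vals)
end
```
It could then be passed to the training interface as `train_flow(my_objective, flow, logp, n_samples; ...)`.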

#### Evidence Lower Bound (ELBO)
Maximizing the ELBO is equivalent to minimizing the reverse KL divergence between $q_\theta$ and $p$, i.e.,
```math
\begin{aligned}
&\min _{\theta} \mathbb{E}_{q_{\theta}}\left[\log q_{\theta}(Z)-\log p(Z)\right] \quad \text{(Reverse KL)}\\
& = \max _{\theta} \mathbb{E}_{q_0}\left[ \log p\left(T_N \circ \cdots \circ T_1(Z_0)\right)-\log q_0(Z_0)+\sum_{n=1}^N \log J_n\left(T_n \circ \cdots \circ T_1(Z_0)\right)\right] \quad \text{(ELBO)}
\end{aligned}
```
Reverse KL minimization is typically used for **Bayesian computation**,
where one only has access to the log-(unnormalized) density of the target distribution $p$ (e.g., a Bayesian posterior),
and hopes to generate approximate samples from it.

```@docs
NormalizingFlows.elbo
```

#### Log-likelihood

Maximizing the log-likelihood is equivalent to minimizing the forward KL divergence between $q_\theta$ and $p$, i.e.,
```math
\begin{aligned}
& \min_{\theta} \mathbb{E}_{p}\left[\log q_{\theta}(Z)-\log p(Z)\right] \quad \text{(Forward KL)} \\
& = \max_{\theta} \mathbb{E}_{p}\left[\log q_{\theta}(Z)\right] \quad \text{(Expected log-likelihood)}
\end{aligned}
```
Forward KL minimization is typically used for **generative modeling**,
where one is given a set of samples from the target distribution $p$ (e.g., images)
and aims to learn the density or a generative process that outputs high-quality samples.

```@docs
NormalizingFlows.loglikelihood
```
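
As an illustration (a hedged sketch, not necessarily the package's implementation), an empirical version of this objective over a hypothetical `d × n` data matrix could be:
```julia
using Statistics: mean

# average log-density of the flow over the columns of a data matrix sampled from p
empirical_loglikelihood(flow, data) = mean(logpdf(flow, data[:, i]) for i in 1:size(data, 2))
```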

## Training Loop

```@docs
NormalizingFlows.optimize
```

## Utility Functions for Taking Gradient
```@docs
NormalizingFlows.grad!
NormalizingFlows.value_and_gradient!
```

docs/src/banana.png

Binary file added (43.1 KB)

docs/src/comparison.png

Binary file added (40.1 KB)

docs/src/customized_layer.md

Lines changed: 180 additions & 0 deletions
@@ -0,0 +1,180 @@
# Defining Your Own Flow Layer

In practice, users might want to define their own normalizing flow.
As briefly noted in [What are normalizing flows?](@ref), the key is to define a
customized normalizing flow layer, including its transformation and inverse,
as well as the log-determinant of the Jacobian of the transformation.
`Bijectors.jl` offers a convenient interface to define a customized bijection.
We refer users to [the documentation of
`Bijectors.jl`](https://turinglang.org/Bijectors.jl/dev/transforms/#Implementing-a-transformation)
for more details.
`Flux.jl` is also a useful package, offering a convenient interface to define neural networks.

In this tutorial, we demonstrate how to define a customized normalizing flow
layer -- an `Affine Coupling Layer` (Dinh *et al.*, 2016) -- using `Bijectors.jl` and `Flux.jl`.

## Affine Coupling Flow

Given an input vector $\boldsymbol{x}$, the general *coupling transformation* splits it into two
parts: $\boldsymbol{x}_{I_1}$ and $\boldsymbol{x}_{I\setminus I_1}$. Only one
part (e.g., $\boldsymbol{x}_{I_1}$) undergoes a bijective transformation $f$, referred to as the *coupling law*,
based on the values of the other part (e.g., $\boldsymbol{x}_{I\setminus I_1}$), which remains unchanged.
```math
\begin{array}{llll}
c_{I_1}(\cdot ; f, \theta): & \mathbb{R}^d \rightarrow \mathbb{R}^d & c_{I_1}^{-1}(\cdot ; f, \theta): & \mathbb{R}^d \rightarrow \mathbb{R}^d \\
& \boldsymbol{x}_{I \backslash I_1} \mapsto \boldsymbol{x}_{I \backslash I_1} & & \boldsymbol{y}_{I \backslash I_1} \mapsto \boldsymbol{y}_{I \backslash I_1} \\
& \boldsymbol{x}_{I_1} \mapsto f\left(\boldsymbol{x}_{I_1} ; \theta\left(\boldsymbol{x}_{I\setminus I_1}\right)\right) & & \boldsymbol{y}_{I_1} \mapsto f^{-1}\left(\boldsymbol{y}_{I_1} ; \theta\left(\boldsymbol{y}_{I\setminus I_1}\right)\right)
\end{array}
```
Here $\theta$ can be an arbitrary function, e.g., a neural network.
As long as $f(\cdot; \theta(\boldsymbol{x}_{I\setminus I_1}))$ is invertible, $c_{I_1}$ is invertible, and the
Jacobian determinant of $c_{I_1}$ is easy to compute:
```math
\left|\text{det} \nabla_x c_{I_1}(x)\right| = \left|\text{det} \nabla_{x_{I_1}} f(x_{I_1}; \theta(x_{I\setminus I_1}))\right|
```

The affine coupling layer is a special case of the coupling transformation, where the coupling law $f$ is an affine function:
```math
\begin{aligned}
\boldsymbol{x}_{I_1} &\mapsto \boldsymbol{x}_{I_1} \odot s\left(\boldsymbol{x}_{I\setminus I_1}\right) + t\left(\boldsymbol{x}_{I \setminus I_1}\right) \\
\boldsymbol{x}_{I \backslash I_1} &\mapsto \boldsymbol{x}_{I \backslash I_1}
\end{aligned}
```
Here, $s$ and $t$ are arbitrary functions (often neural networks) called the "scaling" and "translation" functions, respectively.
They produce vectors of the
same dimension as $\boldsymbol{x}_{I_1}$.

## Implementing Affine Coupling Layer

We start by defining a simple 3-layer multi-layer perceptron (MLP) using `Flux.jl`,
which will be used to define the scaling function $s$ and the translation function $t$ in the affine coupling layer.
```@example afc
using Flux

function MLP_3layer(input_dim::Int, hdims::Int, output_dim::Int; activation=Flux.leakyrelu)
    return Chain(
        Flux.Dense(input_dim, hdims, activation),
        Flux.Dense(hdims, hdims, activation),
        Flux.Dense(hdims, output_dim),
    )
end
```

#### Construct the Object

Following the user interface of `Bijectors.jl`, we define a struct `AffineCoupling` as a subtype of `Bijectors.Bijector`.
The functions `partition` and `combine` are used to partition a vector into 3 disjoint subvectors and to recombine them,
and `PartitionMask` is used to store this partition rule.
These three functions are
all defined in `Bijectors.jl`; see the [documentation](https://github.com/TuringLang/Bijectors.jl/blob/49c138fddd3561c893592a75b211ff6ad949e859/src/bijectors/coupling.jl#L3) for more details.

```@example afc
using Functors
using Bijectors
using Bijectors: partition, combine, PartitionMask

struct AffineCoupling <: Bijectors.Bijector
    dim::Int
    mask::Bijectors.PartitionMask
    s::Flux.Chain
    t::Flux.Chain
end

# To apply functions to the parameters contained in AffineCoupling.s and AffineCoupling.t,
# and to re-build the struct from the parameters, we use the functor interface of `Functors.jl`;
# see https://fluxml.ai/Flux.jl/stable/models/functors/#Functors.functor
@functor AffineCoupling (s, t)

function AffineCoupling(
    dim::Int,                 # dimension of the input
    hdims::Int,               # dimension of the hidden units for s and t
    mask_idx::AbstractVector, # indices of the dimensions to apply the transformation to
)
    cdims = length(mask_idx)  # dimension of the part used to construct the coupling law
    s = MLP_3layer(cdims, hdims, cdims)
    t = MLP_3layer(cdims, hdims, cdims)
    mask = PartitionMask(dim, mask_idx)
    return AffineCoupling(dim, mask, s, t)
end
```
By default, we define $s$ and $t$ using the `MLP_3layer` function, which is a
3-layer MLP with leaky ReLU activations.

#### Implement the Forward and Inverse Transformations

```@example afc
function Bijectors.transform(af::AffineCoupling, x::AbstractVector)
    # partition the vector using `af.mask::PartitionMask`
    x₁, x₂, x₃ = partition(af.mask, x)
    y₁ = x₁ .* af.s(x₂) .+ af.t(x₂)
    return combine(af.mask, y₁, x₂, x₃)
end

function Bijectors.transform(iaf::Inverse{<:AffineCoupling}, y::AbstractVector)
    af = iaf.orig
    # partition the vector using `af.mask::PartitionMask`
    y_1, y_2, y_3 = partition(af.mask, y)
    # inverse transformation
    x_1 = (y_1 .- af.t(y_2)) ./ af.s(y_2)
    return combine(af.mask, x_1, y_2, y_3)
end
```
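
A quick sanity check (a hedged sketch assuming the definitions above; the sizes `dim = 4`, `hdims = 16`, and the mask `1:2` are arbitrary choices) is to verify that the inverse undoes the forward transformation:
```julia
layer = AffineCoupling(4, 16, 1:2)  # acts on the first two of four dimensions

x = randn(Float32, 4)
y = Bijectors.transform(layer, x)                         # forward pass
x_rec = Bijectors.transform(Bijectors.inverse(layer), y)  # inverse pass
isapprox(x, x_rec; atol=1e-4)                             # should hold up to floating-point error
```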

#### Implement the Log-determinant of the Jacobian
Notice that here we wrap the transformation and the log-determinant of the Jacobian into a single function, `with_logabsdet_jacobian`.

```@example afc
function Bijectors.with_logabsdet_jacobian(af::AffineCoupling, x::AbstractVector)
    x_1, x_2, x_3 = Bijectors.partition(af.mask, x)
    y_1 = af.s(x_2) .* x_1 .+ af.t(x_2)
    # log-det Jacobian of the forward map: sum of log|s(x_2)|
    logjac = sum(log ∘ abs, af.s(x_2))
    return combine(af.mask, y_1, x_2, x_3), logjac
end

function Bijectors.with_logabsdet_jacobian(
    iaf::Inverse{<:AffineCoupling}, y::AbstractVector
)
    af = iaf.orig
    # partition the vector using `af.mask::PartitionMask`
    y_1, y_2, y_3 = partition(af.mask, y)
    # inverse transformation
    x_1 = (y_1 .- af.t(y_2)) ./ af.s(y_2)
    # log-det Jacobian of the inverse map: negative of the forward one
    logjac = -sum(log ∘ abs, af.s(y_2))
    return combine(af.mask, x_1, y_2, y_3), logjac
end
```
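
Similarly, a hedged check (reusing the `layer` constructed above) that the forward and inverse log-determinants cancel:
```julia
x = randn(Float32, 4)
y, logjac_fwd = Bijectors.with_logabsdet_jacobian(layer, x)
_, logjac_bwd = Bijectors.with_logabsdet_jacobian(Bijectors.inverse(layer), y)
isapprox(logjac_fwd + logjac_bwd, 0; atol=1e-5)  # should be true
```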

#### Construct Normalizing Flow

Now, with all the above implementations, we are ready to construct a normalizing flow
by composing several `AffineCoupling` layers and applying them to a base distribution $q_0$.

```@example afc
using Random, Distributions, LinearAlgebra

dim = 4
hdims = 10
Ls = [
    AffineCoupling(dim, hdims, 1:2),
    AffineCoupling(dim, hdims, 3:4),
    AffineCoupling(dim, hdims, 1:2),
    AffineCoupling(dim, hdims, 3:4),
]
ts = reduce(∘, Ls)
q₀ = MvNormal(zeros(Float32, dim), I)
flow = Bijectors.transformed(q₀, ts)
```
We can now sample from the flow:
```@example afc
x = rand(flow, 10)
```
And evaluate the density of the flow:
```@example afc
logpdf(flow, x[:,1])
```
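
Finally, the flow constructed here can be plugged into the training interface described in the API documentation. The following is a hedged sketch: the target `logp` is a stand-in standard-normal log-density, and the number of iterations and optimiser settings are arbitrary.
```julia
using NormalizingFlows, Optimisers

logp(x) = -sum(abs2, x) / 2  # hypothetical (unnormalized) target log-density

sample_per_iter = 10
flow_trained, stats, _ = train_flow(
    elbo,
    flow,
    logp,
    sample_per_iter;
    max_iters=1_000,
    optimiser=Optimisers.ADAM(0.01f0),
)
```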

## Reference
Dinh, L., Sohl-Dickstein, J. and Bengio, S., 2016. *Density estimation using Real NVP.* arXiv:1605.08803.

docs/src/elbo.png

Binary file added (22.7 KB)
