Fix math display errors in readme (#28)
* fix readme math err

* minor update readme

* add PR links to TODO
zuhengxu authored Aug 23, 2023
1 parent 43c06ab commit 2f78875
Showing 1 changed file, README.md, with 14 additions and 11 deletions.
[![Build Status](https://github.com/TuringLang/NormalizingFlows.jl/actions/workflows/CI.yml/badge.svg?branch=main)](https://github.com/TuringLang/NormalizingFlows.jl/actions/workflows/CI.yml?query=branch%3Amain)


**Last updated: 2023-Aug-23**

A normalizing flow library for Julia.

The purpose of this package is to provide a simple and flexible interface for variational inference (VI) and normalizing flows (NF) for Bayesian computation or generative modeling.
A normalizing flow pushes a reference distribution $q_0$ through a sequence of invertible transformations $T_{1, \theta_1}, \dots, T_{N, \theta_N}$:
```math
Z_N = T_{N, \theta_N} \circ \cdots \circ T_{1, \theta_1} (Z_0) , \quad Z_0 \sim q_0,
```
where $\theta = (\theta_1, \dots, \theta_N)$ collects the parameters to be learned and $q_{\theta}$ is the resulting variational (flow) distribution. This describes the **sampling procedure** of a normalizing flow: draws from $q_0$ are sent through a forward pass of the flow layers.
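
As an illustration only (not this package's API), the sampling procedure can be sketched in plain Julia; the elementwise affine maps below are hypothetical stand-ins for trainable flow layers $T_{n, \theta_n}$:

```julia
# Hypothetical stand-ins for the flow layers T_{n,θ_n}: elementwise affine maps
# x -> a .* x .+ b, invertible whenever all entries of a are nonzero.
layers = [(a = [1.5, 0.8], b = [0.1, -0.3]),
          (a = [0.7, 1.2], b = [0.0,  0.5])]

# Forward pass T_N ∘ ⋯ ∘ T_1 applied to a point z.
forward(layers, z) = foldl((x, l) -> l.a .* x .+ l.b, layers; init = z)

z0 = randn(2)              # Z₀ ~ q₀ = N(0, I), the reference distribution
x  = forward(layers, z0)   # Z_N = T_N ∘ ⋯ ∘ T_1(Z₀), i.e. a single draw from q_θ
```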

Since all the transformations are invertible (technically [diffeomorphic](https://en.wikipedia.org/wiki/Diffeomorphism)), we can evaluate the density of a normalizing flow distribution $q_{\theta}$ via the change-of-variables formula:
```math
q_\theta(x)=\frac{q_0\left(T_1^{-1} \circ \cdots \circ T_N^{-1}(x)\right)}{\prod_{n=1}^N J_n\left(T_n^{-1} \circ \cdots \circ T_N^{-1}(x)\right)}, \quad J_n(x)=\left|\text{det} \nabla_x T_n(x)\right|.
```
Here we drop the parameter subscripts $\theta_n, n = 1, \dots, N$, for simplicity.
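
Continuing the toy affine example above (again, not this package's API), a minimal sketch of evaluating $\log q_\theta(x)$ via the change-of-variables formula, assuming the same standard normal reference $q_0$:

```julia
# Log-density of the toy affine flow via the change-of-variables formula.
# For T(x) = a .* x .+ b the inverse is (x .- b) ./ a, and the Jacobian term
# J(x) = |det ∇T(x)| = prod(abs.(a)) is constant in x.
inverse_map(l, x) = (x .- l.b) ./ l.a
logJ(l)           = sum(log.(abs.(l.a)))

function logq(layers, x)
    logdet = 0.0
    for l in reverse(layers)    # peel off T_N, …, T_1
        x = inverse_map(l, x)   # now x = T_n⁻¹ ∘ ⋯ ∘ T_N⁻¹ of the original point
        logdet += logJ(l)       # J_n evaluated at this point (constant for affine maps)
    end
    logq0 = -0.5 * (sum(abs2, x) + length(x) * log(2π))   # q₀ = N(0, I)
    return logq0 - logdet
end

logq(layers, x)    # log q_θ(x) for the draw produced above
```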
Given the feasibility of i.i.d. sampling and density evaluation, normalizing flows can be used to approximate a target distribution $p$ by minimizing a statistical divergence between $q_\theta$ and $p$; the two most common choices are the reverse and forward KL divergences:
```math
\begin{aligned}
\text{Reverse KL:}\quad
&\arg\min _{\theta} \mathbb{E}_{q_{\theta}}\left[\log q_{\theta}(Z)-\log p(Z)\right] \\
&= \arg\min _{\theta} \mathbb{E}_{q_0}\left[\log \frac{q_\theta(T_N\circ \cdots \circ T_1(Z_0))}{p(T_N\circ \cdots \circ T_1(Z_0))}\right] \\
&= \arg\max _{\theta} \mathbb{E}_{q_0}\left[ \log p\left(T_N \circ \cdots \circ T_1(Z_0)\right)-\log q_0(Z_0)+\sum_{n=1}^N \log J_n\left(T_{n-1} \circ \cdots \circ T_1(Z_0)\right)\right]
\end{aligned}
```
and
```math
\begin{aligned}
\text{Forward KL:}\quad
&\arg\min _{\theta} \mathbb{E}_{p}\left[\log p(Z)-\log q_{\theta}(Z)\right] \\
&= \arg\max _{\theta} \mathbb{E}_{p}\left[\log q_\theta(Z)\right]
\end{aligned}
```
Both problems can be solved via standard stochastic optimization algorithms,
such as stochastic gradient descent (SGD) and its variants.
Reverse KL minimization is typically used for **Bayesian computation**, where one
wants to approximate a posterior distribution $p$ that is only known up to a
normalizing constant.
In contrast, forward KL minimization is typically used for **generative modeling**,
where one wants to learn the underlying distribution of some data.
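
Returning to the reverse-KL (Bayesian) route as a concrete example, here is a sketch reusing the toy affine flow above (none of this is this package's API) of a Monte Carlo estimate of the objective; in practice one would differentiate such an estimate with an AD package and feed it to a stochastic optimizer. The target `logp` below is a hypothetical unnormalized log-density:

```julia
using Statistics

# Hypothetical unnormalized log-target: a correlated Gaussian known only up to a constant.
logp(x) = -0.5 * (2 * x[1]^2 + x[2]^2 - 1.5 * x[1] * x[2])

# Single-sample Monte Carlo estimate of the reverse KL objective (up to p's constant):
#   E_{q₀}[ log q_θ(T(Z₀)) - log p(T(Z₀)) ] ≈ log q_θ(x) - log p(x),  x = T(Z₀), Z₀ ~ q₀.
function reverse_kl_estimate(layers)
    z0 = randn(2)
    x  = forward(layers, z0)
    return logq(layers, x) - logp(x)
end

# Averaging several single-sample estimates reduces the variance of the objective.
mean(reverse_kl_estimate(layers) for _ in 1:1_000)
```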

## Current status and TODOs

- [x] general interface development
- [x] documentation
- [ ] including more NF examples/tutorials
- WIP: [PR#11](https://github.com/TuringLang/NormalizingFlows.jl/pull/11)
- [ ] GPU compatibility
- WIP: [PR#25](https://github.com/TuringLang/NormalizingFlows.jl/pull/25)
- [ ] benchmarking

## Related packages
- [Bijectors.jl](https://github.com/TuringLang/Bijectors.jl): a package for defining bijective transformations, which can be used for defining customized flow layers (see the sketch after this list).
- [Flux.jl](https://fluxml.ai/Flux.jl/stable/): a deep learning library for Julia.
- [Optimisers.jl](https://github.com/FluxML/Optimisers.jl): a collection of gradient-based optimization rules (SGD, Adam, etc.).
- [AdvancedVI.jl](https://github.com/TuringLang/AdvancedVI.jl): a library of variational inference algorithms in the Turing ecosystem.
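
For instance, a rough sketch of how a small flow could be assembled from Bijectors.jl building blocks; this assumes `PlanarLayer`, `∘` composition, and `transformed` behave as in Bijectors.jl's documentation, so check it against your installed version:

```julia
using Bijectors, Distributions, LinearAlgebra

# Two planar layers composed into a flow over a 2-d standard normal reference.
q0   = MvNormal(zeros(2), Diagonal(ones(2)))
flow = transformed(q0, PlanarLayer(2) ∘ PlanarLayer(2))

x = rand(flow)       # sample: push a draw from q0 through the layers
logpdf(flow, x)      # density via the change-of-variables formula
```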

