diff --git a/content/bayes_nn/bayes.md b/content/bayes_nn/bayes.md
new file mode 100644
index 0000000..45f7b0a
--- /dev/null
+++ b/content/bayes_nn/bayes.md
@@ -0,0 +1,318 @@
# Bayesian Neural Network

A usual neural network is optimized to obtain a single, fixed set of weights and biases that lets the model perform a specific task well. In a
Bayesian Neural Network (BNN) the weights and biases are instead described by probability distributions, so the model can be treated as an ensemble of many neural networks trained by Bayesian inference.

The Bayesian approach to neural networks makes it possible to estimate the uncertainty of a prediction and therefore to make decisions based on the model more robust with respect to the input data.


### Difference between a usual NN and a BNN


![Placeholder](../images/bayes_nn/diff.png)


### Training of a NN and a BNN
=== "NN"
    ![Placeholder](../images/bayes_nn/trainingNN.png)
    The parameters $\theta$ are optimized in order to minimize the loss function.

=== "BNN"
    ![Placeholder](../images/bayes_nn/bayesNN.png)
    The training learns probability distributions for the weights and biases that maximize the likelihood of the correct data/label pairs $D(x, y)$.
    The parameters of the weight distributions, i.e. their means and standard deviations, are the result of the loss-function optimization.

#### Training Procedure
 1. Introduce a prior distribution over the model parameters $w$.
 2. Compute the posterior $p(w|D)$ using Bayes' rule.
 3. Average the predictions over the posterior distribution (a minimal sketch of this step is shown below).
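For a neural network the exact posterior $p(w|D) \propto p(D|w)\,p(w)$ is intractable, so in practice it is approximated (for example variationally), and step 3 becomes a Monte Carlo average over weight samples. The following is only an illustrative sketch: `sample_weights_from_posterior` and `network` are hypothetical helpers standing in for the approximate posterior and the forward pass.

```python linenums="1"
import numpy as np

def predictive_mean_and_std(x, n_samples=100):
    """Monte Carlo estimate of the posterior-predictive mean and spread.

    `sample_weights_from_posterior()` is assumed to draw one set of weights
    from the (approximate) posterior, and `network(x, w)` to evaluate the
    network for input `x` with that weight sample.
    """
    preds = [network(x, sample_weights_from_posterior()) for _ in range(n_samples)]
    preds = np.stack(preds)
    # The mean is the ensemble prediction; the spread reflects the
    # uncertainty carried by the weight distribution.
    return preds.mean(axis=0), preds.std(axis=0)
```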
### Prediction of a NN and a BNN
=== "NN"
    ![Placeholder](../images/bayes_nn/PredictionNN.png)

=== "BNN"
    ![Placeholder](../images/bayes_nn/PredictionBNN.png)


### Uncertainty
The uncertainties quantified by a BNN fall into two categories:
=== "Aleatoric"
    Aleatoric - uncertainty inherent in the data or the environment (observation noise); it is captured by the likelihood $p(y|x,\theta)$ and cannot be reduced by collecting more data.
=== "Epistemic"
    Epistemic - uncertainty due to the lack of knowledge about the model parameters; it is captured by the posterior $p(\theta|D)$ and shrinks as more data are collected.


## Packages
There are several packages for probabilistic neural networks; TensorFlow Probability and Pyro are among the most mature.

=== "Tensorflow"
    ```bash
    pip install --upgrade tensorflow-probability
    ```
=== "Pyro"
    ```bash
    pip install pyro-ppl
    ```


## Modules Description

### Distribution and sampling

=== "Tensorflow"

=== "Pyro"


Let's consider simple linear regression as an example and compare it to its Bayesian analog.

## Linear Regression

Consider a simple dataset $D(x, y)$ to which we want to fit a linear function
$y = ax + b + \epsilon$, where $a, b$ are learnable parameters and $\epsilon$ is observation noise.

### Synthetic dataset
=== "Synthetic dataset"
    ```python linenums="1"
    import numpy as np

    w0 = 0.125
    b0 = 5.
    x_range = [-20, 60]

    def load_dataset(n=150, n_tst=150):
        np.random.seed(43)
        def s(x):
            g = (x - x_range[0]) / (x_range[1] - x_range[0])
            return 3 * (0.25 + g**2.)
        x = (x_range[1] - x_range[0]) * np.random.rand(n) + x_range[0]
        eps = np.random.randn(n) * s(x)
        y = (w0 * x * (1. + np.sin(x)) + b0) + eps
        x = x[..., np.newaxis]
        x_tst = np.linspace(*x_range, num=n_tst).astype(np.float32)
        x_tst = x_tst[..., np.newaxis]
        return y, x, x_tst

    y, x, x_tst = load_dataset()
    ```

### Probabilistic Linear Regression
=== "tensorflow_probability"

    The regression becomes probabilistic by letting the network output a distribution instead of a point estimate:

    ```python linenums="1"
    import tensorflow as tf
    import tensorflow_probability as tfp
    tfd = tfp.distributions

    # Build model.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(1),
        tfp.layers.DistributionLambda(lambda t: tfd.Normal(loc=t, scale=1)),
    ])

    # Define the loss: the negative log-likelihood of y under the predicted distribution.
    negloglik = lambda y, rv_y: -rv_y.log_prob(y)

    # Do inference.
    model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.05), loss=negloglik)
    model.fit(x, y, epochs=500, verbose=False)

    # Make predictions.
    yhat = model(x_tst)
    ```

=== "pyro"

    ```python linenums="1"
    import torch
    from torch import nn

    import pyro
    import pyro.distributions as dist
    from pyro.nn import PyroModule, PyroSample
    from pyro.infer import SVI, Trace_ELBO
    from pyro.infer.autoguide import AutoDiagonalNormal
    from pyro.optim import Adam

    # Specify model.
    class BayesianRegression(PyroModule):
        def __init__(self, in_features, out_features):
            super().__init__()
            self.linear = PyroModule[nn.Linear](in_features, out_features)
            self.linear.weight = PyroSample(dist.Normal(0., 1.).expand([out_features, in_features]).to_event(2))
            self.linear.bias = PyroSample(dist.Normal(0., 10.).expand([out_features]).to_event(1))

        def forward(self, x, y=None):
            sigma = pyro.sample("sigma", dist.Uniform(0., 10.))
            mean = self.linear(x).squeeze(-1)
            with pyro.plate("data", x.shape[0]):
                obs = pyro.sample("obs", dist.Normal(mean, sigma), obs=y)
            return mean

    # Build model (one input feature, one output).
    model = BayesianRegression(1, 1)

    # Do inference with stochastic variational inference (SVI),
    # following the standard Pyro regression recipe.
    guide = AutoDiagonalNormal(model)
    svi = SVI(model, guide, Adam({"lr": 0.03}), loss=Trace_ELBO())

    pyro.clear_param_store()
    x_t = torch.tensor(x, dtype=torch.float)
    y_t = torch.tensor(y, dtype=torch.float)
    for step in range(1500):
        loss = svi.step(x_t, y_t)
    ```


The output of the model:

![Placeholder](../images/bayes_nn/lr.png)




## Variational Autoencoder

Generative models can also be built with Bayesian neural networks.
The Variational Autoencoder (VAE) is a popular approach to data synthesis.

Let's consider the example of generating images.

The generating process consists of two steps:

1. Sampling the latent variable $z$ from the prior distribution $p(z)$.

2. Drawing a sample $x \sim p(x|z)$ from the stochastic decoder.

Objective:

$p(z)$, the prior on the latent representation $z$,
$q(z|x)$, the variational encoder, and
$p(x|z)$, the decoder — how likely is the image $x$ given the latent representation $z$.

### Loss

Once the generative process is defined, an objective function has to be chosen for the optimization. In order to train the network, we maximize the ELBO (Evidence Lower Bound) objective, written out below.
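Written out, the ELBO balances the reconstruction quality against the distance of the approximate posterior from the prior:

$$
\mathrm{ELBO}(x) \;=\; \mathbb{E}_{q(z|x)}\big[\log p(x|z)\big] \;-\; D_{\mathrm{KL}}\big(q(z|x)\,\|\,p(z)\big) \;\le\; \log p(x),
$$

so maximizing the ELBO maximizes a lower bound on the evidence $\log p(x)$.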


### Prior
The prior $p(z)$ on the latent representation is typically chosen to be a standard multivariate normal distribution, $p(z) = \mathcal{N}(0, I)$ (a minimal example of such a choice is given below); the variational encoder $q(z|x)$ and the decoder $p(x|z)$ are then parameterized by neural networks.
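As an illustrative sketch (not part of the original text), such a standard-normal prior can be defined with TensorFlow Probability; the latent dimensionality `encoded_size = 16` is an arbitrary choice:

```python linenums="1"
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

encoded_size = 16  # dimensionality of the latent space (illustrative choice)

# Isotropic standard-normal prior p(z) = N(0, I) over the latent vector.
prior = tfd.Independent(
    tfd.Normal(loc=tf.zeros(encoded_size), scale=1.0),
    reinterpreted_batch_ndims=1,
)
```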
### Encoder and Decoder
=== "tensorflow"

    ```python linenums="1"
    ```
=== "pyro"

    ```python linenums="1"
    ```

### Training
=== "tensorflow"

    ```python linenums="1"
    ```
=== "pyro"

    ```python linenums="1"
    ```


### Results
=== "tensorflow"

    ```python linenums="1"
    ```
=== "pyro"

    ```python linenums="1"
    ```



## Normalizing Flows

### Definition

=== "tensorflow"

    ```python linenums="1"
    ```
=== "pyro"

    ```python linenums="1"
    ```

### Training
=== "tensorflow"

    ```python linenums="1"
    ```
=== "pyro"

    ```python linenums="1"
    ```

### Inference
=== "tensorflow"

    ```python linenums="1"
    ```
=== "pyro"

    ```python linenums="1"
    ```

## Resources


### Bayesian NN

 1. https://arxiv.org/pdf/2007.06823.pdf
 2. http://krasserm.github.io/2019/03/14/bayesian-neural-networks/
 3. https://arxiv.org/pdf/1807.02811.pdf

### Normalizing Flows

 1. https://arxiv.org/abs/1908.09257
 2. https://arxiv.org/pdf/1505.05770.pdf

### Variational AutoEncoder

 1. https://arxiv.org/abs/1312.6114
 2. https://pyro.ai/examples/vae.html
 3. https://www.tensorflow.org/probability/examples/Probabilistic_Layers_VAE


diff --git a/content/images/bayes_nn/PredictionBNN.png b/content/images/bayes_nn/PredictionBNN.png
new file mode 100644 index 0000000..341081f
Binary files /dev/null and b/content/images/bayes_nn/PredictionBNN.png differ
diff --git a/content/images/bayes_nn/PredictionNN.png b/content/images/bayes_nn/PredictionNN.png
new file mode 100644 index 0000000..0162cf5
Binary files /dev/null and b/content/images/bayes_nn/PredictionNN.png differ
diff --git a/content/images/bayes_nn/VAE/function.png b/content/images/bayes_nn/VAE/function.png
new file mode 100644 index 0000000..a5f87da
Binary files /dev/null and b/content/images/bayes_nn/VAE/function.png differ
diff --git a/content/images/bayes_nn/bayesNN.png b/content/images/bayes_nn/bayesNN.png
new file mode 100644 index 0000000..11def67
Binary files /dev/null and b/content/images/bayes_nn/bayesNN.png differ
diff --git a/content/images/bayes_nn/diff copy.png b/content/images/bayes_nn/diff copy.png
new file mode 100644 index 0000000..68afde3
Binary files /dev/null and b/content/images/bayes_nn/diff copy.png differ
diff --git a/content/images/bayes_nn/diff.png b/content/images/bayes_nn/diff.png
new file mode 100644 index 0000000..68afde3
Binary files /dev/null and b/content/images/bayes_nn/diff.png differ
diff --git a/content/images/bayes_nn/lr.png b/content/images/bayes_nn/lr.png
new file mode 100644 index 0000000..687f6ad
Binary files /dev/null and b/content/images/bayes_nn/lr.png differ
diff --git a/content/images/bayes_nn/trainingNN.png b/content/images/bayes_nn/trainingNN.png
new file mode 100644 index 0000000..8ed689b
Binary files /dev/null and b/content/images/bayes_nn/trainingNN.png differ
diff --git a/content/images/no_unc.png b/content/images/no_unc.png
new file mode 100644 index 0000000..c7015c5
Binary files /dev/null and b/content/images/no_unc.png differ
diff --git a/mkdocs.yml b/mkdocs.yml
index 6cf5dd4..03e4dd0 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -69,7 +69,8 @@ markdown_extensions:
  - pymdownx.tasklist:
      custom_checkbox: true
  - pymdownx.tilde
-
+  - pymdownx.arithmatex:
+      generic: true

 extra_javascript:
  - https://unpkg.com/mermaid@8.6/dist/mermaid.min.js
@@ -91,6 +92,8 @@ nav:
  - Introduction: starter/introduction.md
  - Optimization:
    - Model optimization: optimization/introduction.md
+    - Bayesian Optimization: optimization/bayes.md
+
  - Inference:
    - Direct inference:
      - TensorFlow 2: inference/tensorflow2.md
@@ -98,6 +101,8 @@ nav:
      - ONNX: inference/onnx.md
      - XGBoost: inference/xgboost.md
      - hls4ml: inference/hls4ml.md
+      - Bayesian NN: bayes_nn/bayes.md
+
  - Inference as a service:
    - Sonic/Triton: inference/sonic_triton.md
  - Integration checklist: inference/checklist.md
diff --git a/site/optimization/bayes.html b/site/optimization/bayes.html
new file mode 100644 index 0000000..1d20862
--- /dev/null
+++ b/site/optimization/bayes.html
@@ -0,0 +1,184 @@
Bayesian Optimization - CMS Machine Learning Documentation

Bayesian Neural Network:

The usual neural network is optimized to obtain a single fixed set of weights and biases. In a Bayesian neural network the weights and biases are instead described by probability distributions, and this type of model can be treated as an ensemble of many neural networks.

The Bayesian approach to neural networks makes it possible to estimate the uncertainty and to make the decisions based on the model more robust.

Difference between a usual NN and a Bayesian NN:

Normal Neural Network | Bayesian Neural Network:

Placeholder

Training

Placeholder

Placeholder

Prediction

Placeholder

Placeholder

Uncertainty:

- Aleatoric - comes from noise in the data or the environment; captured by the likelihood *p(y|x,w)*
- Epistemic - uncertainty of the model parameters; captured by the posterior *p(w|D)*
+

Training Procedure:

1. Introduce a prior distribution over the model parameters w
2. Compute the posterior p(w|D) using Bayes' rule (written out below)
3. Average the predictions over the posterior distribution
+
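Step 2 is just Bayes' rule applied to the network weights:

$$
p(w \mid D) = \frac{p(D \mid w)\, p(w)}{p(D)}
$$

For a neural network the evidence p(D) is intractable, so in practice the posterior is only approximated, for example variationally.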

Package installation:

pip install --upgrade tensorflow-probability

pip install pyro-ppl

Let's consider simple linear regression as an example and compare it to its Bayesian analog.

Linear regression:

A simple probabilistic linear-regression model in TensorFlow Probability: the network outputs a distribution rather than a point estimate.

import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

# Build model.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(1),
    tfp.layers.DistributionLambda(lambda t: tfd.Normal(loc=t, scale=1)),
])

# Define the loss: the negative log-likelihood of y under the predicted distribution.
negloglik = lambda y, rv_y: -rv_y.log_prob(y)

# Do inference.
model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.05), loss=negloglik)
model.fit(x, y, epochs=500, verbose=False)

# Make predictions.
yhat = model(x_tst)
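Since the model output is a distribution rather than a point prediction, the regression line and its spread can be read off directly (an illustrative snippet, using `yhat` from above):

mean = yhat.mean()      # predicted mean, i.e. the regression line
stddev = yhat.stddev()  # predicted standard deviation of the observations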

The output of the model:

Placeholder

Bayesian Linear regression:

Two further examples are shown below: a generalized linear model fitted with tfp.glm, and the regression model extended to also learn the observation noise (aleatoric uncertainty).

# coding: utf-8

import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

# --- Example 1: a generalized linear model fitted with tfp.glm ---

# Pretend to load synthetic data set.
features = tfp.distributions.Normal(loc=0., scale=1.).sample(int(100e3))
labels = tfp.distributions.Bernoulli(logits=1.618 * features).sample()

# Specify model.
model = tfp.glm.Bernoulli()

# Fit model given data.
coeffs, linear_response, is_converged, num_iter = tfp.glm.fit(
    model_matrix=features[:, tf.newaxis],
    response=tf.cast(labels, dtype=tf.float32),
    model=model)
# ==> coeffs is approximately [1.618] (We're golden!)

# --- Example 2: regression that also learns the observation noise (aleatoric) ---

# Build model: the first output parameterizes the mean, the second the scale.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(1 + 1),
    tfp.layers.DistributionLambda(
        lambda t: tfd.Normal(loc=t[..., :1],
                             scale=1e-3 + tf.math.softplus(0.05 * t[..., 1:]))),
])

# Define the loss as the negative log-likelihood.
negloglik = lambda y, rv_y: -rv_y.log_prob(y)

# Do inference.
model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.01), loss=negloglik)
model.fit(x, y, epochs=1000, verbose=False)

# Inspect the fitted weights and make predictions.
[print(np.squeeze(w.numpy())) for w in model.weights]
yhat = model(x_tst)
assert isinstance(yhat, tfd.Distribution)

Placeholder

Bayesian Linear regression with weight uncertainty:

Epistemic uncertainty on the fit is obtained by replacing the Dense layer with a DenseVariational layer, which learns a distribution over its weights:

import numpy as np
import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

# Specify the surrogate posterior over `keras.layers.Dense` `kernel` and `bias`.
def posterior_mean_field(kernel_size, bias_size=0, dtype=None):
    n = kernel_size + bias_size
    c = np.log(np.expm1(1.))
    return tf.keras.Sequential([
        tfp.layers.VariableLayer(2 * n, dtype=dtype),
        tfp.layers.DistributionLambda(lambda t: tfd.Independent(
            tfd.Normal(loc=t[..., :n],
                       scale=1e-5 + tf.nn.softplus(c + t[..., n:])),
            reinterpreted_batch_ndims=1)),
    ])

# Specify the prior over `keras.layers.Dense` `kernel` and `bias`.
def prior_trainable(kernel_size, bias_size=0, dtype=None):
    n = kernel_size + bias_size
    return tf.keras.Sequential([
        tfp.layers.VariableLayer(n, dtype=dtype),
        tfp.layers.DistributionLambda(lambda t: tfd.Independent(
            tfd.Normal(loc=t, scale=1),
            reinterpreted_batch_ndims=1)),
    ])


# Build model: DenseVariational learns a distribution over the layer weights.
model = tf.keras.Sequential([
    tfp.layers.DenseVariational(1, posterior_mean_field, prior_trainable,
                                kl_weight=1/x.shape[0]),
    tfp.layers.DistributionLambda(lambda t: tfd.Normal(loc=t, scale=1)),
])

# Define the loss as the negative log-likelihood.
negloglik = lambda y, rv_y: -rv_y.log_prob(y)

# Do inference.
model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.01), loss=negloglik)
model.fit(x, y, epochs=1000, verbose=False)

# Inspect the fitted variational parameters and make predictions.
[print(np.squeeze(w.numpy())) for w in model.weights]
yhat = model(x_tst)
assert isinstance(yhat, tfd.Distribution)
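Because the DenseVariational layer draws a new set of weights on every call, repeated calls to the model give different predictions; averaging over many calls yields the epistemic uncertainty band (an illustrative snippet in the spirit of the example above):

yhats = [model(x_tst) for _ in range(100)]   # each call samples new weights
means = np.stack([yh.mean() for yh in yhats])
mean_prediction = means.mean(axis=0)         # ensemble-average regression line
epistemic_std = means.std(axis=0)            # spread due to weight uncertainty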

Placeholder

In summary: the probabilistic model captures the aleatoric noise of the data, while the variational model additionally quantifies the epistemic uncertainty on the fitted line.

Variational Autoencoder:

The Variational Autoencoder is a popular generative model for data synthesis.

Resources:

1. https://arxiv.org/pdf/2007.06823.pdf
+2. http://krasserm.github.io/2019/03/14/bayesian-neural-networks/
+3. https://arxiv.org/pdf/1807.02811.pdf
+

Last update: May 17, 2021
\ No newline at end of file