Documentation update #34

Merged: 3 commits, Aug 4, 2023
11 changes: 7 additions & 4 deletions README.rst
@@ -7,7 +7,7 @@ Introduction

:margarine: Marginal Bayesian Statistics
:Authors: Harry T.J. Bevins
- :Version: 1.0.0
+ :Version: 1.0.1
:Homepage: https://github.com/htjb/margarine
:Documentation: https://margarine.readthedocs.io/

@@ -46,8 +46,7 @@ Details/Examples
feasible given a set of samples from an MCMC or nested sampling run.

An example of how to use the code can be found on GitHub in the
- jupyter notebook `notebook/Tutorial.ipynb`, alternatively
- in the compiled documentation or at
+ jupyter notebook `notebook/Tutorial.ipynb` or alternatively
`here <https://mybinder.org/v2/gh/htjb/margarine/7f55f9a9d3f3adb2356cb94b32c599caac8ea1ef?urlpath=lab%2Ftree%2Fnotebook%2FTutorial.ipynb>`_.

Documentation
@@ -62,7 +61,11 @@ To compile it locally you can run
cd docs
sphinx-build source html-build

- after cloning the repo and installing the relevant packages.
+ after cloning the repo and installing the relevant packages with
+
+ .. code:: bash
+
+     pip install sphinx numpydoc sphinx_rtd_theme

Licence and Citation
--------------------
2 changes: 1 addition & 1 deletion docs/requirements.txt
@@ -1,3 +1,3 @@
sphinx
sphinx_rtd_theme
- numpydoc==1.1.0
+ numpydoc
297 changes: 3 additions & 294 deletions docs/source/tutorial.rst
@@ -1,297 +1,6 @@
Tutorial
========

Preliminaries
-------------

The following set of examples shows the user how to train a Masked
Autoregressive Flow (MAF) and an example Kernel Density Estimator (KDE).
We further demonstrate how to use ``margarine`` to estimate the
Kullback-Leibler divergence and Bayesian dimensionality with the
trained MAF and KDE.
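
For reference, these are the standard marginal statistics (the same
quantities ``anesthetic`` computes): the Kullback-Leibler divergence
between the posterior P and the prior π, and the Bayesian model
dimensionality,

.. math::

    \mathcal{D} = \int P(\theta)
    \ln\frac{P(\theta)}{\pi(\theta)}\,\mathrm{d}\theta,
    \qquad
    \frac{\tilde{d}}{2} = \int P(\theta)
    \left(\ln\frac{P(\theta)}{\pi(\theta)} -
    \mathcal{D}\right)^2 \mathrm{d}\theta.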

The code requires `anesthetic <https://pypi.org/project/anesthetic/>`__
to run.

.. code:: ipython3

    import numpy as np
    import matplotlib.pyplot as plt
    from anesthetic.samples import NestedSamples

In order to demonstrate the applications of the code we need to load
some example samples from a nested sampling run, which we can visualise
with ``anesthetic``. We write a helper function to load the chains and
transform the parameters in the first three columns, which were
generated with a log-uniform prior, into the uniform parameter space.

``margarine`` currently assumes that the parameters are uniformly
distributed when calculating the KL divergence and Bayesian
dimensionality. It is therefore important to transform the parameters
into the uniform parameter space before we train our MAF and KDE.

.. code:: ipython3

    def load_chains(root):
        """
        Use anesthetic to load in a set of chains and return the pandas
        table of samples, a numpy array of the parameters in the
        uniform space and the sample weights.
        """
        samples = NestedSamples(root=root)

        try:
            names = ['p' + str(i) for i in range(ndims)]
            theta = samples[names].values
        except KeyError:
            names = [i for i in range(ndims)]
            theta = samples[names].values

        weights = samples.weights

        return samples, theta, weights

    ndims = 5

    root = '../tests/test_samples/test'
    samples, theta, weights = load_chains(root)
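
Note that ``load_chains`` itself does not apply the log-uniform
mapping described above. A minimal sketch of that step, assuming
(hypothetically) that the chains store the raw, un-logged values in
the first three columns:

.. code:: ipython3

    # Hypothetical sketch: parameters drawn from a log-uniform prior
    # are uniformly distributed in log space, so taking the log of the
    # first three columns maps them into the uniform parameter space.
    # Skip this step if the chains already store the logged values.
    theta[:, :3] = np.log10(theta[:, :3])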


To visualise the posterior we define another helper function that will
be useful later in the notebook.

.. code:: ipython3

    from anesthetic.plot import hist_plot_1d, hist_plot_2d

    def plotter(theta, names, w=None, ndims=5):
        """ Helper function that uses anesthetic to produce corner plots """
        fig, axes = plt.subplots(ndims, ndims, figsize=(8, 8), sharex='col')
        for i in range(ndims):
            for j in range(ndims):
                if i < j:
                    # the upper triangle is blank; the corresponding 2D
                    # histogram goes in the mirrored lower-triangle panel
                    axes[i, j].axis('off')
                    hist_plot_2d(axes[j, i], theta[:, i], theta[:, j],
                                 weights=w,
                                 xmin=theta[:, i].min(), xmax=theta[:, i].max(),
                                 ymin=theta[:, j].min(), ymax=theta[:, j].max(),
                                 bins=25, density=True,
                                 cmap=plt.get_cmap('Reds'))
                elif i == j:
                    hist_plot_1d(axes[i, j], theta[:, i], weights=w,
                                 xmin=theta[:, i].min(), xmax=theta[:, i].max(),
                                 color='r', histtype='step', bins=25,
                                 density=True)
                # only the first column keeps y tick labels/axis labels
                if j != 0:
                    axes[i, j].set_yticklabels([])
                elif i == 0:
                    axes[i, j].set_yticklabels([])
                else:
                    axes[i, j].set_ylabel(names[i])
                if i == ndims - 1:
                    axes[i, j].set_xlabel(names[j])
        plt.tight_layout()
        plt.subplots_adjust(hspace=0, wspace=0)
        plt.show()

    names = ['log(p' + str(i) + ')' if i in [0, 1, 2] else 'p' + str(i) for i in range(ndims)]
    plotter(theta, names, weights)



.. image:: Tutorial_files/Tutorial_5_0.png


Masked Autoregressive Flows
---------------------------

Firstly we will look at training a Masked Autoregressive Flow (MAF)
with ``margarine``. To train the MAF we first need to initialise the
class with the samples and corresponding weights.

.. code:: ipython3

    import os
    os.chdir('../')

    from margarine.maf import MAF
    from margarine.kde import KDE

    bij = MAF(theta, weights)
    bij.train(100)

We can then generate samples from the bijector using the following
code, which technically takes samples on the hypercube and transforms
them into samples on the target posterior distribution,

.. code:: ipython3

    x = bij(np.random.uniform(0, 1, size=(len(theta), theta.shape[-1])))

    plotter(x, names)



.. image:: Tutorial_files/Tutorial_9_0.png


Alternatively we can generate samples with the following code, which
takes in an integer ``n`` and returns an array of shape ``(n, 5)``. The
``.sample()`` function is a proxy for ``__call__``.

.. code:: ipython3

    x = bij.sample(5000)
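
Since ``.sample()`` is a proxy for ``__call__``, the same draw can be
written explicitly via the hypercube, mirroring the call shown earlier;
a sketch of the equivalence:

.. code:: ipython3

    # Equivalent draw via the hypercube (what .sample(5000) effectively
    # does, per the note above): uniform samples pushed through the
    # bijector.
    u = np.random.uniform(0, 1, size=(5000, theta.shape[-1]))
    x_alt = bij(u)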

We can then go ahead and calculate the corresponding KL divergence and
Bayesian dimensionality.

The samples presented here were generated using a Gaussian likelihood
and fitted with nested sampling over 5 parameters. We can use
``anesthetic`` to calculate the KL divergence and Bayesian
dimensionality for the samples for comparison. We see very similar
results and note that the similarity improves with the number of epochs.

.. code:: ipython3

    from margarine.marginal_stats import calculate

    stats = calculate(bij).statistics()
    print(stats.iloc[0, 0], samples.D())
    print(stats.iloc[1, 0], samples.d())


.. parsed-literal::

    3.310774486919951 3.3308079438366938
    4.839712964526534 5.013952162478263


.. parsed-literal::

    /Users/harry/Documents/margarine/margarine/marginal_stats.py:72: UserWarning: If prior samples are not provided the prior is assumed to be uniform and the posterior samples are are assumed to be from the same uniform space.
      warnings.warn('If prior samples are not provided the prior is ' +
    /Users/harry/Documents/margarine/margarine/marginal_stats.py:194: FutureWarning: The 'inplace' keyword in DataFrame.set_index is deprecated and will be removed in a future version. Use `df = df.set_index(..., copy=False)` instead.
      results.set_index('Statistic', inplace=True)


We could imagine that the above set of parameters is a sub-sample of,
perhaps, signal parameters that we are interested in. Having
marginalised out the nuisance parameters, we can use ``margarine`` to
determine how well constrained the sub-space is.

As an example we can train a MAF on three of the parameters in this
distribution.

.. code:: ipython3

    theta_reduced = theta[:, 1:-1]
    names_reduced = names[1:-1]

    bij = MAF(theta_reduced, weights)
    bij.train(100)
    x = bij.sample(5000)

    plotter(x, names_reduced, ndims=3)

    stats = calculate(bij).statistics()
    print(stats)



.. image:: Tutorial_files/Tutorial_15_0.png


.. parsed-literal::

                      Value  Lower Bound  Upper Bound
    Statistic
    KL Divergence  1.994334     1.992596     1.997777
    BMD            2.804455     2.562036     3.311070


.. parsed-literal::

    /Users/harry/Documents/margarine/margarine/marginal_stats.py:72: UserWarning: If prior samples are not provided the prior is assumed to be uniform and the posterior samples are are assumed to be from the same uniform space.
      warnings.warn('If prior samples are not provided the prior is ' +
    /Users/harry/Documents/margarine/margarine/marginal_stats.py:194: FutureWarning: The 'inplace' keyword in DataFrame.set_index is deprecated and will be removed in a future version. Use `df = df.set_index(..., copy=False)` instead.
      results.set_index('Statistic', inplace=True)


Kernel Density Estimators
-------------------------

We can perform a similar analysis using Kernel Density Estimators
rather than MAFs, as is done with the following code. Note that the
generation of the ‘trained’ model is significantly quicker than when
performed with the MAFs; a rough timing check follows the output below.

.. code:: ipython3

    from margarine.kde import KDE
    kde = KDE(theta, weights)
    kde.generate_kde()
    x = kde.sample(5000)

    plotter(x, names)

    stats = calculate(kde).statistics()
    print(stats.iloc[0, 0], '+(-)', stats.iloc[0, 2] - stats.iloc[0, 0],
          '(', stats.iloc[0, 0] - stats.iloc[0, 1], ')', samples.D())
    print(stats.iloc[1, 0], '+(-)', stats.iloc[1, 2] - stats.iloc[1, 0],
          '(', stats.iloc[1, 0] - stats.iloc[1, 1], ')', samples.d())



.. image:: Tutorial_files/Tutorial_17_0.png


.. parsed-literal::

    /Users/harry/Documents/margarine/margarine/marginal_stats.py:72: UserWarning: If prior samples are not provided the prior is assumed to be uniform and the posterior samples are are assumed to be from the same uniform space.
      warnings.warn('If prior samples are not provided the prior is ' +


.. parsed-literal::

    3.609496187306833 +(-) 0.2765981957821131 ( 0.48614620453886204 ) 3.3308079438366938
    1.913573635193844 +(-) 2.048906422381939 ( 0.5664883865810604 ) 5.013952162478263


.. parsed-literal::

    /Users/harry/Documents/margarine/margarine/marginal_stats.py:194: FutureWarning: The 'inplace' keyword in DataFrame.set_index is deprecated and will be removed in a future version. Use `df = df.set_index(..., copy=False)` instead.
      results.set_index('Statistic', inplace=True)
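
To quantify the speed difference, here is a rough timing check using
only calls already shown in this tutorial; exact numbers will vary by
machine and library version.

.. code:: ipython3

    import time

    # Time the KDE generation against 100 epochs of MAF training on
    # the same samples. Temporaries are used so the trained kde and
    # bij objects above are left untouched.
    start = time.time()
    KDE(theta, weights).generate_kde()
    print('KDE build: %.2f s' % (time.time() - start))

    start = time.time()
    MAF(theta, weights).train(100)
    print('MAF train: %.2f s' % (time.time() - start))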


Rather than using the ``kde.sample()`` function to generate samples we
could transform samples from the hypercube with the following code and
the ``__call__()`` function. However, we note that this is a much
slower method of generating samples as it is designed to be bijective.
Transformation from the hypercube is useful if we would like to use a
trained KDE or MAF as the prior in a subsequent nested sampling run
(see the sketch after the output below), but it is not necessary if we
simply want to calculate marginal Bayesian statistics.

.. code:: ipython3

    x = kde(np.random.uniform(0, 1, size=(10, theta.shape[-1])))
    print(x)


.. parsed-literal::

    [[ 0.00866746 -1.3674294  -1.04610725  0.74177354  0.59200377]
     [ 1.25609155 -1.55385202  0.17241134  1.7487886  -0.85378741]
     [-0.42623049 -1.28543977  0.6308928  -0.95923815 -0.5000081 ]
     [-0.91927021  1.82074337 -1.4875557   1.13219455  1.80085807]
     [-0.43319482  0.51338045 -0.58847524 -1.32000678  2.11012111]
     [-0.48190147  0.53976851 -0.73276747  0.63523436  1.8162574 ]
     [ 0.3549924  -0.13545207  1.38762986 -1.22339099  1.09403153]
     [-0.51115334  0.47585545 -0.88141388  2.62513102 -0.1775158 ]
     [ 0.28904819 -0.26387664  0.80479892  0.02123311 -0.50959412]
     [ 0.88906095 -1.04406146 -1.67107321 -1.62705777 -0.03433262]]
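
As a sketch of the nested sampling use case mentioned above: a sampler
that accepts a prior-transform callable mapping unit-hypercube draws to
the parameter space (as in, e.g., dynesty) could use the trained
density directly. The reshape handles ``margarine``'s 2D input
convention shown above; the function itself is purely illustrative.

.. code:: ipython3

    # Hypothetical prior transform built from the trained KDE; a MAF
    # trained with margarine could be used in the same way.
    def prior_transform(u):
        # u is a single 1D unit-hypercube sample; __call__ expects a
        # 2D array, so reshape in and index out
        return np.asarray(kde(u.reshape(1, -1)))[0]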


A tutorial can be found on GitHub in the jupyter notebook
`notebook/Tutorial.ipynb` or alternatively
`here <https://mybinder.org/v2/gh/htjb/margarine/7f55f9a9d3f3adb2356cb94b32c599caac8ea1ef?urlpath=lab%2Ftree%2Fnotebook%2FTutorial.ipynb>`_.
1 change: 1 addition & 0 deletions requirements.txt
@@ -2,6 +2,7 @@ numpy
tensorflow-macos; sys_platform == 'darwin'
tensorflow; sys_platform != 'darwin'
tensorflow_probability
+ anesthetic
scipy
pandas
scikit-learn
2 changes: 1 addition & 1 deletion setup.py
@@ -9,7 +9,7 @@ def readme(short=False):

setup(
    name='margarine',
-   version='1.0.0',
+   version='1.0.1',
    description='margarine: Posterior Sampling and Marginal Bayesian Statistics',
    long_description=readme(),
    author='Harry T. J. Bevins',