htnorm

This repo provides a C implementation of a fast and exact sampling algorithm for a multivariate normal distribution (MVN) truncated on a hyperplane, as described in Cong, Chen and Zhou (2017); see References.

This repo implements the following from the paper:

  • Efficient sampling from a MVN truncated on a hyperplane:

    hptrunc

  • Efficient sampling from a MVN with a structured precision matrix that is a sum of an invertible matrix and a low-rank matrix:

    struc

  • Efficient sampling from a MVN with a structured precision and mean:

    strucmean
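
In symbols, the three targets can be written roughly as follows. This is a sketch whose notation is assumed from the cited paper (Cong et al., 2017) rather than taken from this repo's headers: G and r define the hyperplane, A is the invertible part of the precision, and the product of Φ, Ω and Φ is the low-rank part.

```latex
\begin{aligned}
&\text{Hyperplane truncation:} && x \sim \mathcal{N}(\mu, \Sigma) \ \text{ subject to } \ G x = r, \\
&\text{Structured precision:} && x \sim \mathcal{N}\!\big(\mu,\ (A + \Phi^{\top} \Omega \Phi)^{-1}\big), \\
&\text{Structured precision and mean:} && x \sim \mathcal{N}\!\big(\Lambda^{-1} \Phi^{\top} \Omega\, t,\ \Lambda^{-1}\big),
\quad \Lambda = A + \Phi^{\top} \Omega \Phi .
\end{aligned}
```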

The algorithms implemented have the following practical applications:

  • Topic models when unknown parameters can be interpreted as fractions.
  • Admixture models
  • Discrete graphical models
  • Sampling from the posterior distribution under an intrinsic conditional autoregressive (ICAR) prior
  • Sampling from the posterior conditional distributions of various Bayesian regression problems.

Dependencies

  • A C compiler that implements the C99 standard or later
  • An installation of LAPACK.

Usage

Building a shared library of htnorm can be done with the following:

# optionally set path to LAPACK shared library
$ export LIBS_DIR="some/path/to/lib/"
$ make lib

Afterwards the shared library will be found in a lib/ directory of the project root, and the library can be linked dynamically via -lhtnorm.

The public interface exposes the samplers through the function declarations

 int htn_hyperplane_truncated_mvn(rng_t* rng, const ht_config_t* conf, double* out);
 int htn_structured_precision_mvn(rng_t* rng, const sp_config_t* conf, double* out);

The details of the parameters are documented in the header file "htnorm.h".

Random number generation uses either the PCG64 or the Xoroshiro128plus bit generator. The interface also allows a custom bit generator; the details are documented in the header file "rng.h".

Example

#include "htnorm.h"

int main (void)
{
    ...
    // instantiate a random number generator
    rng_t* rng = rng_new_pcg64_seeded(12345);
    ht_config_t config;
    init_ht_config(&config, ...);
    double* out = ...; // array to store the samples
    int res = htn_hyperplane_truncated_mvn(rng, &config, out);
    // res contains a number that indicates whether sampling failed or not.
    ...
    // finally free the RNG pointer at some point
    rng_free(rng);
    ...
    return 0;
}

Python Interface

Dependencies

  • NumPy >= 1.19.0

A high-level Python interface to the library is also provided. Linux and macOS users can install it from wheels via pip, so the C libraries do not need to be available locally. Windows is currently not supported.

pip install -U pyhtnorm

Wheels are not provided for MacOS. To install via pip, one can run the following commands:

pip install -U pyhtnorm

Alternatively, one can install it from source using the following shell commands:

$ git clone https://github.com/zoj613/htnorm.git
$ cd htnorm/
$ export PYHT_LIBS_DIR=<some directory with blas and lapack shared library files> # this is optional
$ pip install .

Below is an example of how to use htnorm in Python to sample from a multivariate Gaussian truncated on the hyperplane sum(x) = 0 (i.e. the sampled values are constrained to sum to zero). The Python interface is designed so that the code can be easily integrated into other existing libraries. Since v1.0.0, it supports passing a numpy.random.Generator instance as a parameter to aid reproducibility.

from pyhtnorm import hyperplane_truncated_mvnorm, structured_precision_mvnorm
import numpy as np

rng = np.random.default_rng()

# generate example input
k1, k2 = 1000, 1
temp = rng.random((k1, k1))
cov = temp @ temp.T
G = np.ones((k2, k1))
r = np.zeros(k2)
mean = rng.random(k1)

# passing `random_state` is optional. If the argument is not used, a fresh
# random generator state is instantiated internally using system entropy.
o = hyperplane_truncated_mvnorm(mean, cov, G, r, random_state=rng)
print(o.sum())  # verify if sampled values sum to zero
# alternatively one can pass an array to store the results in
hyperplane_truncated_mvnorm(mean, cov, G, r, out=o)

For more information about the function's arguments, refer to its docstring.

A pure NumPy implementation is demonstrated in this example script.
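
The core idea can also be sketched in a few lines of NumPy: draw an unconstrained sample, then shift it onto the hyperplane. The block below is a minimal illustration of that projection step as described in Cong et al. (2017), not the code used by pyhtnorm. The function name `hyperplane_truncated_mvn_sketch` is hypothetical, and the identity matrix added to the covariance is an assumption made only to keep the toy Cholesky factorization well conditioned.

```python
import numpy as np


def hyperplane_truncated_mvn_sketch(mean, cov, G, r, rng):
    """Draw one sample of x ~ N(mean, cov) conditioned on G @ x == r."""
    # Step 1: unconstrained draw y ~ N(mean, cov) via a Cholesky factor.
    chol = np.linalg.cholesky(cov)
    y = mean + chol @ rng.standard_normal(mean.shape[0])
    # Step 2: shift y onto the hyperplane:
    #   x = y + cov G^T (G cov G^T)^{-1} (r - G y)
    cov_gt = cov @ G.T  # shape (k1, k2)
    alpha = np.linalg.solve(G @ cov_gt, r - G @ y)
    return y + cov_gt @ alpha


rng = np.random.default_rng(12345)

# Small example input (hypothetical sizes, chosen to run quickly).
k1, k2 = 50, 1
temp = rng.random((k1, k1))
cov = temp @ temp.T + np.eye(k1)  # identity added for numerical stability
G = np.ones((k2, k1))
r = np.zeros(k2)
mean = rng.random(k1)

x = hyperplane_truncated_mvn_sketch(mean, cov, G, r, rng)
print(x.sum())  # should be zero up to floating-point error
```

Solving the small k2-by-k2 system instead of inverting the full covariance is what keeps the constraint step cheap when the number of constraints is much smaller than the dimension.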

R Interface

One can also use the package in R. To install, use one of the following commands:

devtools::install_github("zoj613/htnorm")
pak::pkg_install("zoj613/htnorm")

Below is an R translation of the above Python example:

library(htnorm)

# make dummy data
mean <- rnorm(1000)
cov <- matrix(rnorm(1000 * 1000), ncol=1000)
cov <- cov %*% t(cov)
G <- matrix(rep(1, 1000), ncol=1000)
r <- c(0)

# initialize the Generator instance
rng <- HTNGenerator(seed=12345, gen="pcg64")

samples <- rng$hyperplane_truncated_mvnorm(mean, cov, G, r)
#verify if sampled values sum to zero
sum(samples)

# optionally pass a vector to store the results in
out <- rep(0, 1000)
rng$hyperplane_truncated_mvnorm(mean, cov, G, r, out = out)
sum(out)  #verify

out <- rep(0, 1000)
eig <- eigen(cov)
phi <- eig$vectors
omega <- diag(eig$values)
a <- diag(runif(length(mean)))
rng$structured_precision_mvnorm(mean, a, phi, omega, a_type = "diagonal", out = out)
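
The structured-precision call at the end of the example targets a MVN whose precision is the sum of a matrix A and a low-rank term built from phi and omega. The NumPy sketch below shows one way such a draw can be made without factorizing the full precision matrix, following the construction in the references. It is an illustration only, not the package's implementation: the function name `structured_precision_mvn_sketch` is hypothetical, and it assumes that both A and omega are diagonal and passed as 1-D arrays.

```python
import numpy as np


def structured_precision_mvn_sketch(mean, a_diag, phi, omega_diag, rng):
    """Draw x ~ N(mean, (A + phi.T @ Omega @ phi)^{-1}) for diagonal A, Omega."""
    # y1 ~ N(0, A^{-1}) and y2 ~ N(0, Omega^{-1}); both are cheap because
    # A and Omega are assumed diagonal here.
    y1 = rng.standard_normal(a_diag.shape[0]) / np.sqrt(a_diag)
    y2 = rng.standard_normal(omega_diag.shape[0]) / np.sqrt(omega_diag)
    # S = Omega^{-1} + phi A^{-1} phi^T is only p-by-p (p = number of rows of phi).
    a_inv_phi_t = phi.T / a_diag[:, None]  # A^{-1} phi^T, shape (n, p)
    s = np.diag(1.0 / omega_diag) + phi @ a_inv_phi_t
    alpha = np.linalg.solve(s, phi @ y1 + y2)
    # By the Woodbury identity the result has covariance (A + phi^T Omega phi)^{-1}.
    return mean + y1 - a_inv_phi_t @ alpha


rng = np.random.default_rng(12345)

# Tiny hypothetical inputs so the empirical check runs quickly.
mean = np.array([1.0, -1.0, 0.5])
a_diag = np.array([1.0, 2.0, 3.0])
phi = np.array([[1.0, 0.0, 1.0],
                [0.0, 1.0, 1.0]])
omega_diag = np.array([1.0, 2.0])

samples = np.array([
    structured_precision_mvn_sketch(mean, a_diag, phi, omega_diag, rng)
    for _ in range(20000)
])

# Compare the empirical covariance with the inverse of the target precision.
precision = np.diag(a_diag) + phi.T @ np.diag(omega_diag) @ phi
target_cov = np.linalg.inv(precision)
emp_cov = np.cov(samples, rowvar=False)
print(np.abs(emp_cov - target_cov).max())  # small, up to Monte Carlo error
```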

Licensing

htnorm is free software made available under the BSD-3 License. For details see the LICENSE file.

References

  • Cong, Y., Chen, B., and Zhou, M. (2017). "Fast Simulation of Hyperplane-Truncated Multivariate Normal Distributions." Bayesian Analysis, 12(4):1017-1037. doi:10.1214/17-BA1052.
  • Bhattacharya, A., Chakraborty, A., and Mallick, B. K. (2016). "Fast sampling with Gaussian scale mixture priors in high-dimensional regression." Biometrika, 103(4):985.