Commit 61f6077
Commit files from stored zip.
lawrennd committed May 28, 2015 (0 parents)

Showing 11 changed files with 885 additions and 0 deletions.
49 changes: 49 additions & 0 deletions README
@@ -0,0 +1,49 @@

Bayesian Committee Machine
Version 1.0, November 2005

The Bayesian Committee Machine (BCM) is an approximation method for
large-scale Gaussian process regression.

What you should know beforehand:

- The code is for Matlab

- It requires the Netlab toolbox. You can download Netlab from
http://www.ncrg.aston.ac.uk/netlab/

- Install Netlab *before* trying to run any of the programs here.

- To get started and to check your installation, try running 'dembcm.m'

- If you are looking for example code to run the BCM, have a look at dembcm.m.
All of the main features are used and explained there. A shorter sketch of
the workflow follows below.
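
A minimal end-to-end sketch (Xtrain, Ytrain and Xtest are placeholders
for your own data; the calls mirror those documented in bcm.m):

  % Build a BCM whose modules are GPs with squared-exponential kernels
  gpnet = gp(size(Xtrain, 2), 'sqexp');
  net = bcm(gpnet);
  % Split the training data into modules of 500 points each
  net = bcminit(net, Xtrain, Ytrain, 500);
  % Fit each module's hyperparameters, then pre-compute helper matrices
  net = bcmtrain(net, 'individual');
  net = bcmprepare(net);
  % Predict, processing the test points in blocks of 400
  [Ypred, Yvar] = bcmfwd(net, Xtest, 400);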


Relevant publications:

V. Tresp. A Bayesian committee machine. Neural Computation, 12(11):2719-2741, 2000

A. Schwaighofer and V. Tresp. Transductive and inductive methods for
approximate Gaussian process regression. In S. Becker, S. Thrun, and
K. Obermayer, editors, Advances in Neural Information Processing Systems
15. MIT Press, 2003

============================================================

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the
Free Software Foundation, Inc.,
59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.


62 changes: 62 additions & 0 deletions bcm.m
@@ -0,0 +1,62 @@
function net = bcm(gpnet)
% bcm - Bayesian Committee Machine
%
% Synopsis:
% net = bcm(gpnet)
%
% Arguments:
% gpnet: A Gaussian process template for BCM modules, as output by Netlab's
% function gp.m. Each module of the BCM will inherit its initial
% parameters from gpnet.
%
% Returns:
% net: Structure describing the BCM
%
% Description:
% The Bayesian Committee Machine (BCM) is an approximation method for
% large-scale Gaussian process regression. The training data is split
% into a number of blocks, for which individual Gaussian process
% predictors ("modules") are trained. The prediction of a BCM is a
% weighted combination of the predictions of individual modules on the
% test data. Also, test data is processed in blocks, which leads to
% improved performance.
% The code here is a wrapper routine for Gaussian process routines
% provided by the Netlab toolbox. Netlab is thus required for this code
% to run.
%
% Examples:
% Building a BCM for 7-dimensional input, where each module is a GP
% with squared-exponential kernel:
% gpnet = gp(7, 'sqexp');
% net = bcm(gpnet);
% Equip the BCM with its training data, split up into modules of size
% 500:
% net = bcminit(net, Xtrain, Ytrain, 500);
% Fit each module's hyperparameters, and pre-compute a few matrices:
% net = bcmtrain(net, 'individual');
% net = bcmprepare(net);
% For increased performance: cluster the training data beforehand (10
% clusters in the example below) then assign clusters to modules:
% options = [1 1e-5 1e-4 0 0 0 0 0 0 0 0 0 0 30];
% r = randperm(size(Xtrain,1));
% [centres,opt,post] = kmeans(Xtrain(r(1:10),:),Xtrain,options);
% [m,assignment] = max(post,[],2);
% net = bcminit(net, Xtrain, Ytrain, assignment);
% net = bcmprepare(net);
% Now we can do prediction:
% [pred, errorBar] = bcmfwd(net, Xtest, 400);
%
% See also: bcminit,bcmprepare,bcmtrain,bcmfwd,bcmerr,bcmgrad,bcmpak,bcmunpak
%

% Author(s): Anton Schwaighofer, Nov 2004
% $Id: bcm.m,v 1.1 2004/11/18 21:18:24 anton Exp $

error(nargchk(1, 1, nargin));

net = struct('type', 'bcm', 'gpnet', gpnet);
net.nin = gpnet.nin;
net.nout = 1;
net.module = [];
net.invPrior = {};
net.weights = {};
114 changes: 114 additions & 0 deletions bcmerr.m
@@ -0,0 +1,114 @@
function [e,edata,eprior] = bcmerr(net, x, t, Xtest)
% bcmerr - Error function for Bayesian Committee Machine
%
% Synopsis:
% [e,edata,eprior] = bcmerr(net, x, t)
% e = bcmerr(net, [], t, Xtest) (use with care, see below)
%
% Arguments:
% net: BCM structure
% t: Vector of all training targets (used only in the second calling
% syntax)
% Xtest: [Q net.nin] matrix of test data
%
% Returns:
% e: Value of the error function (marginal likelihood)
% edata: Data contribution to e
% eprior: Prior contribution to e
%
% Description:
% This function returns the sum of the error functions of each module,
% that is, the unnormalized negative log-likelihood. Error function is
% computed on the basis of the pre-initialized data in each GP module,
% thus no data is required as input. Still, to be compatible with the
% standard Netlab error functions, bcmerr.m accepts input arguments in
% the form bcmerr(net, x, t).
%
% In the second calling syntax, bcmerr(net, [], [], Xtest), the exact
% BCM evidence is returned. This is given by
% -1/2 log det C - 1/2 t' C^{-1} t + const
% with C given by
% C = BD[K] + sigma^2 I + (K_c K_t^{-1} K_c' - BD[K_c K_t^{-1} K_c'])
% where BD[...] denotes a block-diagonal approximation of the argument,
% K_c is the kernel matrix of all training points versus test points,
% K_t is the test point kernel matrix, and t is a vector of all
% training targets.
% C is a matrix of size [N N] for N training data, thus the exact BCM
% evidence can only be computed for cases where also an exact GP
% solution can be found. Use this feature only with moderately sized
% problems.
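%
% Example (a sketch; net is a prepared BCM, Ytrain and Xtest are
% placeholders for the full training target vector and test inputs):
% e = bcmerr(net, [], []);            % sum of the modules' evidences
% e = bcmerr(net, [], Ytrain, Xtest); % exact BCM evidence, small N only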
%
%
% See also: bcm,bcmtrain,bcminit
%

% Author(s): Anton Schwaighofer, Nov 2004
% $Id: bcmerr.m,v 1.2 2004/11/23 21:43:51 anton Exp $

% Input argument x is effectively ignored; t is used only in the second
% calling syntax (when Xtest is given)

error(nargchk(3, 4, nargin));
if nargin<4,
Xtest = [];
end

if isempty(Xtest),
% No test data given: Default case of summing up individual modules' evidence
e = 0;
edata = 0;
eprior = 0;
for i = 1:length(net.module),
netI = net.module(i);
[a, b, c] = gperr(netI, netI.tr_in, netI.tr_targets);
e = e+a;
edata = edata+b;
eprior = eprior+c;
end
else
% Compute BCM error with the actual BCM covariance matrix. This matrix
% has size [N N] for N training points, thus we can typically not hold
% it in memory. For this, knowledge of the test data is required.

% For the block diagonal approximation, we need to know the number of
% data in each module:
modSize = zeros(1, length(net.module));
for i = 1:length(net.module),
modSize(i) = length(net.module(i).tr_targets);
end
% Reconstruct the full training data
N = sum(modSize);
Xtrain = zeros(N, net.nin);
ind = 1;
for i = 1:length(net.module),
netI = net.module(i);
Xtrain(ind:(ind+length(netI.tr_targets)-1),:) = netI.tr_in;
ind = ind+length(netI.tr_targets);
end
% Major part of the overall kernel matrix is a form of Schur complement:
Kt = gpcovarp(net.gpnet, Xtest, Xtest);
Kc = gpcovarp(net.gpnet, Xtrain, Xtest);
smallEye = eps^(2/3)*speye(size(Kt));
C = Kc*inv(Kt+smallEye)*Kc';
% Overwrite diagonal blocks with exact covariance matrix, meaning that
% the kernel matrix is exact for points within the same module
startInd = 1;
for i = 1:length(net.module),
ind = startInd:(startInd+modSize(i)-1);
netI = net.module(i);
% Use gpcovar here, so that the contribution of the noise variance is
% already taken into account
C(ind,ind) = gpcovar(net.gpnet, Xtrain(ind,:));
startInd = startInd+modSize(i);
end
% With this matrix C, we can compute evidence as usual:
C(isnan(C)) = realmax;
C(isinf(C)&(C<0)) = -realmax;
C(isinf(C)&(C>0)) = realmax;
eigC = eig(C, 'nobalance');
% Guard against possible tiny negative eigenvalues (e.g. in the Matern
% kernel with large values of nu)
if any(eigC<=0),
warning('Skipping some negative eigenvalues. Results may be inaccurate');
end
edata = 0.5*(sum(log(eigC(eigC>0)))+t'*inv(C)*t);
eprior = 0;
e = edata+eprior;
end
134 changes: 134 additions & 0 deletions bcmfwd.m
@@ -0,0 +1,134 @@
function [Ypred, Yvar] = bcmfwd(net,Xtest,querySize,verbosity)
% bcmfwd - Forward propagation in Bayesian Committee Machine
%
% Synopsis:
% Ypred = bcmfwd(net,Xtest)
% [Ypred,Yvar] = bcmfwd(net,Xtest,querySize,verbosity)
%
% Arguments:
% net: BCM structure
% Xtest: [Q d] matrix of test data, Q points in d dimensions
% querySize: Size of query set (prediction is based on blocks of test
% points of size querySize). Default value, if omitted: 500.
% verbosity: (optional) Use a value >0 to display progress information
%
% Returns:
% Ypred: [Q 1] vector of predictions (predictive mean) for each test point
% Yvar: [Q 1] vector of predictive variances (error bars) for each test
% point
%
% Description:
% Forward propagation in Bayesian Committee Machine. The test data is
% split up into blocks of size querySize. For each block, all GP
% modules in the BCM make their prediction, the prediction is then
% weighted by the inverse predictive covariance, summed and normalized
% to give the BCM output.
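% In formulas (a sketch of the BCM combination rule; m_i and C_i denote
% the predictive mean and covariance of module i on the current query
% block, K_q the prior covariance matrix of the query block, and M the
% number of modules):
%   inv(C) = sum_i inv(C_i) - (M-1)*inv(K_q)
%   m      = C * (sum_i inv(C_i)*m_i)
% where m and C are the BCM's predictive mean and covariance.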
% Typically, the performance of the BCM increases as querySize
% increases.
% Instead of passing querySize as a parameter, it can also be set in a
% field 'querySize' of the BCM structure.
%
% Examples:
% Build a BCM with modules that contain 500 training points each:
% gp1 = gp(5, 'sqexp');
% bcm1 = bcm(gp1);
% bcm1.querySize = 500;
% bcm1 = bcminit(bcm1, Xtrain, Ytrain);
% Train the BCM, by maximizing the training data marginal likelihood
% for each module individually:
% bcm1 = bcmtrain(bcm1,'individual','scg');
% Compare the predictions of the BCM with different query set size:
% pred1 = bcmfwd(bcm1, Xtest, 10);
% pred2 = bcmfwd(bcm1, Xtest, 800);
%
%
% See also: bcm,bcminit,bcmtrain,bcmprepare
%

% Author(s): Anton Schwaighofer, Nov 2004
% $Id: bcmfwd.m,v 1.2 2004/11/23 23:23:58 anton Exp $

error(nargchk(2, 4, nargin));
error(consist(net, 'bcm', Xtest));
if nargin<3,
querySize = [];
end
if isempty(querySize),
if isfield(net, 'querySize'),
querySize = net.querySize;
else
querySize = 500;
end
end
if nargin<4,
verbosity=0;
end

if isempty(net.invPrior) | isempty(net.weights),
net = bcmprepare(net);
end
P = size(Xtest, 1);
% Number of query sets, each of size at most querySize
nQueries = ceil(P/querySize);
nModules = length(net.module);
Ypred = zeros([P 1]);
Yvar = zeros([P 1]);

if verbosity>0,
fprintf('\nStarting forward propagation (%i query sets).\n', nQueries);
end
if verbosity==1,
fprintf('Query set ');
end
t1 = cputime;
for j = 1:nQueries,
if verbosity==1,
fprintf('%i ', j);
end
if verbosity==2,
fprintf('Query set %i: ', j);
end
ind1 = (1+(j-1)*querySize):min(P, j*querySize);
Xtest1 = Xtest(ind1, :);
% A small regularization matrix for inversions
smallEye = eps^(2/3)*speye(length(ind1));
% Prediction for the current query set
Ypred1 = zeros([length(ind1) 1]);
% Overall covariance matrix for current query set
Ycov1 = 0;
% The original BCM where all modules share the same hyperparameters:
% $$$ K11 = gpcovarp(net.module(1), Xtest1, Xtest1);
% $$$ Ycov1 = -(nModules-1)*inv(K11+smallEye);
startInd = 1;
for i = 1:length(net.module),
netI = net.module(i);
K11 = gpcovarp(netI, Xtest1, Xtest1);
K12 = gpcovarp(netI, Xtest1, netI.tr_in);
% Prediction of current module
Ypred2 = K12*net.weights{i};
% Covariance of current module
Ycov2 = K11-K12*net.invPrior{i}*K12';
invYcov2 = inv(Ycov2+smallEye);
% Add weighted prediction of the current module
Ypred1 = Ypred1+invYcov2*Ypred2;
% Update overall covariance matrix
Ycov1 = Ycov1+invYcov2;
% Instead of the (M-1)*inv(K11) term above: subtract one prior
% covariance contribution for each module except one. The last module
% is usually the smallest, so that is the one we skip
if i~=length(net.module),
Ycov1 = Ycov1 - inv(K11+smallEye);
end
if verbosity==2,
fprintf('.');
end
end
% Ycov1 is the *inverse* covariance of the overall prediction
Ycov1 = inv(Ycov1+smallEye);
% Rescale the sum of the modules' predictions and write into result
Ypred(ind1) = Ycov1*Ypred1;
Yvar(ind1) = diag(Ycov1);
if verbosity>0,
fprintf('\n');
end
end
31 changes: 31 additions & 0 deletions bcmgrad.m
@@ -0,0 +1,31 @@
function g = bcmgrad(net, x, t)
% bcmgrad - Error gradient for Bayesian Committee Machine
%
% Synopsis:
% g = bcmgrad(net)
%
% Arguments:
% net: BCM structure
%
% Returns:
% g: Gradient of the error function (marginal likelihood) with respect
% to the kernel parameters
%
% Description:
% Error function and gradient are computed on the basis of the
% pre-initialized data in each GP module, thus no data is required as
% input.
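%
% Example (a sketch; net is a BCM that has been initialized with
% training data via bcminit; the x and t arguments are ignored and
% may be passed as empty matrices):
% g = bcmgrad(net, [], []);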
%
%
% See also: bcm,bcmtrain,bcminit,bcmerr
%

% Author(s): Anton Schwaighofer, Nov 2004
% $Id: bcmgrad.m,v 1.1 2004/11/18 21:19:46 anton Exp $

g = 0;
for i = 1:length(net.module),
netI = net.module(i);
gI = gpgrad(netI, netI.tr_in, netI.tr_targets);
g = g+gI;
end