Commit 61f6077 (initial commit, 0 parents): 11 changed files with 885 additions and 0 deletions.
File 1 of 11: README
@@ -0,0 +1,49 @@
Bayesian Committee Machine
Version 1.0, November 2005

The Bayesian Committee Machine (BCM) is an approximation method for
large-scale Gaussian process regression.

What you should know beforehand:

- The code is written for Matlab.

- It requires the Netlab toolbox. You can download Netlab from
  http://www.ncrg.aston.ac.uk/netlab/

- Install Netlab *before* trying to run any of the programs here.

- To get started and to check your installation, try running 'dembcm.m'

- If you are looking for example code to run the BCM, have a look at
  dembcm.m. All of the main features are used and explained there; a
  short usage sketch also follows below.
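
A minimal usage sketch (Xtrain, Ytrain and Xtest stand for your own data
matrices; dembcm.m contains a complete, runnable version):

    gpnet = gp(size(Xtrain,2), 'sqexp');     % GP template (Netlab)
    net = bcm(gpnet);                        % build the BCM around it
    net = bcminit(net, Xtrain, Ytrain, 500); % split data into modules of 500
    net = bcmtrain(net, 'individual');       % fit each module's parameters
    net = bcmprepare(net);                   % pre-compute weights/inverses
    [Ypred, Yvar] = bcmfwd(net, Xtest, 400); % predict in query sets of 400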

Relevant publications:

V. Tresp. A Bayesian committee machine. Neural Computation, 12, 2000

A. Schwaighofer and V. Tresp. Transductive and inductive methods for
approximate Gaussian process regression. In S. Becker, S. Thrun, and
K. Obermayer, editors, Advances in Neural Information Processing Systems
15. MIT Press, 2003
============================================================

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the
Free Software Foundation, Inc.,
59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
File 2 of 11: bcm.m
@@ -0,0 +1,62 @@
function net = bcm(gpnet)
% bcm - Bayesian Committee Machine
%
% Synopsis:
%   net = bcm(gpnet)
%
% Arguments:
%   gpnet: A Gaussian process template for BCM modules, as output by Netlab's
%       function gp.m. Each module of the BCM will inherit its initial
%       parameters from gpnet.
%
% Returns:
%   net: Structure describing the BCM
%
% Description:
%   The Bayesian Committee Machine (BCM) is an approximation method for
%   large-scale Gaussian process regression. The training data is split
%   into a number of blocks, for which individual Gaussian process
%   predictors ("modules") are trained. The prediction of a BCM is a
%   weighted combination of the predictions of the individual modules on
%   the test data. Test data is also processed in blocks, which leads to
%   improved performance.
%   The code here is a wrapper routine for the Gaussian process routines
%   provided by the Netlab toolbox. Netlab is thus required for this code
%   to run.
%
% Examples:
%   Building a BCM for 7-dimensional input, where each module is a GP
%   with squared-exponential kernel:
%     gpnet = gp(7, 'sqexp');
%     net = bcm(gpnet);
%   Equip the BCM with its training data, split up into modules of size
%   500:
%     net = bcminit(net, Xtrain, Ytrain, 500);
%   Fit each module's hyperparameters, and pre-compute a few matrices:
%     net = bcmtrain(net, 'individual');
%     net = bcmprepare(net);
%   For increased performance: cluster the training data beforehand (10
%   clusters in the example below), then assign clusters to modules:
%     options = [1 1e-5 1e-4 0 0 0 0 0 0 0 0 0 0 30];
%     r = randperm(size(Xtrain,1));
%     [centres,opt,post] = kmeans(Xtrain(r(1:10),:), Xtrain, options);
%     [m,assignment] = max(post,[],2);
%     net = bcminit(net, Xtrain, Ytrain, assignment);
%     net = bcmprepare(net);
%   Now we can do prediction:
%     [pred, errorBar] = bcmfwd(net, Xtest, 400);
%
% See also: bcminit,bcmprepare,bcmtrain,bcmfwd,bcmerr,bcmgrad,bcmpak,bcmunpak
%

% Author(s): Anton Schwaighofer, Nov 2004
% $Id: bcm.m,v 1.1 2004/11/18 21:18:24 anton Exp $

error(nargchk(1, 1, nargin));

net = struct('type', 'bcm', 'gpnet', gpnet);
net.nin = gpnet.nin;   % input dimension, inherited from the GP template
net.nout = 1;          % BCM regression is single-output
net.module = [];       % per-module GP structures, filled by bcminit
net.invPrior = {};     % inverse prior covariance matrices (bcmprepare)
net.weights = {};      % per-module weight vectors (bcmprepare)
File 3 of 11: bcmerr.m
@@ -0,0 +1,114 @@
function [e,edata,eprior] = bcmerr(net, x, t, Xtest)
% bcmerr - Error function for Bayesian Committee Machine
%
% Synopsis:
%   [e,edata,eprior] = bcmerr(net)
%   e = bcmerr(net, [], [], Xtest)  (use with care, see below)
%
% Arguments:
%   net: BCM structure
%   Xtest: [Q net.nin] matrix of test data
%
% Returns:
%   e: Value of the error function (negative log marginal likelihood)
%   edata: Data contribution to e
%   eprior: Prior contribution to e
%
% Description:
%   This function returns the sum of the error functions of each module,
%   that is, the unnormalized negative log-likelihood. The error function
%   is computed on the basis of the pre-initialized data in each GP
%   module, thus no data is required as input. Still, to be compatible
%   with the standard Netlab error functions, bcmerr.m accepts input
%   arguments in the form bcmerr(net, x, t); x and t are ignored.
%
%   In the second calling syntax, bcmerr(net, [], [], Xtest), the exact
%   BCM evidence is returned. This is given by
%     -1/2 log det C - 1/2 t' C^{-1} t + const
%   with C given by
%     C = BD[K] + sigma^2 I + (K_c K_t^{-1} K_c' - BD[K_c K_t^{-1} K_c'])
%   where BD[...] denotes a block-diagonal approximation of the argument,
%   K_c is the kernel matrix of all training points versus test points,
%   K_t is the test point kernel matrix, and t is the vector of all
%   training targets (taken from the modules' stored data).
%   C is a matrix of size [N N] for N training points, thus the exact BCM
%   evidence can only be computed for cases where an exact GP solution
%   can also be found. Use this feature only with moderately sized
%   problems.
%
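% Examples:
%   A sketch, assuming net has been set up via bcm/bcminit (and trained),
%   and Xtest is a matrix of test inputs:
%     e = bcmerr(net);                  % sum of the modules' evidences
%     e = bcmerr(net, [], [], Xtest);   % exact BCM evidence, small N only
%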
% See also: bcm,bcmtrain,bcminit
%

||
% Author(s): Anton Schwaighofer, Nov 2004 | ||
% $Id: bcmerr.m,v 1.2 2004/11/23 21:43:51 anton Exp $ | ||
|
||
% Input arguments x and t are effectively ignored | ||
|
||
error(nargchk(3, 4, nargin)); | ||
if nargin<4, | ||
Xtest = []; | ||
end | ||
|
||
if isempty(Xtest), | ||
% No test data given: Default case of summing up individual modules' evidence | ||
e = 0; | ||
edata = 0; | ||
eprior = 0; | ||
for i = 1:length(net.module), | ||
netI = net.module(i); | ||
[a, b, c] = gperr(netI, netI.tr_in, netI.tr_targets); | ||
e = e+a; | ||
edata = edata+b; | ||
eprior = eprior+c; | ||
end | ||
else | ||
% Compute BCM error with the actual BCM covariance matrix. This matrix | ||
% has size [N N] for N training points, thus we can typically not hold | ||
% it in memory. For this, knowledge of the test data is required. | ||
|
||
% For the block diagonal approximation, we need to know the number of | ||
% data in each module: | ||
modSize = zeros(1, length(net.module)); | ||
for i = 1:length(net.module), | ||
modSize(i) = length(net.module(i).tr_targets); | ||
end | ||
% Reconstruct the full training data | ||
N = sum(modSize); | ||
Xtrain = zeros(N, net.nin); | ||
ind = 1; | ||
for i = 1:length(net.module), | ||
netI = net.module(i); | ||
Xtrain(ind:(ind+length(netI.tr_targets)-1),:) = netI.tr_in; | ||
ind = ind+length(netI.tr_targets); | ||
end | ||
% Major part of the overall kernel matrix is a form of Schur complement: | ||
Kt = gpcovarp(net.gpnet, Xtest, Xtest); | ||
Kc = gpcovarp(net.gpnet, Xtrain, Xtest); | ||
smallEye = eps^(2/3)*speye(size(Kt)); | ||
C = Kc*inv(Kt+smallEye)*Kc'; | ||
% Overwrite diagonal blocks with exact covariance matrix, meaning that | ||
% the kernel matrix is exact for points within the same module | ||
startInd = 1; | ||
for i = 1:length(net.module), | ||
ind = startInd:(startInd+modSize(i)-1); | ||
netI = net.module(i); | ||
% Use gpcovar here, so that the contribution of the noise variance is | ||
% already taken into account | ||
C(ind,ind) = gpcovar(net.gpnet, Xtrain(ind,:)); | ||
startInd = startInd+modSize(i); | ||
end | ||
% With this matrix C, we can compute evidence as usual: | ||
C(isnan(C)) = realmax; | ||
C(isinf(C)&(C<0)) = -realmax; | ||
C(isinf(C)&(C>0)) = realmax; | ||
eigC = eig(C, 'nobalance'); | ||
% Guard against eventual tiny negative eigenvalues (eg. in the Matern | ||
% kernel with large values of nu) | ||
if any(eigC<=0), | ||
warning('Skipping some negative eigenvalues. Results may be inaccurate'); | ||
end | ||
edata = 0.5*(sum(log(eigC(eigC>0)))+t'*inv(C)*t); | ||
eprior = 0; | ||
e = edata+eprior; | ||
end |
File 4 of 11: bcmfwd.m
@@ -0,0 +1,134 @@
function [Ypred, Yvar] = bcmfwd(net,Xtest,querySize,verbosity)
% bcmfwd - Forward propagation in Bayesian Committee Machine
%
% Synopsis:
%   Ypred = bcmfwd(net,Xtest)
%   [Ypred,Yvar] = bcmfwd(net,Xtest,querySize,verbosity)
%
% Arguments:
%   net: BCM structure
%   Xtest: [Q d] matrix of test data, Q points in d dimensions
%   querySize: Size of query set (prediction is based on blocks of test
%       points of size querySize). Default value, if omitted: 500
%   verbosity: (optional) Use a value >0 to display progress information
%
% Returns:
%   Ypred: [Q 1] vector of predictions (predictive mean) for each test point
%   Yvar: [Q 1] vector of predictive variances (error bars) for each test
%       point
%
% Description:
%   Forward propagation in the Bayesian Committee Machine. The test data
%   is split up into blocks of size querySize. For each block, all GP
%   modules in the BCM make their prediction; each prediction is then
%   weighted by its inverse predictive covariance, summed, and normalized
%   to give the BCM output. The combination rule is sketched below.
%   Typically, the performance of the BCM increases as querySize
%   increases.
%   Instead of passing querySize as a parameter, it can also be set in a
%   field 'querySize' of the BCM structure.
%
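% Combination rule (a sketch, in the notation of Tresp's BCM paper; mu_i
% and C_i denote module i's predictive mean and covariance on the query
% set, K** the prior covariance of the query set, M the number of
% modules):
%   inv(C_bcm) = sum_i inv(C_i) - (M-1)*inv(K**)
%   mu_bcm     = C_bcm * sum_i inv(C_i)*mu_i
% In the code below, the (M-1)*inv(K**) term is subtracted one module at
% a time (skipping the last module), using each module's own
% hyperparameters for K**.
%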
% Examples:
%   Build a BCM with modules that contain 500 training points each:
%     gp1 = gp(5, 'sqexp');
%     bcm1 = bcm(gp1);
%     bcm1.querySize = 500;
%     bcm1 = bcminit(bcm1, Xtrain, Ytrain, 500);
%   Train the BCM, by maximizing the training data marginal likelihood
%   for each module individually:
%     bcm1 = bcmtrain(bcm1,'individual','scg');
%   Compare the predictions of the BCM with different query set sizes:
%     pred1 = bcmfwd(bcm1, Xtest, 10);
%     pred2 = bcmfwd(bcm1, Xtest, 800);
%
% See also: bcm,bcminit,bcmtrain,bcmprepare
%

% Author(s): Anton Schwaighofer, Nov 2004
% $Id: bcmfwd.m,v 1.2 2004/11/23 23:23:58 anton Exp $

error(nargchk(2, 4, nargin));
error(consist(net, 'bcm', Xtest));
if nargin<3,
  querySize = [];
end
if isempty(querySize),
  if isfield(net, 'querySize'),
    querySize = net.querySize;
  else
    querySize = 500;
  end
end
if nargin<4,
  verbosity = 0;
end

if isempty(net.invPrior) | isempty(net.weights),
  net = bcmprepare(net);
end
P = size(Xtest, 1);
% Number of query sets of maximum size querySize
nQueries = ceil(P/querySize);
nModules = length(net.module);
Ypred = zeros([P 1]);
Yvar = zeros([P 1]);

if verbosity>0,
  fprintf('\nStarting forward propagation (%i query sets).\n', nQueries);
end
if verbosity==1,
  fprintf('Query set ');
end
for j = 1:nQueries,
  if verbosity==1,
    fprintf('%i ', j);
  end
  if verbosity==2,
    fprintf('Query set %i: ', j);
  end
  ind1 = (1+(j-1)*querySize):min(P, j*querySize);
  Xtest1 = Xtest(ind1, :);
  % A small regularization matrix for inversions
  smallEye = eps^(2/3)*speye(length(ind1));
  % Prediction for the current query set
  Ypred1 = zeros([length(ind1) 1]);
  % Overall covariance matrix for the current query set
  Ycov1 = 0;
  % The original BCM, where all modules share the same hyperparameters,
  % would subtract the prior term in one go:
  % $$$ K11 = gpcovarp(net.module(1), Xtest1, Xtest1);
  % $$$ Ycov1 = -(nModules-1)*inv(K11+smallEye);
  for i = 1:length(net.module),
    netI = net.module(i);
    K11 = gpcovarp(netI, Xtest1, Xtest1);
    K12 = gpcovarp(netI, Xtest1, netI.tr_in);
    % Prediction of the current module
    Ypred2 = K12*net.weights{i};
    % Covariance of the current module
    Ycov2 = K11-K12*net.invPrior{i}*K12';
    invYcov2 = inv(Ycov2+smallEye);
    % Add the weighted prediction of the current module
    Ypred1 = Ypred1+invYcov2*Ypred2;
    % Update the overall covariance matrix
    Ycov1 = Ycov1+invYcov2;
    % Instead of the above (M-1)*inv(K11) term: subtract one contribution
    % to the prior covariance for each module but the last one. The last
    % module is usually the smallest, so drop that one
    if i~=length(net.module),
      Ycov1 = Ycov1 - inv(K11+smallEye);
    end
    if verbosity==2,
      fprintf('.');
    end
  end
  % Ycov1 is the *inverse* covariance of the overall prediction
  Ycov1 = inv(Ycov1+smallEye);
  % Rescale the sum of the modules' predictions and write into the result
  Ypred(ind1) = Ycov1*Ypred1;
  Yvar(ind1) = diag(Ycov1);
  if verbosity>0,
    fprintf('\n');
  end
end
File 5 of 11: bcmgrad.m
@@ -0,0 +1,31 @@
function g = bcmgrad(net, x, t)
% bcmgrad - Error gradient for Bayesian Committee Machine
%
% Synopsis:
%   g = bcmgrad(net)
%
% Arguments:
%   net: BCM structure
%
% Returns:
%   g: Gradient of the error function (negative log marginal likelihood)
%      with respect to the kernel parameters
%
% Description:
%   The error function and its gradient are computed on the basis of the
%   pre-initialized data in each GP module, thus no data is required as
%   input. The input arguments x and t are accepted only for
%   compatibility with the standard Netlab gradient functions; they are
%   ignored. The overall gradient is the sum of the modules' gradients,
%   since the BCM error is the sum of the modules' errors.
%
% See also: bcm,bcmtrain,bcminit,bcmerr
%
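% Examples:
%   A sketch of gradient-based training, assuming Netlab's generic
%   optimizer netopt dispatches to bcmerr/bcmgrad via net.type (bcmtrain
%   is the documented entry point for training; this is an illustration
%   only):
%     options = zeros(1,18);
%     options(1) = 1;      % verbose output
%     options(14) = 50;    % number of optimization steps
%     net = netopt(net, options, [], [], 'scg');
%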

% Author(s): Anton Schwaighofer, Nov 2004
% $Id: bcmgrad.m,v 1.1 2004/11/18 21:19:46 anton Exp $

% Sum up the gradients of the individual modules, each evaluated on the
% module's own stored training data
g = 0;
for i = 1:length(net.module),
  netI = net.module(i);
  gI = gpgrad(netI, netI.tr_in, netI.tr_targets);
  g = g+gI;
end