Commit 61f6077 (initial commit, 0 parents): 11 changed files with 885 additions and 0 deletions.
File 1 of 11: README
@@ -0,0 +1,49 @@
Bayesian Committee Machine
Version 1.0, November 2005

The Bayesian Committee Machine (BCM) is an approximation method for
large-scale Gaussian process regression.

What you should know beforehand:

- The code is written for Matlab.

- It requires the Netlab toolbox. You can download Netlab from
  http://www.ncrg.aston.ac.uk/netlab/

- Install Netlab *before* trying to run any of the programs here.

- To get started and to check your installation, try running 'dembcm.m'

- If you are looking for example code to run the BCM, have a look at
  dembcm.m. All of the main features are used and explained there; a
  short usage sketch also follows below.
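
A minimal usage sketch (Xtrain, Ytrain and Xtest stand for your own data
matrices; dembcm.m contains a complete, runnable version):

    gpnet = gp(size(Xtrain,2), 'sqexp');     % GP template (Netlab)
    net = bcm(gpnet);                        % build the BCM around it
    net = bcminit(net, Xtrain, Ytrain, 500); % split data into modules of 500
    net = bcmtrain(net, 'individual');       % fit each module's parameters
    net = bcmprepare(net);                   % pre-compute weights/inverses
    [Ypred, Yvar] = bcmfwd(net, Xtest, 400); % predict in query sets of 400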

Relevant publications:

V. Tresp. A Bayesian committee machine. Neural Computation, 12, 2000

A. Schwaighofer and V. Tresp. Transductive and inductive methods for
approximate Gaussian process regression. In S. Becker, S. Thrun, and
K. Obermayer, editors, Advances in Neural Information Processing Systems
15. MIT Press, 2003
============================================================

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the
Free Software Foundation, Inc.,
59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
File 2 of 11: bcm.m
@@ -0,0 +1,62 @@
function net = bcm(gpnet)
% bcm - Bayesian Committee Machine
%
% Synopsis:
%   net = bcm(gpnet)
%
% Arguments:
%   gpnet: A Gaussian process template for BCM modules, as output by Netlab's
%       function gp.m. Each module of the BCM will inherit its initial
%       parameters from gpnet.
%
% Returns:
%   net: Structure describing the BCM
%
% Description:
%   The Bayesian Committee Machine (BCM) is an approximation method for
%   large-scale Gaussian process regression. The training data is split
%   into a number of blocks, for which individual Gaussian process
%   predictors ("modules") are trained. The prediction of a BCM is a
%   weighted combination of the predictions of the individual modules on
%   the test data. Test data is also processed in blocks, which leads to
%   improved performance.
%   The code here is a wrapper routine for the Gaussian process routines
%   provided by the Netlab toolbox. Netlab is thus required for this code
%   to run.
%
% Examples:
%   Building a BCM for 7-dimensional input, where each module is a GP
%   with squared-exponential kernel:
%     gpnet = gp(7, 'sqexp');
%     net = bcm(gpnet);
%   Equip the BCM with its training data, split up into modules of size
%   500:
%     net = bcminit(net, Xtrain, Ytrain, 500);
%   Fit each module's hyperparameters, and pre-compute a few matrices:
%     net = bcmtrain(net, 'individual');
%     net = bcmprepare(net);
%   For increased performance: cluster the training data beforehand (10
%   clusters in the example below), then assign clusters to modules:
%     options = [1 1e-5 1e-4 0 0 0 0 0 0 0 0 0 0 30];
%     r = randperm(size(Xtrain,1));
%     [centres,opt,post] = kmeans(Xtrain(r(1:10),:), Xtrain, options);
%     [m,assignment] = max(post,[],2);
%     net = bcminit(net, Xtrain, Ytrain, assignment);
%     net = bcmprepare(net);
%   Now we can do prediction:
%     [pred, errorBar] = bcmfwd(net, Xtest, 400);
%
% See also: bcminit,bcmprepare,bcmtrain,bcmfwd,bcmerr,bcmgrad,bcmpak,bcmunpak
%

% Author(s): Anton Schwaighofer, Nov 2004
% $Id: bcm.m,v 1.1 2004/11/18 21:18:24 anton Exp $

error(nargchk(1, 1, nargin));

net = struct('type', 'bcm', 'gpnet', gpnet);
net.nin = gpnet.nin;   % input dimension, inherited from the GP template
net.nout = 1;          % BCM regression is single-output
net.module = [];       % per-module GP structures, filled by bcminit
net.invPrior = {};     % inverse prior covariance matrices (bcmprepare)
net.weights = {};      % per-module weight vectors (bcmprepare)
File 3 of 11: bcmerr.m
@@ -0,0 +1,114 @@
function [e,edata,eprior] = bcmerr(net, x, t, Xtest)
% bcmerr - Error function for Bayesian Committee Machine
%
% Synopsis:
%   [e,edata,eprior] = bcmerr(net)
%   e = bcmerr(net, [], [], Xtest)  (use with care, see below)
%
% Arguments:
%   net: BCM structure
%   Xtest: [Q net.nin] matrix of test data
%
% Returns:
%   e: Value of the error function (negative log marginal likelihood)
%   edata: Data contribution to e
%   eprior: Prior contribution to e
%
% Description:
%   This function returns the sum of the error functions of each module,
%   that is, the unnormalized negative log-likelihood. The error function
%   is computed on the basis of the pre-initialized data in each GP
%   module, thus no data is required as input. Still, to be compatible
%   with the standard Netlab error functions, bcmerr.m accepts input
%   arguments in the form bcmerr(net, x, t); x and t are ignored.
%
%   In the second calling syntax, bcmerr(net, [], [], Xtest), the exact
%   BCM evidence is returned. This is given by
%     -1/2 log det C - 1/2 t' C^{-1} t + const
%   with C given by
%     C = BD[K] + sigma^2 I + (K_c K_t^{-1} K_c' - BD[K_c K_t^{-1} K_c'])
%   where BD[...] denotes a block-diagonal approximation of the argument,
%   K_c is the kernel matrix of all training points versus test points,
%   K_t is the test point kernel matrix, and t is the vector of all
%   training targets (taken from the modules' stored data).
%   C is a matrix of size [N N] for N training points, thus the exact BCM
%   evidence can only be computed for cases where an exact GP solution
%   can also be found. Use this feature only with moderately sized
%   problems.
%
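% Examples:
%   A sketch, assuming net has been set up via bcm/bcminit (and trained),
%   and Xtest is a matrix of test inputs:
%     e = bcmerr(net);                  % sum of the modules' evidences
%     e = bcmerr(net, [], [], Xtest);   % exact BCM evidence, small N only
%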
% See also: bcm,bcmtrain,bcminit
%

||
% Author(s): Anton Schwaighofer, Nov 2004 | ||
% $Id: bcmerr.m,v 1.2 2004/11/23 21:43:51 anton Exp $ | ||
|
||
% Input arguments x and t are effectively ignored | ||
|
||
error(nargchk(3, 4, nargin)); | ||
if nargin<4, | ||
Xtest = []; | ||
end | ||
|
||
if isempty(Xtest), | ||
% No test data given: Default case of summing up individual modules' evidence | ||
e = 0; | ||
edata = 0; | ||
eprior = 0; | ||
for i = 1:length(net.module), | ||
netI = net.module(i); | ||
[a, b, c] = gperr(netI, netI.tr_in, netI.tr_targets); | ||
e = e+a; | ||
edata = edata+b; | ||
eprior = eprior+c; | ||
end | ||
else | ||
% Compute BCM error with the actual BCM covariance matrix. This matrix | ||
% has size [N N] for N training points, thus we can typically not hold | ||
% it in memory. For this, knowledge of the test data is required. | ||
|
||
% For the block diagonal approximation, we need to know the number of | ||
% data in each module: | ||
modSize = zeros(1, length(net.module)); | ||
for i = 1:length(net.module), | ||
modSize(i) = length(net.module(i).tr_targets); | ||
end | ||
% Reconstruct the full training data | ||
N = sum(modSize); | ||
Xtrain = zeros(N, net.nin); | ||
ind = 1; | ||
for i = 1:length(net.module), | ||
netI = net.module(i); | ||
Xtrain(ind:(ind+length(netI.tr_targets)-1),:) = netI.tr_in; | ||
ind = ind+length(netI.tr_targets); | ||
end | ||
% Major part of the overall kernel matrix is a form of Schur complement: | ||
Kt = gpcovarp(net.gpnet, Xtest, Xtest); | ||
Kc = gpcovarp(net.gpnet, Xtrain, Xtest); | ||
smallEye = eps^(2/3)*speye(size(Kt)); | ||
C = Kc*inv(Kt+smallEye)*Kc'; | ||
% Overwrite diagonal blocks with exact covariance matrix, meaning that | ||
% the kernel matrix is exact for points within the same module | ||
startInd = 1; | ||
for i = 1:length(net.module), | ||
ind = startInd:(startInd+modSize(i)-1); | ||
netI = net.module(i); | ||
% Use gpcovar here, so that the contribution of the noise variance is | ||
% already taken into account | ||
C(ind,ind) = gpcovar(net.gpnet, Xtrain(ind,:)); | ||
startInd = startInd+modSize(i); | ||
end | ||
% With this matrix C, we can compute evidence as usual: | ||
C(isnan(C)) = realmax; | ||
C(isinf(C)&(C<0)) = -realmax; | ||
C(isinf(C)&(C>0)) = realmax; | ||
eigC = eig(C, 'nobalance'); | ||
% Guard against eventual tiny negative eigenvalues (eg. in the Matern | ||
% kernel with large values of nu) | ||
if any(eigC<=0), | ||
warning('Skipping some negative eigenvalues. Results may be inaccurate'); | ||
end | ||
edata = 0.5*(sum(log(eigC(eigC>0)))+t'*inv(C)*t); | ||
eprior = 0; | ||
e = edata+eprior; | ||
end |
File 4 of 11: bcmfwd.m
@@ -0,0 +1,134 @@
function [Ypred, Yvar] = bcmfwd(net,Xtest,querySize,verbosity)
% bcmfwd - Forward propagation in Bayesian Committee Machine
%
% Synopsis:
%   Ypred = bcmfwd(net,Xtest)
%   [Ypred,Yvar] = bcmfwd(net,Xtest,querySize,verbosity)
%
% Arguments:
%   net: BCM structure
%   Xtest: [Q d] matrix of test data, Q points in d dimensions
%   querySize: Size of query set (prediction is based on blocks of test
%       points of size querySize). Default value, if omitted: 500
%   verbosity: (optional) Use a value >0 to display progress information
%
% Returns:
%   Ypred: [Q 1] vector of predictions (predictive mean) for each test point
%   Yvar: [Q 1] vector of predictive variances (error bars) for each test
%       point
%
% Description:
%   Forward propagation in the Bayesian Committee Machine. The test data
%   is split up into blocks of size querySize. For each block, all GP
%   modules in the BCM make their prediction; each prediction is then
%   weighted by its inverse predictive covariance, summed, and normalized
%   to give the BCM output. The combination rule is sketched below.
%   Typically, the performance of the BCM increases as querySize
%   increases.
%   Instead of passing querySize as a parameter, it can also be set in a
%   field 'querySize' of the BCM structure.
%
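% Combination rule (a sketch, in the notation of Tresp's BCM paper; mu_i
% and C_i denote module i's predictive mean and covariance on the query
% set, K** the prior covariance of the query set, M the number of
% modules):
%   inv(C_bcm) = sum_i inv(C_i) - (M-1)*inv(K**)
%   mu_bcm     = C_bcm * sum_i inv(C_i)*mu_i
% In the code below, the (M-1)*inv(K**) term is subtracted one module at
% a time (skipping the last module), using each module's own
% hyperparameters for K**.
%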
% Examples:
%   Build a BCM with modules that contain 500 training points each:
%     gp1 = gp(5, 'sqexp');
%     bcm1 = bcm(gp1);
%     bcm1.querySize = 500;
%     bcm1 = bcminit(bcm1, Xtrain, Ytrain, 500);
%   Train the BCM, by maximizing the training data marginal likelihood
%   for each module individually:
%     bcm1 = bcmtrain(bcm1,'individual','scg');
%   Compare the predictions of the BCM with different query set sizes:
%     pred1 = bcmfwd(bcm1, Xtest, 10);
%     pred2 = bcmfwd(bcm1, Xtest, 800);
%
% See also: bcm,bcminit,bcmtrain,bcmprepare
%

% Author(s): Anton Schwaighofer, Nov 2004
% $Id: bcmfwd.m,v 1.2 2004/11/23 23:23:58 anton Exp $

error(nargchk(2, 4, nargin));
error(consist(net, 'bcm', Xtest));
if nargin<3,
  querySize = [];
end
if isempty(querySize),
  if isfield(net, 'querySize'),
    querySize = net.querySize;
  else
    querySize = 500;
  end
end
if nargin<4,
  verbosity = 0;
end

if isempty(net.invPrior) | isempty(net.weights),
  net = bcmprepare(net);
end
P = size(Xtest, 1);
% Number of query sets of maximum size querySize
nQueries = ceil(P/querySize);
nModules = length(net.module);
Ypred = zeros([P 1]);
Yvar = zeros([P 1]);

if verbosity>0,
  fprintf('\nStarting forward propagation (%i query sets).\n', nQueries);
end
if verbosity==1,
  fprintf('Query set ');
end
for j = 1:nQueries,
  if verbosity==1,
    fprintf('%i ', j);
  end
  if verbosity==2,
    fprintf('Query set %i: ', j);
  end
  ind1 = (1+(j-1)*querySize):min(P, j*querySize);
  Xtest1 = Xtest(ind1, :);
  % A small regularization matrix for inversions
  smallEye = eps^(2/3)*speye(length(ind1));
  % Prediction for the current query set
  Ypred1 = zeros([length(ind1) 1]);
  % Overall covariance matrix for the current query set
  Ycov1 = 0;
  % The original BCM, where all modules share the same hyperparameters,
  % would subtract the prior term in one go:
  % $$$ K11 = gpcovarp(net.module(1), Xtest1, Xtest1);
  % $$$ Ycov1 = -(nModules-1)*inv(K11+smallEye);
  for i = 1:length(net.module),
    netI = net.module(i);
    K11 = gpcovarp(netI, Xtest1, Xtest1);
    K12 = gpcovarp(netI, Xtest1, netI.tr_in);
    % Prediction of the current module
    Ypred2 = K12*net.weights{i};
    % Covariance of the current module
    Ycov2 = K11-K12*net.invPrior{i}*K12';
    invYcov2 = inv(Ycov2+smallEye);
    % Add the weighted prediction of the current module
    Ypred1 = Ypred1+invYcov2*Ypred2;
    % Update the overall covariance matrix
    Ycov1 = Ycov1+invYcov2;
    % Instead of the above (M-1)*inv(K11) term: subtract one contribution
    % to the prior covariance for each module but the last one. The last
    % module is usually the smallest, so drop that one
    if i~=length(net.module),
      Ycov1 = Ycov1 - inv(K11+smallEye);
    end
    if verbosity==2,
      fprintf('.');
    end
  end
  % Ycov1 is the *inverse* covariance of the overall prediction
  Ycov1 = inv(Ycov1+smallEye);
  % Rescale the sum of the modules' predictions and write into the result
  Ypred(ind1) = Ycov1*Ypred1;
  Yvar(ind1) = diag(Ycov1);
  if verbosity>0,
    fprintf('\n');
  end
end
File 5 of 11: bcmgrad.m
@@ -0,0 +1,31 @@
function g = bcmgrad(net, x, t)
% bcmgrad - Error gradient for Bayesian Committee Machine
%
% Synopsis:
%   g = bcmgrad(net)
%
% Arguments:
%   net: BCM structure
%
% Returns:
%   g: Gradient of the error function (negative log marginal likelihood)
%      with respect to the kernel parameters
%
% Description:
%   The error function and its gradient are computed on the basis of the
%   pre-initialized data in each GP module, thus no data is required as
%   input. The input arguments x and t are accepted only for
%   compatibility with the standard Netlab gradient functions; they are
%   ignored. The overall gradient is the sum of the modules' gradients,
%   since the BCM error is the sum of the modules' errors.
%
% See also: bcm,bcmtrain,bcminit,bcmerr
%
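% Examples:
%   A sketch of gradient-based training, assuming Netlab's generic
%   optimizer netopt dispatches to bcmerr/bcmgrad via net.type (bcmtrain
%   is the documented entry point for training; this is an illustration
%   only):
%     options = zeros(1,18);
%     options(1) = 1;      % verbose output
%     options(14) = 50;    % number of optimization steps
%     net = netopt(net, options, [], [], 'scg');
%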

% Author(s): Anton Schwaighofer, Nov 2004
% $Id: bcmgrad.m,v 1.1 2004/11/18 21:19:46 anton Exp $

% Sum up the gradients of the individual modules, each evaluated on the
% module's own stored training data
g = 0;
for i = 1:length(net.module),
  netI = net.module(i);
  gI = gpgrad(netI, netI.tr_in, netI.tr_targets);
  g = g+gI;
end