[Question] Warning of FixedNoiseGaussianLikelihood when size of train_x is different than test_x #1883

Closed
Kamuish opened this issue Jan 12, 2022 · 4 comments

@Kamuish

Kamuish commented Jan 12, 2022

As far as I understand, to pass the variance in the data (i.e. the noise in the observations) we must use the FixedNoiseGaussianLikelihood (instead of the now-deprecated WhiteNoise kernel). However, at sampling time this likelihood raises a warning if the sizes of test and train data are different.

Below follows a small example (directly taken from the docs but with this likelihood):

import math
import torch
import gpytorch

from gpytorch.models import ExactGP
from gpytorch.kernels import GaussianSymmetrizedKLKernel, ScaleKernel
from gpytorch.means import ConstantMean
from matplotlib import pyplot as plt

# Training data is 20 points in [0, 1], regularly spaced (endpoints included)
train_x_mean = torch.linspace(0, 1, 20)
test_x = torch.linspace(0, 1, 10)
train_y = torch.sin(train_x_mean * (2 * math.pi)) + torch.randn(train_x_mean.size()) * 0.2

train_x_stdv = torch.linspace(0.03, 0.01, 20)
train_x_distributional = torch.stack((train_x_mean, (train_x_stdv ** 2).log()), dim=1)
test_x_distributional = torch.stack((test_x, (1e-2 * torch.ones_like(test_x)).log()), dim=1)


class ExactGPModel(ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(ExactGPModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = ConstantMean()
        self.covar_module = ScaleKernel(GaussianSymmetrizedKLKernel())

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)


# initialize likelihood and model

likelihood = gpytorch.likelihoods.FixedNoiseGaussianLikelihood(torch.rand(train_y.shape))
model = ExactGPModel(train_x_distributional, train_y, likelihood)

# this is for running the notebook in our testing framework
import os

smoke_test = ('CI' in os.environ)
training_iter = 2 if smoke_test else 500

# Find optimal model hyperparameters
model.train()
likelihood.train()

# Use the Adam optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.25)  # The observation noise is fixed, so only the mean and kernel hyperparameters are trained

# "Loss" for GPs - the marginal log likelihood
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

for i in range(training_iter):
    # Zero gradients from previous iteration
    optimizer.zero_grad()
    # Output from model
    output = model(train_x_distributional)
    # Calc loss and backprop gradients
    loss = -mll(output, train_y)
    loss.backward()
    print('Iter %d/%d - Loss: %.3f   lengthscale: %.3f  ' % (
        i + 1, training_iter, loss.item(),
        model.covar_module.base_kernel.lengthscale.item(),
    ))
    optimizer.step()

# Get into evaluation (predictive posterior) mode
model.eval()
likelihood.eval()

# Test points are regularly spaced along [0,1]
# Make predictions by feeding model through likelihood
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    observed_pred = likelihood(model(test_x_distributional))

with torch.no_grad():
    # Initialize plot
    f, ax = plt.subplots(1, 1, figsize=(8, 3))

    # Get upper and lower confidence bounds
    lower, upper = observed_pred.confidence_region()
    # Plot training data as black stars
    ax.errorbar(train_x_mean.numpy(), train_y.numpy(), fmt='k*')
    # Plot predictive means as blue line
    ax.plot(test_x.numpy(), observed_pred.mean.numpy(), 'b')
    # Shade between the lower and upper confidence bounds
    ax.fill_between(test_x.numpy(), lower.numpy(), upper.numpy(), alpha=0.5)
    ax.set_ylim([-3, 3])
    ax.legend(['Observed Data', 'Mean', 'Confidence'])

plt.show()

After running the code, I get:

/home/amiguel/GP_spectral_modelling/venv/lib/python3.8/site-packages/gpytorch/likelihoods/gaussian_likelihood.py:270: GPInputWarning: You have passed data through a FixedNoiseGaussianLikelihood that did not match the size of the fixed noise, *and* you did not specify noise. This is treated as a no-op.
@wjmaddox
Collaborator

Yes, the warning occurs because you didn't specify the noise on the test values. To do so, you just need to specify the noise values in the likelihood like this:

with torch.no_grad(), gpytorch.settings.fast_pred_var():
    observed_pred = likelihood(model(test_x_distributional), noise=torch.rand(test_x_distributional.shape[0]))

which produces pretty different confidence bands than the plot from your script (which doesn't add any noise in).
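
Written out, the difference between the two sets of confidence bands is one diagonal term. The following is a sketch under the usual exact-GP regression assumptions; the symbols are notation introduced here for illustration, not taken from the thread: $K$ denotes kernel matrices, $m$ the mean function, $\sigma^2$ the fixed training noise variances, and $\sigma_*^2$ the noise variances supplied for the test points.

```latex
\begin{aligned}
p(f_* \mid \mathcal{D}) &= \mathcal{N}\!\left(\mu_*,\ \Sigma_*\right), \\
\mu_* &= m_* + K_{*X}\,\bigl(K_{XX} + \operatorname{diag}(\sigma^2)\bigr)^{-1}(y - m_X), \\
\Sigma_* &= K_{**} - K_{*X}\,\bigl(K_{XX} + \operatorname{diag}(\sigma^2)\bigr)^{-1} K_{X*}, \\
p(y_* \mid \mathcal{D}) &= \mathcal{N}\!\left(\mu_*,\ \Sigma_* + \operatorname{diag}(\sigma_*^2)\right).
\end{aligned}
```

When no test noise is given and the sizes do not match, the $\operatorname{diag}(\sigma_*^2)$ term is effectively zero, which is the "no-op" the warning refers to.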

@Kamuish
Author

Kamuish commented Jan 13, 2022

Thanks for the quick reply and help.

However, your answer leaves me with two extra questions:

  • As far as I understand from the docs, if I want to add the variance of my observations to the diagonal of the covariance matrix (and avoid including any extra jitter term in the model), I have to use the FixedNoiseGaussianLikelihood as in the script above, right? After training the GP (the model), I don't understand why I must know the variance in the regions where I want to predict new values. Could you shed some light on why this is needed?

  • When the two "X tensors" are equal, is there any kind of mapping for the errors associated with the Y measurements? Assuming that you know the uncertainties of the data on the X_train data and you use a different X_test tensor (with the same size), will the uncertainties of X_test be matched by index or by closeness to the X_train values?

@wjmaddox
Collaborator

For the first question, you only need to know the variance if you want p(y | D), the distribution over noisy observations, rather than p(f | D), the distribution over the latent function. For p(f | D) you can just use model(test_x).

For the second, I don't follow... but you could set up a heteroscedastic GP that does actually have a noise model. See #982 and botorch's canned heteroscedastic GP (https://botorch.org/api/models.html#botorch.models.gp_regression.HeteroskedasticSingleTaskGP).

@Kamuish
Author

Kamuish commented Jan 13, 2022

> For the first question, you only need to know the variance if you want to know p(y | D) rather than p(f | D). For p(f | D) you can just use model(test_x).

I was under the impression that calls to the model had to be wrapped with the likelihood; I must have misread the docs. Sorry about that.

> For the second, I don't follow... but you could set up a heteroscedastic GP that does actually have a noise model. See #982 and botorch's canned heteroscedastic GP (https://botorch.org/api/models.html#botorch.models.gp_regression.HeteroskedasticSingleTaskGP).

I will take a look at this.

Once again, thank you for the help!

@Kamuish Kamuish closed this as completed Jan 13, 2022