[Docs] Way to model heteroskedastic noise? #982
Comments
If I were to do it my way, I would build a kernel in which the latent variables are extra parameters to optimise over, but I wouldn't know how to put priors on them, nor how to integrate over them as in the paper.
Do you have observations of the observation noise? If so, take a look at the HeteroskedasticSingleTaskGP that is implemented in BoTorch, which uses a nested GP model to model the log-variance of the observation noise. If not, we also have pytorch/botorch#250, which uses a "most likely heteroskedastic GP" to infer the noise. That one will still need to be cleaned up some, though.
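For concreteness, here is a minimal sketch of the first option, fitting a HeteroskedasticSingleTaskGP when per-point noise observations are available. The toy tensors (train_X, train_Y, observed_var) are placeholders, and the fitting utilities are the ones referenced elsewhere in this thread; this is a sketch, not the canonical usage:

```python
import torch
from botorch.models import HeteroskedasticSingleTaskGP
from botorch.fit import fit_gpytorch_model
from gpytorch.mlls import ExactMarginalLogLikelihood

# toy data: 20 points in [0, 1] with input-dependent noise variance
train_X = torch.rand(20, 1)
observed_var = 0.01 + 0.2 * train_X ** 2  # stand-in for measured noise variances
train_Y = torch.sin(6 * train_X) + observed_var.sqrt() * torch.randn(20, 1)

# the nested GP inside the model is fit to the (log of the) observed variances
model = HeteroskedasticSingleTaskGP(
    train_X=train_X, train_Y=train_Y, train_Yvar=observed_var
)
mll = ExactMarginalLogLikelihood(model.likelihood, model)
fit_gpytorch_model(mll)
```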
Wouldn't you say that a 2-layer GP is sufficient?
Hi, I've tried to run the example you gave, and the following issue appeared:

I do believe it comes from this call:

```python
hetero_model = HeteroskedasticSingleTaskGP(
    train_X=train_X, train_Y=train_Y, train_Yvar=observed_var
)
```

with the observed variances obtained from a GPyTorch homoscedastic GP fit:

```python
# (homo_mll, train_X, train_Y, num_var_samples are defined earlier)
botorch.fit.fit_gpytorch_model(homo_mll)

# get estimates of noise
homo_mll.eval()
with torch.no_grad():
    homo_posterior = homo_mll.model.posterior(train_X.clone())
    homo_predictive_posterior = homo_mll.model.posterior(
        train_X.clone(), observation_noise=True
    )
    sampler = IIDNormalSampler(num_samples=num_var_samples, resample=True)
    predictive_samples = sampler(homo_predictive_posterior)
    observed_var = 0.5 * ((predictive_samples - train_Y.reshape(-1, 1)) ** 2).mean(dim=0)
```

I do not know what happens inside of that call. However, it does not seem that the original paper suggests one should change the target GP parameters to fit the likelihood of the noise model r(x) (it is written that the two GPs should be independent). I am referring to the following paragraphs:
And for what G3 and its likelihood are:
This makes it clear that at each step, the noise levels from the previous GP should be held fixed, and should not change so as to fit G2. With

```python
hetero_model = HeteroskedasticSingleTaskGP(
    train_X=train_X, train_Y=train_Y, train_Yvar=observed_var.detach()
)
```

I no longer have the aforementioned issue. Essentially, I would like to know:

Any links, explanations or hints would help me greatly!
For reference: the Colab link showing how to use a simpler model uses the following snippet to observe the variance:

```python
# (mll, X_train, y_train come from the notebook)
with torch.no_grad():
    # watch broadcasting here
    observed_var = torch.tensor(
        np.power(mll.model.posterior(X_train).mean.numpy().reshape(-1,) - y_train.numpy(), 2),
        dtype=torch.float,
    )
```

Detaching the variance seems to be the correct way.
Another related question: if I wanted to save the previous heteroskedastic model and resume training instead of fitting a new one, would there be a principled/generic way of doing so?
Hi @ArnoVel, sorry for the delay here. Let me see if I can check off your questions. Generally, I should say that #250 is quite old and does not make use of a number of changes that have happened in gpytorch since; we need to clean it up and get it up to date.
That would be a reasonable way to model things as well, depending on the application. One way to do something like this would be to use a multi-task GP (e.g. with an ICM kernel) to model both the output and the noise level. I'm not sure how to go about sticking that modeled noise level into the kernel for computing the posterior, though.
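A hedged sketch of what that multi-task idea could look like in GPyTorch, treating the output and a log-noise estimate as two correlated tasks under an ICM-style MultitaskKernel. The stacked train_y_2d tensor and the choice of rank are assumptions for illustration only, and, as noted above, this does not by itself feed the modeled noise level back into the posterior computation:

```python
import torch
import gpytorch

class OutputAndNoiseGP(gpytorch.models.ExactGP):
    # two correlated tasks: task 0 = the target, task 1 = a log-noise estimate
    def __init__(self, train_x, train_y_2d, likelihood):
        super().__init__(train_x, train_y_2d, likelihood)
        self.mean_module = gpytorch.means.MultitaskMean(
            gpytorch.means.ConstantMean(), num_tasks=2
        )
        self.covar_module = gpytorch.kernels.MultitaskKernel(
            gpytorch.kernels.RBFKernel(), num_tasks=2, rank=1
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultitaskMultivariateNormal(mean_x, covar_x)

# train_y_2d would stack targets and log-noise estimates into an (n, 2) tensor
likelihood = gpytorch.likelihoods.MultitaskGaussianLikelihood(num_tasks=2)
```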
Yes, you'd want to do that (detach the observed variances before passing them in).
Correct, this should be a classic EM-style fitting procedure. If it's not, then there is likely something wrong going on.
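For illustration, a rough sketch of that EM-style loop built from the snippets above (train_X and train_Y are as before; num_em_iters is a made-up name, and the squared-residual noise estimate follows the simpler Colab variant rather than the sampling-based one):

```python
import torch
from botorch.models import SingleTaskGP, HeteroskedasticSingleTaskGP
from botorch.fit import fit_gpytorch_model
from gpytorch.mlls import ExactMarginalLogLikelihood

# start with a homoskedastic fit
model = SingleTaskGP(train_X, train_Y)
fit_gpytorch_model(ExactMarginalLogLikelihood(model.likelihood, model))

for _ in range(num_em_iters):
    # "E-step": estimate per-point noise from the current model's predictions
    with torch.no_grad():
        post = model.posterior(train_X, observation_noise=True)
        observed_var = (post.mean - train_Y).pow(2)
    # "M-step": refit a heteroskedastic GP with the noise estimates held fixed
    model = HeteroskedasticSingleTaskGP(
        train_X=train_X, train_Y=train_Y, train_Yvar=observed_var.detach()
    )
    fit_gpytorch_model(ExactMarginalLogLikelihood(model.likelihood, model))
```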
From the plot, the noise level of the heteroskedastic version seems to overfit the noise variance (i.e. it ends up with very short lengthscales). Not quite sure which version you're running this on, but in the initial version we may not have properly accounted for the log-likelihood of the noise model itself. I put the infrastructure in place for fixing this in #870. Note that this requires adding the
Yes, you should be able to do the standard PyTorch thing of calling state_dict() / load_state_dict() on the model.
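A small sketch of that warm-starting pattern, assuming `model` is the previously fitted heteroskedastic GP from the loop above and the filename is arbitrary:

```python
import torch
from botorch.models import HeteroskedasticSingleTaskGP

# save the fitted hyperparameters
torch.save(model.state_dict(), "hetero_gp_state.pt")

# later: rebuild the model on the (possibly updated) training data and restore
new_model = HeteroskedasticSingleTaskGP(
    train_X=train_X, train_Y=train_Y, train_Yvar=observed_var.detach()
)
new_model.load_state_dict(torch.load("hetero_gp_state.pt"))
```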
Yes, that's how it works.
Depends on how you call it and whether you use the constructor directly.

I hope this clarifies some things.
Hi @Balandat, thanks for this detailed answer! About "warm starting": I mentioned it because the paper suggests to "set G1 = G3", which might imply that the current parameters of G3 should be kept as initial values for the next loop. About the overfitting: maybe it comes from using the most likely noise rather than sampling from the noise posterior? Anyway, I will be trying to implement different versions of this method in the future, so any help would be greatly appreciated 😀
Hi @ArnoVel, you may find this answer in another issue useful: #1158 (comment)
📚 Documentation/Examples
Hi,
I am fairly new to gpytorch and have very basic knowledge of GPs in general.
I found this paper, which uses latent variables (with a Gaussian prior) as additional variables to model heteroskedastic relationships.
What would be the easiest way to implement this using gpytorch?
Is there documentation missing?
Perhaps some documentation already exists on this topic. If so, please direct me to the relevant pages!
Thanks,
A V