Implementing the most-likely heteroskedastic GP described in #180 #250

Open · wants to merge 4 commits into main

Conversation

mc-robinson

Hi @Balandat, here is an initial draft of the PR related to #180.

A couple of things:

  • Normalization and standardization are really important for this function, but I am having trouble checking these requirements without the added batch dimension (check_standardization requires it). I'll probably need your help on this, because it is unclear to me how I should enforce the batch shape, since this is not a class extending a model with _set_dimensions. (See the sketch of the intended scaling after this list.)

  • I have not yet implemented test functions, and may again need your expertise on this one. It's also a bit confusing, since it is written as a function and not as a test case for a model. I should also mention that none of the test cases I used for benchmarking in #180 ([Feature Request] Implement most likely heteroskedastic GP and write a tutorial notebook) had multidimensional outputs, so that may need to be checked.
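
As a concrete reference for the first bullet, here is a minimal sketch of the intended input/output scaling using BoTorch's transform helpers (the data and bounds below are illustrative, not from the PR):

```python
import torch
from botorch.utils.transforms import normalize, standardize

raw_X = 10.0 * torch.rand(20, 2)                   # unnormalized inputs
bounds = torch.tensor([[0.0, 0.0], [10.0, 10.0]])  # per-dimension (lower, upper) bounds
train_X = normalize(raw_X, bounds=bounds)          # X scaled into the unit cube [0, 1]^d
train_Y = standardize(torch.randn(20, 1))          # Y rescaled to zero mean, unit variance
```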

Sorry this is such an incomplete PR, but hopefully it gives a draft function to work with, which you all can then extend. Happy to help in any way I can. Also, I'll start trying to draft the text for a notebook tutorial, as suggested in #180.

@facebook-github-bot added the "CLA Signed" label Aug 26, 2019
@Balandat
Contributor

Thanks @mc-robinson, this is a great start. Will take a closer look in the next couple of days and help out with the shape challenges.

Re the test: This should be pretty straightforward, since the model is just a standard HeteroskedasticGP - so we can just write test cases for the model fitting and then do a very basic sanity check on the model. From a purely mechanical point of view this is the same as testing any other function; see e.g. this test case.
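
For illustration, here is a hedged sketch of that kind of basic sanity-check test, written against current BoTorch APIs; it exercises a standard HeteroskedasticSingleTaskGP directly on toy data with input-dependent noise, and the class/test names are made up for this example:

```python
import torch
from botorch.fit import fit_gpytorch_mll
from botorch.models import HeteroskedasticSingleTaskGP
from botorch.utils.testing import BotorchTestCase
from gpytorch.mlls import ExactMarginalLogLikelihood


class TestHeteroskedasticFit(BotorchTestCase):
    def test_fit_and_sanity_check(self):
        torch.manual_seed(0)
        train_X = torch.rand(20, 1)
        noise_sd = 0.05 + 0.5 * train_X  # noise standard deviation grows with x
        train_Y = torch.sin(6 * train_X) + noise_sd * torch.randn_like(train_X)
        model = HeteroskedasticSingleTaskGP(train_X, train_Y, train_Yvar=noise_sd**2)
        fit_gpytorch_mll(ExactMarginalLogLikelihood(model.likelihood, model))
        # Very basic sanity checks on the posterior at the training points.
        post = model.posterior(train_X)
        self.assertEqual(post.mean.shape, train_Y.shape)
        self.assertTrue(torch.isfinite(post.mean).all())
```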

@facebook-github-bot
Contributor

@shubhankaray has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@matthewcarbone

@Balandat I'm digging up a really old PR here, but it seemed like the right place to ask after doing a bit of searching for the right topic. It appears that this heteroskedastic noise GP is actually implemented, and as such this PR can be closed (botorch.models.gp_regression.HeteroskedasticSingleTaskGP). Is this correct? Just trying to make sure I'm using the right model. Thanks!

@CompRhys
Contributor

CompRhys commented Mar 8, 2023

> @Balandat I'm digging up a really old PR here, but it seemed like the right place to ask after doing a bit of searching for the right topic. It appears that this heteroskedastic noise GP is actually implemented, and as such this PR can be closed (botorch.models.gp_regression.HeteroskedasticSingleTaskGP). Is this correct? Just trying to make sure I'm using the right model. Thanks!

Not @Balandat, but I have been looking at this today as well, so I'll chime in. Short answer: no. The HeteroskedasticSingleTaskGP in BoTorch requires a y variance for each input, whereas this PR attempts to implement the most likely heteroskedastic GP model, which takes just the inputs x and y and fits the noise GP using an EM loop. If you have uncertainty or precision measurements, the class that is implemented is what you want; if not, this PR implements a model that doesn't require the variances as inputs, although I believe there is some bug/API issue preventing it from being merged.
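
To make the distinction concrete, here is a minimal sketch of that EM-style loop using current BoTorch APIs. This is illustrative, not the PR's code: the helper name, the fixed iteration count, and the use of clamped squared posterior-mean residuals as variance estimates are all assumptions.

```python
import torch
from botorch.fit import fit_gpytorch_mll
from botorch.models import HeteroskedasticSingleTaskGP, SingleTaskGP
from gpytorch.mlls import ExactMarginalLogLikelihood


def fit_most_likely_heteroskedastic_gp(train_X, train_Y, num_iters=3):
    # Start with a standard homoskedastic GP fit to (X, Y).
    model = SingleTaskGP(train_X, train_Y)
    fit_gpytorch_mll(ExactMarginalLogLikelihood(model.likelihood, model))
    for _ in range(num_iters):
        # E-step: estimate pointwise noise from the squared residuals of
        # the current model at the training inputs (clamped for stability).
        with torch.no_grad():
            mean = model.posterior(train_X).mean
        train_Yvar = ((train_Y - mean) ** 2).clamp_min(1e-6)
        # M-step: refit a heteroskedastic GP whose internal noise GP
        # models the estimated variances.
        model = HeteroskedasticSingleTaskGP(train_X, train_Y, train_Yvar)
        fit_gpytorch_mll(ExactMarginalLogLikelihood(model.likelihood, model))
    return model
```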

@matthewcarbone

@CompRhys ah of course. This is what happens when I comment on PRs after just waking up. Thanks for clarifying!

@Balandat
Contributor

Balandat commented Mar 8, 2023

> I believe there is some bug/API issue preventing it from being merged

I don't think there is an API issue; rather, there were some issues with the fitting stability (and there may be bugs). We'd basically need to get back to this and spend some time on it, but it has been hard to find that time given our other priorities. If anyone needs this model, it would be great if they could take a stab at it; we're very happy to provide feedback and input on it!

@CompRhys
Contributor

CompRhys commented Mar 8, 2023

#861 is the potential API issue I think is related: when trying to piece this together, it links to issues where the noise is still stored in log space internally, and this has some impact because the inverse transform is not happening.

@CompRhys
Contributor

CompRhys commented Mar 8, 2023

> If anyone needs this model, it would be great if they could take a stab at it; we're very happy to provide feedback and input on it!

Would you want it, as in this PR, as a utility function, as opposed to a class with an overridden fit method?

@matthewcarbone

I'm trying to get approval to contribute upstream to BoTorch (re: CLA), so if I end up getting the green light, this would definitely be something I'd be interested in trying to help with. Will see what happens 👍

@sgbaird
Contributor

sgbaird commented Mar 9, 2023

Hey @CompRhys! Great to see you here! I was reading through a recent article from Milad Abolhasani and Keith Brown (DOI: 10.1557/s43577-023-00482-y), which led me back to heteroskedastic noise. Here's a brief snippet:

> While [Bayesian optimization] has gained a great deal of traction in guiding experiments, Noack and Reyes discuss the limitations of these off-the-shelf tools and how they should be modified to improve progress. Specifically, key simplifying assumptions of most GPR approaches are that points that are equidistant in parameter space are equally correlated and each location in parameter space is expected to feature the same amount of noise.

The article from Noack and Reyes was an interesting read. My main takeaways were that non-stationary kernels are especially useful for materials science problems but make finding optimal solutions non-trivial due to the large number of hyperparameters, and (relevant to this PR) that shifting toward a heteroskedastic treatment of noise is useful.

@matthewcarbone

@sgbaird If you'd like, I can ask Kris [Reyes] about this next time I meet with him, we chat ~weekly 😁

I'm slowly being convinced that kernel flexibility is important. That said, I don't know how broadly useful very flexible or niche kernels are for the larger community. Definitely going to read that paper though, thanks for sharing.

@Balandat
Contributor

> Would you want it, as in this PR, as a utility function, as opposed to a class with an overridden fit method?

Hmm, so our classes don't actually have a fit method themselves; we have moved to multiple-dispatch-based fitting. So you could register a custom fitting method for this model, as is done e.g. here: https://github.com/pytorch/botorch/blob/main/botorch/fit.py#LL286-L286C2

So there are two options here:

  1. Implement a MostLikelyHeteroskedasticGP model class and hook it up to the custom fitting strategy using multiple dispatch. This is a bit of a weird setup, though - currently, models that are fit are modified in place during fitting, and they don't change type. The most likely heteroskedastic GP is really just a way of generating a HeteroskedasticSingleTaskGP object; it's just generated somewhat differently. I guess the way you could do this is to subclass MostLikelyHeteroskedasticGP from HeteroskedasticSingleTaskGP, change the constructor to not take in train_Yvar, and then, in the custom fitting method, generate a sequence of intermediate models to estimate the variances and load the (estimated) train_Yvar into the final model. (A sketch of this hookup follows the list.)
  2. Just use a helper function, as in this PR. This should be quite a bit simpler and not cause any issues/challenges with the class structure. I suggest going with this approach as a first pass (we can always rethink it when we have a working model).
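
For option 1, here is a hedged sketch of what the dispatch hookup might look like, modeled on the registrations in the botorch/fit.py file linked above; the MostLikelyHeteroskedasticGP class is hypothetical, the dispatcher name FitGPyTorchMLL is taken from that file, and the fitting body is elided:

```python
from botorch.fit import FitGPyTorchMLL
from botorch.models import HeteroskedasticSingleTaskGP
from gpytorch.mlls import ExactMarginalLogLikelihood


class MostLikelyHeteroskedasticGP(HeteroskedasticSingleTaskGP):
    """Hypothetical subclass; the real version would not take train_Yvar."""


# The dispatcher keys on (mll type, likelihood type, model type), so this
# registration fires whenever fit_gpytorch_mll sees this model class.
@FitGPyTorchMLL.register(ExactMarginalLogLikelihood, object, MostLikelyHeteroskedasticGP)
def _fit_most_likely_heteroskedastic(mll, _, __, **kwargs):
    # Here one would run the EM loop: fit intermediate models to estimate
    # the variances, then load the estimated train_Yvar into the final model.
    ...
    return mll
```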

@Balandat
Contributor

> I'm slowly being convinced that kernel flexibility is important. That said, I don't know how broadly useful very flexible or niche kernels are for the larger community. Definitely going to read that paper though, thanks for sharing.

Yeah, I think this is a good point. If you're a domain expert and understand the problem well, customizing kernels, priors, etc. can be very helpful for improving performance on the problem at hand. But this increases complexity and the number of failure modes, so there's a tradeoff in how much we want to support more specialized/flexible kernels in general-purpose software.
