
Questions about prediction of SGNP #288

Open
JianxiangFENG opened this issue Feb 2, 2021 · 5 comments

@JianxiangFENG

Hi @jereliu ,

I have a few questions about the inference stage of SGNP:

  1. According to Eq. (9) and Algorithm 1 in the paper, shouldn't there be K precision matrices, one per output dimension, where K is the number of classes? Each one would have shape [batch_size, batch_size], so the total would be [K, batch_size, batch_size]. Am I misunderstanding something? In the code, I can only find a single covariance matrix of shape [batch_size, batch_size] (a minimal sketch of my reading follows this list).
  2. After searching the code for a while, I couldn't find the sampling step, i.e., step 5 in Algorithm 2. Without this sampling step, the prediction is similar to a MAP prediction except for the differences during training. This way of making predictions should be essential to the method, right?
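
To make sure I'm reading the algorithm correctly, here is a minimal numpy sketch of my understanding: per-class precision matrices as I read Eq. (9) versus a single shared matrix. The shapes and names below (D for the random-feature dimension, phi for the penultimate features) are my own illustration, not taken from the repo.

```python
import numpy as np

# Minimal numpy sketch of my reading (not the repo's code), assuming D random
# features phi(x) in R^D and K classes; all names/shapes here are illustrative.
rng = np.random.default_rng(0)
N, D, K = 32, 16, 10                       # batch size, feature dim, classes
phi = rng.normal(size=(N, D))              # penultimate features for a batch
probs = rng.dirichlet(np.ones(K), N)       # softmax outputs p_ik (placeholders)

# (a) Per-class Laplace precision, as I read Eq. (9): one D x D matrix per
#     class, each weighted by p_ik * (1 - p_ik), i.e. K matrices in total.
prec_per_class = np.stack([
    np.eye(D) + (phi * (probs[:, k] * (1 - probs[:, k]))[:, None]).T @ phi
    for k in range(K)
])                                         # shape [K, D, D]

# (b) A single precision matrix shared by all classes, as under a Gaussian
#     likelihood, where the p * (1 - p) weights drop out.
prec_shared = np.eye(D) + phi.T @ phi      # shape [D, D]

# The predictive covariance over a batch, phi Sigma phi^T, is what ends up
# with shape [batch_size, batch_size].
sigma = np.linalg.inv(prec_shared)
pred_cov = phi @ sigma @ phi.T             # shape [N, N]
print(prec_per_class.shape, prec_shared.shape, pred_cov.shape)
```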

I would appreciate it if you could explain these points in more detail.

Best,
Jianxiang

@jereliu
Collaborator

jereliu commented Feb 3, 2021

Hi Jianxiang,

Thanks for getting in touch! Sorry for the confusion about the mismatch between the paper and this implementation. Yes, we made two changes for computational feasibility / performance reasons:

  1. After some experimentation, we replaced the Laplace-approximated posterior variance with the one under a Gaussian likelihood, so that a single matrix is shared across all classes. The two reasons for this change are (1) computational feasibility (especially for ImageNet-scale tasks) and (2) empirically better OOD performance.

  2. We replaced the Monte-Carlo approximation with the mean-field approximation for computational feasibility (e.g., here; this is mentioned in Appendix A).
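
For concreteness, here is a minimal sketch of what this mean-field adjustment does; the function name and the pi/8 constant below are illustrative, not the exact code in this repo.

```python
import numpy as np

# Illustrative sketch of the mean-field adjustment (not the exact repo code):
# rather than Monte-Carlo sampling logits from N(mean, var), scale each logit
# mean by 1 / sqrt(1 + lambda * var) and apply the softmax once. lambda = pi/8
# is the standard logistic-Gaussian approximation constant.
def mean_field_softmax(logit_mean, logit_var, lam=np.pi / 8.0):
    """logit_mean: [N, K] means; logit_var: [N] or [N, K] predictive variances."""
    logit_var = np.reshape(logit_var, (logit_mean.shape[0], -1))
    scaled = logit_mean / np.sqrt(1.0 + lam * logit_var)   # broadcasts over K
    scaled -= scaled.max(axis=-1, keepdims=True)            # numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum(axis=-1, keepdims=True)
```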

@JianxiangFENG
Author

JianxiangFENG commented Feb 5, 2021

Thank you for the quick reply!

  1. After some experimentation, we replaced the Laplace-approximated posterior variance with the one under a Gaussian likelihood, so that a single matrix is shared across all classes. The two reasons for this change are (1) computational feasibility (especially for ImageNet-scale tasks) and (2) empirically better OOD performance.

OK, that is more computationally efficient. However, I don't get the intuition for why a single variance shared across all classes can lead to better performance; it doesn't seem to make a lot of sense. It's just like temperature scaling with a single temperature hyperparameter, instead of modelling the uncertainty for each class. Maybe in other scenarios different variances for different classes are needed. But thanks for letting me know about this.

  2. We replaced the Monte-Carlo approximation with the mean-field approximation for computational feasibility (e.g., here; this is mentioned in Appendix A).

This is a neat and simple approximation. I am wondering how large the difference between the sampling and the approximation is. I am fairly sure you have run experiments on that. Are there any systematic comparisons or take-home messages about this?
Thank you in advance!
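
Out of curiosity, I also put together a tiny toy comparison (entirely my own illustration, not from the paper) of Monte-Carlo averaging over sampled logits versus the mean-field scaling for a single example:

```python
import numpy as np

# Toy comparison (my own illustration, not from the paper): Monte-Carlo
# averaging of the softmax over sampled logits vs. the mean-field scaling,
# for one example with independent Gaussian logits N(mu, var).
rng = np.random.default_rng(0)
mu = np.array([2.0, 0.5, -1.0])            # predictive logit means (arbitrary)
var = 1.5                                  # predictive logit variance (arbitrary)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Monte-Carlo estimate (the sampling step, i.e. step 5 of Algorithm 2).
samples = mu + np.sqrt(var) * rng.normal(size=(10000, mu.size))
p_mc = softmax(samples).mean(axis=0)

# Mean-field estimate: scale the mean logits by 1 / sqrt(1 + (pi/8) * var).
p_mf = softmax(mu / np.sqrt(1.0 + (np.pi / 8.0) * var))

print("Monte-Carlo:", np.round(p_mc, 3))
print("Mean-field :", np.round(p_mf, 3))
```

Increasing `var` flattens both predictions toward uniform, which makes it easy to eyeball how closely the two stay together.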

@mdabbah

mdabbah commented Mar 1, 2021

Hi,
Just throwing in a possible explanation for 1: maybe one covariance matrix for all classes is better because it reduces overfitting. On large datasets we might see the opposite (more intuitive) effect, i.e., better performance with a covariance matrix for each class, since there we would have enough data to approximate a per-class covariance matrix.

@Jordy-VL

Jordy-VL commented Jun 3, 2021


@JianxiangFENG Did you ever get or figure out an answer to your last question, i.e., how large the difference between the sampling and the mean-field approximation is? I am wondering this myself :)

@JianxiangFENG
Author

@Jordy-VL Hey, I did not follow up on it in the end. But the relevant paper (https://arxiv.org/abs/2006.0758) is worth reading.
