
Add Stein variational gradient descent as a sampling option, for increased performance #6

Open
jdehning opened this issue Apr 17, 2020 · 6 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@jdehning
Member

jdehning commented Apr 17, 2020

In PyMC3, Stein variational gradient descent is already implemented, but it has to be tested how well it works and how biased it is for a small number of particles. In addition, the optimal type of optimizer and learning rate have to be found. Eventually one could try to reparametrize the model to get a simpler posterior.
Reference for the bias: https://ferrine.github.io/blog/2017/06/04/asvgd-sanity-check/
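
For concreteness, a minimal sketch of what such a bias test could look like. This is a toy model, not our model; it assumes PyMC3 3.x, where pm.SVGD and pm.adam are available:

```python
import numpy as np
import pymc3 as pm

# Toy data: the true posterior is well behaved, so the SVGD
# result can be compared against a NUTS reference.
np.random.seed(0)
data = np.random.normal(loc=1.0, scale=2.0, size=200)

with pm.Model():
    mu = pm.Normal("mu", mu=0.0, sigma=10.0)
    sigma = pm.HalfNormal("sigma", sigma=5.0)
    pm.Normal("obs", mu=mu, sigma=sigma, observed=data)

    # SVGD with few particles tends to underestimate the posterior
    # variance -- the bias discussed in the blog post linked above.
    approx = pm.fit(
        n=5000,
        method=pm.SVGD(n_particles=20, jitter=1.0),
        obj_optimizer=pm.adam(learning_rate=0.01),
    )
    trace_svgd = approx.sample(2000)

    # Reference posterior from NUTS for comparison.
    trace_nuts = pm.sample(2000, tune=1000)
```

Comparing the spread of trace_svgd["sigma"] against trace_nuts["sigma"] for different n_particles would show how strongly the particle count biases the result.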

@jdehning added the enhancement and help wanted labels on Apr 17, 2020
@ac-schneider
Member

ac-schneider commented Apr 17, 2020

This is a six-page summary of Stein Variational Gradient Descent:
http://www.cs.utexas.edu/~lqiang/PDF/svgd_aabi2016.pdf

This short paper gives a brief overview of the key idea of SVGD, and outlines several directions for future work, including a new theoretical framework that interprets SVGD as a natural gradient descent of the KL divergence functional on a Riemannian-like metric structure on the space of distributions, and extensions of SVGD that allow us to train neural networks to draw approximate samples from given distributions, and develop new adaptive importance sampling methods without assuming parametric forms on the proposals.

@ac-schneider
Member

ac-schneider commented Apr 17, 2020

In principle one can use ASVGD inference; however, one might have to adjust the priors a bit to help the algorithm converge, and it is not yet recommended for general use. What helped in my case to make it converge more robustly (see the sketch at the end of this comment):

  • mean prior for lambda_0 of 0.6
  • prior for I_0 of around 50
  • Adam with a learning rate of 0.02
  • at least 300 steps, which takes around 6 minutes in a Google Colab notebook

If you know how to apply SVGD or a similar VI method to speed up computation of the posteriors, let us know here.
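
For reference, a rough sketch of how those settings translate into a PyMC3 call. Here `model` is a placeholder for the model this package builds (with the adjusted priors for lambda_0 and I_0), assuming the PyMC3 3.x pm.fit API:

```python
import pymc3 as pm

# `model` is a placeholder for the PyMC3 model built by this package,
# with the adjusted priors (mean of lambda_0 ~ 0.6, I_0 ~ 50).
with model:
    approx = pm.fit(
        n=300,  # at least 300 steps
        method="asvgd",
        obj_optimizer=pm.adam(learning_rate=0.02),
    )
    trace = approx.sample(1000)
```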

@tarbaig

tarbaig commented Apr 17, 2020

General comment: if you are hell-bent on using variational methods, consider looking at Pyro, since that framework was originally conceived for exactly that. SVGD exists there, but the implementation claims to be 'basic'.

https://pyro.ai/
http://pyro.ai/examples/svi_part_i.html
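
For orientation, a minimal SVI sketch in the spirit of that tutorial (toy model with one latent variable, illustrative only):

```python
import torch
from torch.distributions import constraints
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam

# Toy data with an unknown mean.
data = torch.randn(100) + 1.0

def model(data):
    mu = pyro.sample("mu", dist.Normal(0.0, 10.0))
    with pyro.plate("data", len(data)):
        pyro.sample("obs", dist.Normal(mu, 1.0), obs=data)

def guide(data):
    # Variational parameters of a Gaussian approximation for mu.
    loc = pyro.param("loc", torch.tensor(0.0))
    scale = pyro.param("scale", torch.tensor(1.0), constraint=constraints.positive)
    pyro.sample("mu", dist.Normal(loc, scale))

svi = SVI(model, guide, Adam({"lr": 0.01}), loss=Trace_ELBO())
for step in range(1000):
    loss = svi.step(data)
```

Pyro's (self-described basic) SVGD implementation lives in pyro.infer, so the same model function could be reused there.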

Any particular reason why variational methods should be better here? People use them to train Bayesian neural networks, where sampling becomes infeasible with that many parameters. But given the few parameters of the current model, that should not be a problem.

@jdehning
Member Author

We are not hell-bent on it. My thought was that variational methods could eventually be parallelized once we look at the level of Landkreise (districts).
But I find that interesting in itself: do Monte Carlo methods intrinsically scale worse than variational methods with the number of parameters? My impression was that advanced methods like Hamiltonian MC scale pretty well with the number of parameters, but my knowledge is pretty superficial.

@jdehning
Member Author

No one is actively working on this issue at the moment, so if someone wants to have a look...
Things learned so far:

  • pymc3.ASVGD needs a very high temperature, around 2, to give a more or less correct posterior on the one-dimensional problem.
  • It takes time until ASVGD and SVGD converge, probably mainly because of the wide distribution of I_begin. Learning rates around 0.01 seem to work.

The next step would be to use pymc3.SVGD with about 100 particles and apply it to example_bundeslaender, to see whether one reaches approximately good results faster. There one can also test whether Theano uses multiprocessing.
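
A sketch of that next step; `model` stands in for the hierarchical model from example_bundeslaender, and the temperature argument is assumed to be accepted by pm.SVGD as in recent PyMC3 releases:

```python
import pymc3 as pm

# `model` is a placeholder for the hierarchical Bundesländer model
# from example_bundeslaender; particle count, temperature, and
# learning rate follow the notes above.
with model:
    approx = pm.fit(
        n=1000,
        method=pm.SVGD(n_particles=100, temperature=2.0),
        obj_optimizer=pm.adam(learning_rate=0.01),
    )
    trace = approx.sample(1000)
```

Whether Theano parallelizes the particle updates across cores could then be checked by watching CPU utilization during pm.fit.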

@dmarinere

So basically, how can people help, @jdehning?

A different question: how do we infer these rates for different countries? Are they universal?
