Thoughts on reusing/improving the bounding information #232
This is a good suggestion and overall would definitely be a good step towards improving the stability and performance. For context, the strategy right now is to construct the bounding distributions from scratch each time using a "recursive splitting k-means" approach. This ensures repeatability by starting from the same place each time, but obviously throws out information from the last set of bounding ellipsoids. One of the biggest barriers to incorporating previous information with the current scheme is it only includes steps to further split bounds, rather than merge them. With respect to the things you're suggesting:
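For readers unfamiliar with the approach, here is a minimal, hypothetical sketch of what a "recursive splitting k-means" bounding scheme can look like: fit an ellipsoid to the points, split them with 2-means, and keep the split only if the two child ellipsoids cover less total volume than the parent. The function names, the sqrt-determinant volume proxy, and the stopping rule are illustrative choices, not dynesty's actual implementation.

```python
# Illustrative sketch of recursive-splitting k-means bounds (NOT dynesty's code).
import numpy as np

def fit_ellipsoid(points):
    """Return (center, covariance, volume proxy) of a bounding ellipsoid."""
    ctr = points.mean(axis=0)
    cov = np.cov(points, rowvar=False) + 1e-12 * np.eye(points.shape[1])
    vol = np.sqrt(np.linalg.det(cov))  # proportional to ellipsoid volume
    return ctr, cov, vol

def two_means(points, rng, n_iter=10):
    """Plain 2-means clustering; returns a boolean mask for cluster 0."""
    centers = points[rng.choice(len(points), size=2, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for k in (0, 1):
            if (labels == k).any():
                centers[k] = points[labels == k].mean(axis=0)
    return labels == 0

def recursive_split(points, rng, min_pts=10):
    """Recursively split until child ellipsoids stop shrinking the volume."""
    _, _, parent_vol = fit_ellipsoid(points)
    if len(points) < 2 * min_pts:
        return [points]
    mask = two_means(points, rng)
    a, b = points[mask], points[~mask]
    if len(a) < min_pts or len(b) < min_pts:
        return [points]
    _, _, va = fit_ellipsoid(a)
    _, _, vb = fit_ellipsoid(b)
    if va + vb < parent_vol:  # splitting helped: recurse into the children
        return recursive_split(a, rng) + recursive_split(b, rng)
    return [points]

rng = np.random.default_rng(0)
# Two well-separated 2-D blobs should be split into at least two bounds.
pts = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(5, 0.1, (50, 2))])
bounds = recursive_split(pts, rng)
print(len(bounds))
```

Because the recursion always starts from a single parent ellipsoid around all live points, the decomposition is repeatable, but (as noted above) there is no merge step, so information from the previous set of bounds cannot easily be carried forward.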
Just noting that the recent fix merged as part of #248 tangentially touches this issue (but doesn't resolve it).
Yes, I also thought that the new add_batch could potentially discover new posterior features using all of the history accumulated up to that point. My only worry is its possible quadratic behaviour: add_batch relies on the whole history, which keeps growing over time.
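The quadratic worry can be made concrete with a toy cost count (purely illustrative, not dynesty's actual add_batch): if each new batch re-reads the entire accumulated history, the total work grows quadratically in the number of batches.

```python
# Toy illustration: rebuilding bounds from the full history on every batch
# gives O(n^2) total work in the number of batches. NOT dynesty's actual code.
def total_work(n_batches, batch_size=100):
    history = 0  # samples accumulated so far
    work = 0     # total samples touched across all batches
    for _ in range(n_batches):
        history += batch_size
        work += history  # rebuilding bounds touches every stored sample
    return work

# Doubling the number of batches roughly quadruples the total work.
w1, w2 = total_work(10), total_work(20)
print(w1, w2, w2 / w1)  # ratio approaches 4 as the batch count grows
```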
I'll close this and keep #237 as the main issue for the bounding-ellipsoids discussion.
Hi Josh,
This is definitely not a bug report, just some thoughts on a sampling issue I ran into recently.
The problem I've been struggling with over the last few months features a very complicated posterior in a high-ish (11-dimensional) space, with both very narrow and very broad features as well as multiple modes.
A constant struggle with this is missing parts of the posterior, due to either missed modes or poor approximation by the bounding ellipsoids, etc.
The default way of dealing with this is to use a large number of live points, but obviously the code doesn't scale well beyond a few thousand. An alternative would be to do multiple runs and then merge them, but as far as I understand that isn't really correct if the different runs discover/miss different areas of the posterior.
Also, an annoying feature of doing multiple nested runs is that the bounding information from one set of runs is completely ignored by future runs, which seems like a waste.
Also, as far as I understand, the rejected samples from a run could be used to refine the bounding information, since the number of rejected samples is much larger than the number of accepted samples. Alternatively, it would be good to at least be able to verify, using all the samples in the run (accepted and rejected), that the bounds are correct, or ideally to adjust them.
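A rough sketch of the verification idea, under simplifying assumptions (a single Gaussian-shaped bound and a Mahalanobis-distance membership test; the names and the chi-square threshold are illustrative, not part of dynesty): fit an ellipsoid to the accepted samples only, then measure what fraction of the much larger rejected-sample set falls outside it.

```python
# Hypothetical check of a bounding ellipsoid against ALL stored samples.
# Illustrative only; dynesty does not currently expose this workflow.
import numpy as np

def inside_ellipsoid(points, center, cov, scale):
    """Mahalanobis test: (x - c)^T cov^{-1} (x - c) <= scale per point."""
    diff = points - center
    d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
    return d2 <= scale

rng = np.random.default_rng(1)
accepted = rng.normal(0.0, 1.0, (200, 3))
# Ellipsoid fitted to the accepted samples only.
center = accepted.mean(axis=0)
cov = np.cov(accepted, rowvar=False)

# Rejected samples are far more numerous and probe a wider region;
# here they are drawn deliberately broader to mimic a too-tight bound.
rejected = rng.normal(0.0, 1.5, (2000, 3))
# 11.34 is roughly the 99% chi-square quantile for 3 degrees of freedom.
frac_outside = 1.0 - inside_ellipsoid(rejected, center, cov, 11.34).mean()
print(f"fraction of rejected samples outside the bound: {frac_outside:.3f}")
```

A nontrivial fraction of rejected samples landing outside the bound would flag that the ellipsoid is too tight and should be inflated or re-fit.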
I don't have a concrete plan for this, but it would be nice to have.
I don't think dynesty preserves that (doing so would probably require storing it on disk).
It would be interesting to see what you think of this and whether trying to implement some of this functionality would be useful.