-
-
Notifications
You must be signed in to change notification settings - Fork 437
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Interest in a weighted index based on balanced trees? #1053
Comments
I'm not sure how significant the performance gains world be, but in principle such an alternative implementation could live in |
Well, that doesn't handle insertions/deletions. One way of getting around that might be to allow dummy entries with weight zero and over-allocate initially, periodically re-allocating, and tracking a "free list". Getting good performance may require a little more work but it's not obvious which would perform best. Also note that due to floating-point inaccuracies you might want to periodically re-calculate the cumulative weights anyway. |
Sorry, I'm forgetting that one still needs to adjust all cumulative weights after the modified entry, which makes that API especially bad for random updates. Yes, a tree based structure storing the sub-tree total in each node could be a decent alternative, if you don't want to just create a new |
I've recently implemented the 'tree based structure storing the sub-tree total in each node' outlined by @dhardy, and would be interested in upstreaming it. The implementation is very straight forward. The tree is stored as a flat vector, and the value of each node is defined as follows:
For sampling an index, we sample a The runtimes of the operations are:
Would a PR for this be welcome? I'm happy to work through any discussions. Open questions are:
|
Sorry for not answering sooner.
Yes, I think so — but I'm not so sure where. We already have
For usage with a fixed distribution of weights it surely makes sense to error as soon as possible (i.e. at construction time). For usage with mutating weights, perhaps there is utility in not raising an error until sampling time. I don't think we should just switch to linear sampling (there may be arguments around rounding errors, but if an item is added with weight=0 I would expect it to never be sampled).
It sounds like the whole point of a tree-based weighted distribution is to support mutation, so yes. As a result I would expect the API may need to differ a little from other weighted distributions. |
No worries, hope you had a good new years. Sounds good. I'll work on a PR targeting Perhaps we could add a link to the tree-based implementation in the documentation of I saw that |
Please take a look at #1372 which should be as discussed. |
Please do. |
Updated the docs and also added overflow handling. PTAL when you get a chance. |
Background
What is your motivation?
The current
WeightedIndex
is pretty good. Lookups perform very well. And while it's possible to do updates, they are potentially slow as the (average) complexity is O(n). For matchmaking systems where updates are about as frequent as samples, this may be better served by an underlying datastructure which offers quick updates.What type of application is this? (E.g. cryptography, game, numerical simulation)
Real-time matchmaking
Feature request
If I'm not mistaken, balanced tree structures like AVL trees or red-black trees enable O(log(n)) operations (search/ sample, update, insert, delete). The constant coefficient will obviously be worse than an implementation using a vector, but for large n, it should pay off.
Is this something that could have a place in rand? I'd be willing to take a shot at it if so.
The text was updated successfully, but these errors were encountered: