-
Notifications
You must be signed in to change notification settings - Fork 194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster algorithm for ordered sampling with replacement #913
base: master
Are you sure you want to change the base?
Conversation
Test that is failing is using the previous algorithm at lines 83-84 of sampling.jl 83 aa = Int.(sample(r, 10; ordered=true))
84 check_sample_wrep(aa, (3, 12), 0; ordered=true, rev=rev) so I think it is unrelated right? It is maybe due to the difference in the random number consumed by the new algorithm |
Actually it seems to me that the way tests are written is not very good because no rng is set, indeed trying to run those tests 100 times, sometimes gets to failure anyway (before and after this pr), should we set an rng? edit: I tried to do it on those tests and it works, but it happens also with other parts of the sampling tests, if I loop over 100 times e.g. direct_sample!([11:20;], zeros(Int, n, 3))
check_sample_wrep(a, (11, 20), 5.0e-3; ordered=false) at lines 69-70 tests fails, so I guess establishing a rng for everything should be a good idea (actually maybe some confidence interval testing could be good practice in this case) |
closing because I want to do same more experimentation with the algorithm before proposing it, you can find them here:https://github.com/Tortar/SortedRands.jl edit: I conducted some local tests, everything seems good to me, wait to hear the opinion of someone else :-) |
gentle bump, since #927 was merged, what about taking a look at another speed-up? :D |
do you have any more review comments @devmotion? :-) |
Bump |
Bump2 |
This is based on a classical result for example described here https://stats.stackexchange.com/questions/348358/a-fast-uniform-order-statistic-generator (and in the reference of the most upvoted answer). I wasn't able to find a reference describing its modification for sampling a finite population, but I adapted it to such a case.
In particular, the performance increase is substantial, more than 5 times faster when the performance increase stabilize, e.g.
This PR:
Main:
the switching point between this algorithm and the one implemented in main is set at
k=10
because I found that empirically at that point the timings were almost equal.Numerically it should be stable enough, but let me know what you think