-
-
Notifications
You must be signed in to change notification settings - Fork 434
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added new versions of choose and choose_stable #1268
Conversation
Co-authored-by: Vinzent Steinberg <[email protected]>
Co-authored-by: Vinzent Steinberg <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a nice approach, but the implementation needs a bit of polish.
A shame you didn't use criterion for the new benchmarks, but no need to rewrite it now.
I'd also like to see benchmarks for very small lengths like 1/2/3 elements ideally. Some constructs like iter::once
and Iterator::chain
might produce these. Probably not worth keeping all variants in the benchmark so a quick hack will do.
src/seq/coin_flipper.rs
Outdated
// Explanation: | ||
// We are trying to return true with a probability of n / d | ||
// If n >= d, we can just return true | ||
// Otherwise there are two possibilities 2n < d and 2n >= d | ||
// In either case we flip a coin. | ||
// If 2n < d | ||
// If it comes up tails, return false | ||
// If it comes up heads, double n and start again | ||
// This is fair because (0.5 * 0) + (0.5 * 2n / d) = n / d and 2n is less than d (if 2n was greater than d we would effectively round it down to 1 by returning true) | ||
// If 2n >= d | ||
// If it comes up tails, set n to 2n - d | ||
// If it comes up heads, return true | ||
// This is fair because (0.5 * 1) + (0.5 * (2n - d) / d) = n / d |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using expected value notation:
Let X(n,d) be an event that is true with probability n/d:
X(n,d) ≡ x < n/d for uniform x drawn from [0, 1)
Let X'(n,d) ≡ x' < n/d for x' drawn from [0, 1) independent of x.
E(X(n,d) | 2n < d)
= E(x<n/d | x < 1/2) / 2 + E(x<n/d | x ≥ 1/2) / 2
= E(x' < 2n/d) / 2 + E(false) / 2
= E(X'(2n, d) / 2 + E(false) / 2
E(X(n,d) | 2n ≥ d)
= E(x<n/d | x < 1/2) / 2 + E(x<n/d | x ≥ 1/2) / 2
= E(true) / 2 + E(x' < 2*(n/d - 1/2)) / 2
= E(true) / 2 + E(x' < 2n/d - 1) / 2
= E(true) / 2 + E(x' < (2n - d)/d) / 2
= E(true) / 2 + E(X'(2n-d, d)) / 2
Thanks for having a look at this. I'm afraid I've been a bit sick for the past few days but I will fix the problems and change the benchmarks soon. |
This reverts commit 2339539.
I've updated the code to resolve the correctness issue. Unfortunately this has led to a slight performance regression for the case when you are using a window-hinted iterator and a crypto RNG. It is now about 10% slower in those cases (this is not entirely unexpected, in this situation the new code is not really being used but is also not being optimized away like in the size-hinted version). Anyway, I understand if you want to abandon this as it is a noticeable performance regression in that case. I also added benchmarks for 1,2, and 3 elements. The new method leads to a big performance improvement in these cases but that is largely because the old method is generating a random number to test the first element whereas the new method is skipping that test. I tried using criterion benchmarks. They give pretty similar results (both the default measurement and with cycles-per-byte) but I couldn't find a way to use criterion for that benchmark without adding criterion it as a dev-dependency for the whole project thereby making all of the testing ci take much longer. Also I seem to have broken two of the test builds and I'm not entirely sure how to fix them. These are the new benchmark results, hopefully in a more readable format Window HintedRegression for the
UnhintedAbout 1.2-3x performance improvement
StableAbout 1.2-3x performance improvement
Size HintedNo change. Note that each run is 1000 iterations
|
Thanks. Re-reviewing is on my to-do list.
Eventually I think we'd like to migrate all benches to criterion anyway (e.g. see this and this), so no worries about adding it as a dev-dependency. If you already wrote Criterion versions then it'd be nice to have those here. Don't worry about those CI tests; they should be resolved by #1269. |
I ran your benchmarks on my system (5800X) to see if I get similar results... broadly speaking yes (this PR is still a big improvement) but the details are quite different. Most surprising is that quite a few of your results show no significant change, while none of mine do. Also, your highest speedup is 8 where as mine is 4.5. Your ratio is copied to the last column.
|
Slightly confused by your benchmark results. My code changes shouldn't effect the size-hinted iterator at all so it's surprising that you'd see a noticeable improvement. |
Updated benchmarks using your Criterion bench. Baseline is master branch. Size hinted: 1-2% slower (excluding "from 1", which are too fast for useful timings)
Stable: good improvements (mostly 30-50% less time)
Unhinted: similarly good improvements
Window hinted: within -3% to +10% time, aside from 1-elt sizes
Overall nice improvements. There are quite a few outliers (I didn't bother limiting CPU boosting) but I think the results are still useful, excluding 1-element sizes. There are some unusual results at the 3-10 element sizes. |
The test failures are annoying; at least the 1.56 (MSRV) one is due to adding Criterion. I'll try to fix this with a new PR before merging this. |
Done: #1275. Could you rebase or merge please? |
…t-random-master
Thanks @wainwrightmark. Comparing PCG64 results to the other RNGs, they mostly look the same as PCG32 results (aside from window-hinted, where they are close to ChaCha results). If the RNG were fully utilised we should see better results (excepting short iterators), but at least these results aren't too bad. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I think this is ready to be merged?
I hacked this to generate results using a 64-bit I suppose this goes to show that Pcg64 is fast enough that discarding half the output each time isn't very important, while ChaCha really is impacted by the extra discarded bytes (but only in some of the benches). Or it's just micro-benchmark weirdness. Either way, we can drop the idea of using 64-bit chunks here unless anyone wants to further investigate. |
Benchmark results
|
Related to #1266
Edit: I have made some performance improvements (now 2-3x as fast as the original and updated the benchmarks)
DO NOT MERGE as is. The old functions need to be removed and the new ones renamed. I added a lot of benchmarks to test the new changes, they don't need to be permanent. I think
Coin_Flipper
could do with a better name too, and better tests.This adds new, better performing, versions of
choose()
andchoose_stable()
which perform about 1.5x to 2x as fast.This actually uses a different method to the one described in the issue, it now only needs to generate one
u32
per 16 iterator items on average. This works much better than the other version on longer lists.This is a value breaking change.
Benchmark results below.
Summary: basically unchanged for
size_hinted
andwindow_hinted
. About 1.5-2x improvement forunhinted
andstable
, thought these are more modest when using faster RNGs and when there are fewer elements.