Make sure BTPE is not entered when np < 10 #1484
Conversation
The second commit contains a Poisson path for the case where BINV is numerically unstable.
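A minimal sketch of how such a dispatch could look (function names and structure are an illustration, not the actual rand_distr code): for small `np`, BINV is the natural algorithm, but its starting term `q^n` can degenerate for huge `n`, at which point a Poisson(np) approximation is the fallback.

```rust
// Hypothetical dispatch sketch; the real samplers are elided and the
// returned strings only name which method would run.
fn choose_method(n: u64, p: f64) -> &'static str {
    let np = n as f64 * p;
    if np >= 10.0 {
        return "BTPE"; // large np: the rejection algorithm is valid
    }
    let q = 1.0 - p;
    // BINV's inversion loop starts from q^n; if p is so small that q
    // rounds to exactly 1.0, or n is so large that q^n underflows to
    // 0, the loop cannot produce correct samples.
    let qn = q.powf(n as f64);
    if q < 1.0 && qn > 0.0 {
        "BINV"
    } else {
        "Poisson approximation" // Binomial(n, p) ≈ Poisson(np) here
    }
}

fn main() {
    println!("{}", choose_method(100, 0.05)); // BINV
    println!("{}", choose_method(1 << 61, 1e-18)); // Poisson approximation
    println!("{}", choose_method(1_000_000, 0.5)); // BTPE
}
```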
The code looks fine, except that benchmarks regressed significantly:
```
$ cargo bench binomial
Compiling rand_distr v0.5.0-alpha.1 (/home/dhardy/projects/rand/rand/rand_distr)
Compiling benches v0.1.0 (/home/dhardy/projects/rand/rand/benches)
Finished `bench` profile [optimized] target(s) in 4.66s
Running benches/base_distributions.rs (target/release/deps/base_distributions-4d7c8319c6c903ac)
running 0 tests
test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 131 filtered out; finished in 0.00s
Running src/distr.rs (target/release/deps/distr-0435b5737d5e6436)
binomial/binomial time: [140103.0524 cycles 140148.2440 cycles 140209.3686 cycles]
thrpt: [17.5262 cpb 17.5185 cpb 17.5129 cpb]
change:
time: [+22.975% +23.201% +23.408%] (p = 0.00 < 0.05)
thrpt: [-18.968% -18.832% -18.683%]
Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
5 (5.00%) low severe
4 (4.00%) low mild
4 (4.00%) high mild
1 (1.00%) high severe
binomial/binomial_small time: [101471.0977 cycles 101671.4520 cycles 101873.9594 cycles]
thrpt: [12.7342 cpb 12.7089 cpb 12.6839 cpb]
change:
time: [+121.89% +122.57% +123.21%] (p = 0.00 < 0.05)
thrpt: [-55.199% -55.071% -54.932%]
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
4 (4.00%) low severe
6 (6.00%) low mild
4 (4.00%) high mild
2 (2.00%) high severe
binomial/binomial_1 time: [66256.7652 cycles 66283.7976 cycles 66311.9841 cycles]
thrpt: [8.2890 cpb 8.2855 cpb 8.2821 cpb]
change:
time: [+163.42% +163.59% +163.75%] (p = 0.00 < 0.05)
thrpt: [-62.086% -62.062% -62.038%]
Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
3 (3.00%) low mild
6 (6.00%) high mild
3 (3.00%) high severe
binomial/binomial_10 time: [112029.5595 cycles 112050.2365 cycles 112072.3636 cycles]
thrpt: [14.0090 cpb 14.0063 cpb 14.0037 cpb]
change:
time: [+64.401% +66.702% +68.728%] (p = 0.00 < 0.05)
thrpt: [-40.733% -40.013% -39.173%]
Performance has regressed.
Found 15 outliers among 100 measurements (15.00%)
2 (2.00%) low severe
6 (6.00%) low mild
5 (5.00%) high mild
2 (2.00%) high severe
binomial/binomial_100 time: [113882.4511 cycles 114051.8796 cycles 114241.4027 cycles]
thrpt: [14.2802 cpb 14.2565 cpb 14.2353 cpb]
change:
time: [+54.992% +56.016% +56.963%] (p = 0.00 < 0.05)
thrpt: [-36.291% -35.904% -35.480%]
Performance has regressed.
Found 16 outliers among 100 measurements (16.00%)
1 (1.00%) low mild
12 (12.00%) high mild
3 (3.00%) high severe
binomial/binomial_1000 time: [311388.9158 cycles 311770.8383 cycles 312189.4034 cycles]
thrpt: [39.0237 cpb 38.9714 cpb 38.9236 cpb]
change:
time: [-0.6452% -0.2210% +0.1944%] (p = 0.30 > 0.05)
thrpt: [-0.1940% +0.2215% +0.6494%]
No change in performance detected.
binomial/binomial_1e12 time: [149142.5352 cycles 149243.5861 cycles 149350.1778 cycles]
thrpt: [18.6688 cpb 18.6554 cpb 18.6428 cpb]
change:
time: [+2.0575% +2.1598% +2.2532%] (p = 0.00 < 0.05)
thrpt: [-2.2035% -2.1141% -2.0160%]
Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
4 (4.00%) high mild
1 (1.00%) high severe
```
I tried moving the Poisson sample to a `#[cold]` fn, but this didn't have much effect.
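For reference, a minimal sketch of the `#[cold]` pattern referred to here (names and bodies are illustrative, not the actual code): the attribute hints to the optimizer that the fallback is rarely taken, keeping it out of the hot path's instruction stream.

```rust
// Illustrative only: a rarely-taken fallback marked #[cold] so the
// optimizer lays it out away from the hot path.
#[cold]
#[inline(never)]
fn poisson_fallback(lambda: f64) -> u64 {
    // The (rare) Poisson approximation would live here; this body is
    // a placeholder.
    lambda.round() as u64
}

fn sample(np: f64, binv_is_stable: bool) -> u64 {
    if binv_is_stable {
        // Hot path: BINV sampling (elided).
        0
    } else {
        poisson_fallback(np)
    }
}

fn main() {
    println!("{}", sample(2.3, false)); // takes the cold path: prints 2
}
```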
Removing the … This approach may still be the best option. Further benchmarks:
The …
I wonder if a combination of the commits is best. I suspect (but am not certain) the changes I proposed in #1486 may help some of the benchmarks above, and the changes to `r` are equally applicable to the changes proposed and benchmarked here. `powf` is known to be slow relative to `powi`, which is likely the reason for the benchmark changes you've shown.
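To illustrate the `powf`/`powi` point (an example of my own, not the PR's code): computing BINV's initial term `q^n` with an `i32` exponent uses the cheap `powi`, while casting `n` to `f64` to support `n > i32::MAX` forces the slower general-purpose `powf` on every call.

```rust
fn main() {
    let p: f64 = 0.1;
    let q = 1.0 - p;

    // When n fits in i32, the integer-exponent routine suffices:
    let n_small: i32 = 100;
    println!("q^100 via powi = {:e}", q.powi(n_small));

    // Casting n to f64 makes n > i32::MAX work, but pays for the
    // general-purpose powf even when n is small:
    let n_large: u64 = 1 << 40;
    println!("q^(2^40) via powf = {:e}", q.powf(n_large as f64)); // underflows to 0
}
```

One way to combine the benefits would be to keep `powi` whenever `n` fits in `i32` and use `powf` only beyond that.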
We could also push the …
It should be faster now than the original in the benchmarks, at the cost of a bigger struct.
New benchmarks:
Our implementation of …
Do you think the general approach of putting precomputation in the structs is a good one? If so, I would invest a bit of time into that to get a good size/performance tradeoff.
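A minimal sketch of that approach, assuming the rand 0.9 API (`rand::rng`, `Rng::random`) and an invented field set rather than the actual rand_distr layout: everything that depends only on `(n, p)` is computed once in the constructor, so the per-sample inversion loop contains no `powf`.

```rust
use rand::Rng;

// Illustrative BINV sampler with precomputed state (edge-case
// handling omitted for brevity).
struct BinvTables {
    n: u64,
    s: f64,  // p / q
    a: f64,  // (n + 1) * s
    r0: f64, // q^n: the starting probability of the inversion loop
}

impl BinvTables {
    fn new(n: u64, p: f64) -> Self {
        let q = 1.0 - p;
        let s = p / q;
        Self {
            n,
            s,
            a: (n as f64 + 1.0) * s,
            r0: q.powf(n as f64), // the one expensive call, paid once
        }
    }

    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> u64 {
        // Standard BINV inversion using only precomputed values.
        let mut r = self.r0;
        let mut u: f64 = rng.random();
        let mut x = 0u64;
        while u > r && x < self.n {
            u -= r;
            x += 1;
            r *= self.a / (x as f64) - self.s;
        }
        x
    }
}

fn main() {
    let mut rng = rand::rng();
    let d = BinvTables::new(1_000, 0.005);
    println!("sample: {}", d.sample(&mut rng));
}
```

The tradeoff is exactly the size question raised here: each precomputed field enlarges the distribution object, which matters when many of them are stored.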
As you say, it's all about tradeoffs (code size and complexity vs working state size vs CPU time). Increasing the state size of … The code as-is is probably fine, but if there are easy ways to drop the state size we should consider them. Further, could you merge or rebase on master, then amend/remove the doc added in #1480, please.
With `#[inline]` on all sample methods there is no effect on benchmarks; as presented there is <2% on most, +8% time on the variable benchmark.
Thanks @dhardy for the work on the Poisson. The struct is now a bit smaller. The struct has quite some padding: 7 bytes from the …
Thanks @benjamin-lieser. I still need to do a more detailed review, but broadly this looks good. There's a thing known as a "niche optimization" and some interest in applying this to Rust, but I'm not sure this got anywhere. Anyway, I suppose that 48 bytes will do! (If people want, we could try to make a size-optimized variant, but that applies to quite a few distributions and mostly there doesn't seem to be enough interest in taking that route.) Benchmarks now (where …
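The padding and niche points can be checked with `std::mem::size_of`; a small demonstration (the struct layout is invented, not the actual sampler):

```rust
use std::mem::size_of;

#[allow(dead_code)]
struct WithFlag {
    a: f64,
    b: f64,
    flag: bool, // 1 byte of data, 7 bytes of padding to keep 8-byte alignment
}

fn main() {
    println!("WithFlag: {} bytes", size_of::<WithFlag>()); // 24, not 17
    // f64 has no spare bit patterns (no "niche"), so Option needs an
    // extra discriminant word:
    println!("Option<f64>: {} bytes", size_of::<Option<f64>>()); // 16
    // A reference has a null niche, so the Option is free:
    println!("Option<&f64>: {} bytes", size_of::<Option<&f64>>()); // 8
}
```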
Removing the …
And rebase please
Looks good; just a few suggestions below.
Thanks!
CHANGELOG.md entry
Summary
This is an attempt to solve the issue in #1378
Details
The old implementation would enter BTPE even for small `np` in the case `n > i32::MAX`, which then causes panics, because the algorithm needs `np` to be large enough. I have fixed this by casting `n` to `f64` and replacing `powi` with `powf`.
The implementation is still not perfect: for example, with `n = 1 << 61` and `p = 1e-18` we have `np ~ 2.3`, but it samples only `0`. This is because `q = 1 - p` is exactly `1.0`. numpy has the same behavior (there already with `1e-17`, because they have a higher BINV threshold). I guess we have to either catch this case (and then maybe do a Poisson approximation, which should be very precise there) or do more of the calculations in log space, but that will be slower.
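A worked demonstration of the rounding problem described above, together with the log-space route mentioned at the end (my own sketch, not code from the PR): the spacing of `f64` values just below 1.0 is 2^-53 ≈ 1.1e-16, so for any `p` below roughly 5.6e-17 the subtraction `1.0 - p` rounds to exactly `1.0`.

```rust
fn main() {
    let p: f64 = 1e-18;
    let q = 1.0 - p;
    assert_eq!(q, 1.0); // the subtraction loses all information about p

    let n: u64 = 1 << 61;
    println!("np = {}", n as f64 * p); // ≈ 2.31, so always sampling 0 is wrong
    assert_eq!(q.powf(n as f64), 1.0); // BINV's starting term degenerates

    // Log space keeps the information: ln(q^n) = n * ln(1 - p), and
    // ln_1p stays accurate for tiny p.
    let log_qn = (n as f64) * (-p).ln_1p();
    println!("q^n via log space = {:.4}", log_qn.exp()); // ≈ 0.0997
}
```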