
Binv remodel #1486

Closed · wants to merge 6 commits

Conversation

jamaltas

- [ ] Added a CHANGELOG.md entry

Summary

Slightly refactor the algorithm to match the binomial inversion (BINV) algorithm used by numpy, in order to remove the n <= i32::MAX issues.

Motivation

In my use case (large N and small p, seen frequently in computational biology: large populations of N cells with some small mutation rate p), I encountered frequent crashes.

Details

I found the numpy C implementation of the BINV algorithm and noted that it has no i32::MAX constraint on entry. The issue with the current implementation stems from powi() taking an i32 exponent. I modified the current algorithm to match the implementation numpy uses, which relaxes this constraint. I tested the new algorithm against a handful of (n, p) values for large n and p, and it matches what I get in Python. I don't think this introduces any new panic/crash conditions, but I may have missed something.
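For illustration, a minimal sketch of the numpy-style BINV loop under discussion (illustrative only, not the actual diff): the starting probability q^n is computed as exp(n · ln q) instead of q.powi(n as i32), so n no longer has to fit in an i32.

```rust
use rand::Rng;

// Minimal sketch of a numpy-style BINV sampler (illustrative, not the actual
// diff). The key change: q^n is computed via exp(n * ln(q)) rather than
// q.powi(n as i32), so `n` is not limited to i32::MAX.
fn binv_sketch<R: Rng + ?Sized>(rng: &mut R, n: u64, p: f64) -> u64 {
    debug_assert!(p > 0.0 && p < 1.0);
    let q = 1.0 - p;
    let s = p / q;
    // P(X = 0) = q^n, computed without powi's i32 exponent limit.
    let mut r = (n as f64 * q.ln()).exp();
    let mut u: f64 = rng.random();
    let mut x: u64 = 0;
    while u > r {
        u -= r;
        x += 1;
        if x > n {
            // Floating-point round-off can push the search past the support;
            // numpy restarts here, this sketch simply clamps.
            return n;
        }
        // pmf recurrence: P(X = x) = P(X = x - 1) * ((n + 1 - x) / x) * (p / q)
        r *= ((n + 1 - x) as f64 / x as f64) * s;
    }
    x
}
```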

Align BINV algorithm with numpy's to relax the i32::MAX condition for entering BINV.
@dhardy (Member) commented Aug 12, 2024

This appears to be an alternative to #1484.

- Added n = 1 case.
- Added np < 10e-10 case.
- Optimized all BINV branches.

Significant performance gains across most of the parameter space; no performance losses on any benchmark.
@jamaltas (Author)

Okay, I had the time to sit down tonight, and I believe I've solved the issues both here and with #1484.

  1. For the case of binomial_1 (n = 1, p = 0.9), the calculation can be dramatically simplified, so I've added this code path. I imagine this is a common use case, and it is a simple addition.

  2. For the case of binomial_small, the powi() algorithm is significantly faster when n*p < approx. 10e-10, so I've added this code path too, although it remains constrained by i32::MAX. I can see arguments on both sides of whether to carve out this code path. We do see fairly significant performance gains, so I think it doesn't hurt, but I won't argue much either way.

  3. For the remaining np < BINV_THRESHOLD values, the code uses my updated algorithm for r. This is both more efficient in benchmarks and allows us to drop the i32::MAX constraint (see the dispatch sketch after this list).
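A sketch of how the three paths could dispatch (hypothetical structure: `binv_powi`, `binv_numpy_style`, and `btpe` are stand-ins for the existing and updated samplers, not the PR's actual function names; `BINV_THRESHOLD` is the existing cutoff constant):

```rust
use rand::Rng;

const BINV_THRESHOLD: f64 = 10.0; // the BINV/BTPE cutoff referenced above

// Hypothetical dispatch over the three code paths described in the list above.
fn sample_binomial<R: Rng + ?Sized>(rng: &mut R, n: u64, p: f64) -> u64 {
    if n == 1 {
        // Path 1: a single trial is just a Bernoulli draw.
        return u64::from(rng.random::<f64>() < p);
    }
    let np = n as f64 * p;
    if np < 10e-10 && n <= i32::MAX as u64 {
        // Path 2: tiny n*p; the powi() variant is fastest here, but it still
        // requires n <= i32::MAX.
        binv_powi(rng, n, p)
    } else if np < BINV_THRESHOLD {
        // Path 3: numpy-style BINV (see the earlier sketch); no i32::MAX limit.
        binv_numpy_style(rng, n, p)
    } else {
        // Large n*p: the BTPE algorithm, unchanged by this PR.
        btpe(rng, n, p)
    }
}

// Stubs so the sketch is self-contained; the real samplers live in the crate.
fn binv_powi<R: Rng + ?Sized>(_rng: &mut R, _n: u64, _p: f64) -> u64 { unimplemented!() }
fn binv_numpy_style<R: Rng + ?Sized>(_rng: &mut R, _n: u64, _p: f64) -> u64 { unimplemented!() }
fn btpe<R: Rng + ?Sized>(_rng: &mut R, _n: u64, _p: f64) -> u64 { unimplemented!() }
```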

Benchmarks show significant performance gains across a wide range of cases, with no change in two: binomial_small (it continues to use the powi() algorithm) and binomial_1e12 (I did not change the BTPE algorithm, which that test benchmarks).

Original (master branch):
binomial/binomial
time: [186779.4481 cycles 187011.4868 cycles 187239.1338 cycles]
thrpt: [23.4049 cpb 23.3764 cpb 23.3474 cpb]

binomial/binomial_small
time: [28230.1376 cycles 28379.1098 cycles 28556.1121 cycles]
thrpt: [3.5695 cpb 3.5474 cpb 3.5288 cpb]

binomial/binomial_1
time: [32013.4558 cycles 32054.6770 cycles 32098.4895 cycles]
thrpt: [4.0123 cpb 4.0068 cpb 4.0017 cpb]

binomial/binomial_10
time: [146572.2548 cycles 146784.9231 cycles 147007.3199 cycles]
thrpt: [18.3759 cpb 18.3481 cpb 18.3215 cpb]

binomial/binomial_100
time: [147255.9131 cycles 147529.2020 cycles 147813.5971 cycles]
thrpt: [18.4767 cpb 18.4412 cpb 18.4070 cpb]

binomial/binomial_1000
time: [348917.4704 cycles 399027.3114 cycles 448866.4078 cycles]
thrpt: [56.1083 cpb 49.8784 cpb 43.6147 cpb]

binomial/binomial_1e12
time: [133848.6357 cycles 136388.7594 cycles 139883.7218 cycles]
thrpt: [17.4855 cpb 17.0486 cpb 16.7311 cpb]

This PR (#1486), newly optimized:

binomial/binomial
time: [143210.8756 cycles 143539.7949 cycles 143940.9181 cycles]
thrpt: [17.9926 cpb 17.9425 cpb 17.9014 cpb]
Performance improved by ~23%.

binomial/binomial_small
time: [26698.7778 cycles 26718.3567 cycles 26741.9949 cycles]
thrpt: [3.3427 cpb 3.3398 cpb 3.3373 cpb]
Performance is unchanged.

binomial/binomial_1
time: [22258.1045 cycles 22271.8067 cycles 22288.2595 cycles]
thrpt: [2.7860 cpb 2.7840 cpb 2.7823 cpb]
Performance improved by ~30%.

binomial/binomial_10
time: [107829.8224 cycles 107948.2544 cycles 108114.5181 cycles]
thrpt: [13.5143 cpb 13.4935 cpb 13.4787 cpb]
Performance improved by ~26%.

binomial/binomial_100
time: [113915.2959 cycles 113955.7108 cycles 113998.6718 cycles]
thrpt: [14.2498 cpb 14.2445 cpb 14.2394 cpb]
Performance improved by ~22%.

binomial/binomial_1000
time: [280200.3562 cycles 280355.2869 cycles 280542.6277 cycles]
thrpt: [35.0678 cpb 35.0444 cpb 35.0250 cpb]
Performance improved by ~20%.

binomial/binomial_1e12
time: [137204.5220 cycles 137475.2389 cycles 137807.7701 cycles]
thrpt: [17.2260 cpb 17.1844 cpb 17.1506 cpb]
Performance is unchanged.

@benjamin-lieser (Collaborator)

The n == 1 case is trivial; no BINV needed:

```rust
let u: f64 = rng.random();
if u < p { 1 } else { 0 }
```

I also think we can unify the code paths. I'm also not sure whether this catches the q == 1.0 problem.
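For reference, a minimal illustration of what the q == 1.0 problem likely refers to (an assumption on this reading: catastrophic cancellation when p is tiny):

```rust
fn main() {
    // Assumption: the "q == 1.0 problem" refers to a tiny p vanishing in 1.0 - p.
    let p: f64 = 1e-18;
    let q = 1.0 - p; // rounds to exactly 1.0 (f64 epsilon is about 2.2e-16)
    assert_eq!(q, 1.0);
    // Consequently q^n is exactly 1.0 however it is computed (powi or
    // exp(n * ln(q))), so a BINV loop seeded with r = q^n never enters its
    // body and always returns 0, even when n * p is large.
}
```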

@dhardy closed this on Oct 1, 2024.