Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AllocRingBuffer: separate size from capacity and remove power of 2 types #114

Merged
merged 6 commits into from
Sep 15, 2023

Conversation

tertsdiepraam
Copy link
Contributor

@tertsdiepraam tertsdiepraam commented Jun 9, 2023

I've already talked about this with @jonay2000, but to reiterate: this is just an idea. Your current implementation is fine and feel free to just close this.

This patch removes the distinction between PowerOfTwo and NonPowerOfTwo and gives a performance improvement of ~30% to the non power of 2 benchmarks, while making powers of 2 10% slower.

This is achieved by adding a field called size to AllocRingBuffer. The size is always set to the next power of 2 of the capacity. The idea is that the capacity is not the size of the allocation. Instead, an allocation of size is made, so we always mask with the size. But the capacity still forms a hard limit on the length of the buffer.

The upsides are:

  • Fewer panicking methods, because with_capacity now always succeeds for a non-zero capacity. This is (in my opinion) a more predictable behaviour and there is less opportunity to accidentally misuse the library.
  • A significant speedup for capacities that are not powers of two.
  • A simpler API surface:
    • PowerOfTwo, NonPowerOfTwo and RingbufferSize are all removed
    • with_capacity_non_power_of_two is removed (because it's now equivalent to with_capacity)
    • AllocRingBuffer only has one type parameter, which also simplifies a lot of method signatures throughout the code base.

The downsides are:

  • Maybe small regression on capacity of powers of 2.
  • No way to force a power of 2.
  • If the capacity is 2^x + 1 then the buffer will be almost twice as big as strictly necessary.
  • Some other things I'm not thinking of?

Summary of benchmarks:

  • Roughly 30% speedup for non power of 2.
  • Roughly 10% slower on power of 2.
  • I've been running them on a laptop so they should probably be reproduced in a more controlled environment.
Full benchmarks (click to open)
AllocRingBuffer benchmark_push 1M capacity 16
                        time:   [1.5414 ms 1.5459 ms 1.5501 ms]
                        change: [-2.4387% -2.0603% -1.7429%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

AllocRingBuffer benchmark_push 1M capacity 1024
                        time:   [1.4444 ms 1.4454 ms 1.4463 ms]
                        change: [-0.3071% -0.1727% -0.0311%] (p = 0.02 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  5 (5.00%) high mild
  1 (1.00%) high severe

AllocRingBuffer benchmark_push 1M capacity 4096
                        time:   [1.5257 ms 1.5265 ms 1.5274 ms]
                        change: [+11.647% +11.770% +11.896%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

AllocRingBuffer benchmark_push 1M capacity 8192
                        time:   [1.5744 ms 1.5761 ms 1.5782 ms]
                        change: [-0.5984% -0.3924% -0.1769%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe

AllocRingBuffer benchmark_various 1M capacity 16
                        time:   [444.66 µs 444.89 µs 445.14 µs]
                        change: [-14.231% -13.261% -12.214%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

AllocRingBuffer benchmark_various 1M capacity 1024
                        time:   [598.97 µs 607.22 µs 613.80 µs]
                        change: [+20.984% +22.717% +24.312%] (p = 0.00 < 0.05)
                        Performance has regressed.

AllocRingBuffer benchmark_various 1M capacity 4096
                        time:   [448.15 µs 448.42 µs 448.70 µs]
                        change: [+6.4816% +6.6414% +6.8085%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

AllocRingBuffer benchmark_various 1M capacity 8192
                        time:   [461.68 µs 461.97 µs 462.27 µs]
                        change: [+1.3473% +2.4754% +3.5285%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

Benchmarking AllocRingBuffer benchmark_push_dequeue 1M capacity 16: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 10.0s. You may wish to increase target time to 11.9s, enable flat sampling, or reduce sample count to 60.
AllocRingBuffer benchmark_push_dequeue 1M capacity 16
                        time:   [2.3352 ms 2.3378 ms 2.3418 ms]
                        change: [+13.344% +13.579% +13.810%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe

Benchmarking AllocRingBuffer benchmark_push_dequeue 1M capacity 1024: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 10.0s. You may wish to increase target time to 11.8s, enable flat sampling, or reduce sample count to 60.
AllocRingBuffer benchmark_push_dequeue 1M capacity 1024
                        time:   [2.3330 ms 2.3348 ms 2.3367 ms]
                        change: [-1.7375% -1.5389% -1.3548%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) low mild
  3 (3.00%) high severe

Benchmarking AllocRingBuffer benchmark_push_dequeue 1M capacity 4096: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 10.0s. You may wish to increase target time to 11.8s, enable flat sampling, or reduce sample count to 60.
AllocRingBuffer benchmark_push_dequeue 1M capacity 4096
                        time:   [2.3214 ms 2.3225 ms 2.3239 ms]
                        change: [-1.2033% -1.0586% -0.9162%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low severe
  3 (3.00%) high mild
  3 (3.00%) high severe

Benchmarking AllocRingBuffer benchmark_push_dequeue 1M capacity 8192: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 10.0s. You may wish to increase target time to 11.8s, enable flat sampling, or reduce sample count to 60.
AllocRingBuffer benchmark_push_dequeue 1M capacity 8192
                        time:   [2.3198 ms 2.3207 ms 2.3217 ms]
                        change: [+17.118% +17.316% +17.522%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

AllocRingBuffer benchmark_various 1M capacity not power of two 16
                        time:   [483.27 µs 483.58 µs 484.00 µs]
                        change: [-33.932% -33.826% -33.724%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

AllocRingBuffer benchmark_various 1M capacity not power of two 17
                        time:   [500.21 µs 500.46 µs 500.70 µs]
                        change: [-41.461% -41.385% -41.309%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

AllocRingBuffer benchmark_various 1M capacity not power of two 1024
                        time:   [553.69 µs 556.03 µs 557.97 µs]
                        change: [-24.510% -24.243% -24.001%] (p = 0.00 < 0.05)
                        Performance has improved.

AllocRingBuffer benchmark_various 1M capacity not power of two 4096
                        time:   [438.10 µs 439.51 µs 440.70 µs]
                        change: [-43.446% -43.307% -43.187%] (p = 0.00 < 0.05)
                        Performance has improved.

AllocRingBuffer benchmark_various 1M capacity not power of two 8192
                        time:   [457.68 µs 459.10 µs 460.35 µs]
                        change: [-28.359% -27.975% -27.580%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

AllocRingBuffer benchmark_various 1M capacity not power of two 8195
                        time:   [446.68 µs 446.98 µs 447.30 µs]
                        change: [-42.842% -42.745% -42.652%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

Because this diff has a lot of type signature changes I'll add comments to point you to the important bits.

Comment on lines 40 to 46
pub struct AllocRingBuffer<T> {
buf: *mut T,
size: usize,
capacity: usize,
readptr: usize,
writeptr: usize,
mode: PhantomData<SIZE>,
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a size field here and removed mode.

src/with_alloc/alloc_ringbuffer.rs Show resolved Hide resolved
src/with_alloc/alloc_ringbuffer.rs Show resolved Hide resolved
@jdonszelmann
Copy link
Collaborator

I measure 57% speedup on non-powers of 2 and +1.2% (likely noise) on powers of two. As well as cleaning up the API a ton, I think this is a good change.

@@ -408,22 +296,25 @@ impl<T> AllocRingBuffer<T, PowerOfTwo> {
/// Panics when capacity is zero or not a power of two
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs out of date here (and probably in other places too)

@jdonszelmann
Copy link
Collaborator

and THAT'S why we have MIRI in the CI lol

@tertsdiepraam
Copy link
Contributor Author

Memory leaks are safe, I don't see any issues here

@jdonszelmann jdonszelmann merged commit 13dc018 into NULLx76:main Sep 15, 2023
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants