`AllocRingBuffer`: separate size from capacity and remove power of 2 types #114

tertsdiepraam · 2023-06-09T10:56:49Z

I've already talked about this with @jonay2000, but to reiterate: this is just an idea. Your current implementation is fine and feel free to just close this.

This patch removes the distinction between PowerOfTwo and NonPowerOfTwo and gives a performance improvement of ~30% to the non power of 2 benchmarks, while making powers of 2 10% slower.

This is achieved by adding a field called size to AllocRingBuffer. The size is always set to the next power of 2 of the capacity. The idea is that the capacity is not the size of the allocation. Instead, an allocation of size is made, so we always mask with the size. But the capacity still forms a hard limit on the length of the buffer.

The upsides are:

Fewer panicking methods, because with_capacity now always succeeds for a non-zero capacity. This is (in my opinion) a more predictable behaviour and there is less opportunity to accidentally misuse the library.
A significant speedup for capacities that are not powers of two.
A simpler API surface:
- PowerOfTwo, NonPowerOfTwo and RingbufferSize are all removed
- with_capacity_non_power_of_two is removed (because it's now equivalent to with_capacity)
- AllocRingBuffer only has one type parameter, which also simplifies a lot of method signatures throughout the code base.

The downsides are:

Maybe small regression on capacity of powers of 2.
No way to force a power of 2.
If the capacity is 2^x + 1 then the buffer will be almost twice as big as strictly necessary.
Some other things I'm not thinking of?

Summary of benchmarks:

Roughly 30% speedup for non power of 2.
Roughly 10% slower on power of 2.
I've been running them on a laptop so they should probably be reproduced in a more controlled environment.

Full benchmarks (click to open)

AllocRingBuffer benchmark_push 1M capacity 16
                        time:   [1.5414 ms 1.5459 ms 1.5501 ms]
                        change: [-2.4387% -2.0603% -1.7429%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

AllocRingBuffer benchmark_push 1M capacity 1024
                        time:   [1.4444 ms 1.4454 ms 1.4463 ms]
                        change: [-0.3071% -0.1727% -0.0311%] (p = 0.02 < 0.05)
                        Change within noise threshold.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  5 (5.00%) high mild
  1 (1.00%) high severe

AllocRingBuffer benchmark_push 1M capacity 4096
                        time:   [1.5257 ms 1.5265 ms 1.5274 ms]
                        change: [+11.647% +11.770% +11.896%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

AllocRingBuffer benchmark_push 1M capacity 8192
                        time:   [1.5744 ms 1.5761 ms 1.5782 ms]
                        change: [-0.5984% -0.3924% -0.1769%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe

AllocRingBuffer benchmark_various 1M capacity 16
                        time:   [444.66 µs 444.89 µs 445.14 µs]
                        change: [-14.231% -13.261% -12.214%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

AllocRingBuffer benchmark_various 1M capacity 1024
                        time:   [598.97 µs 607.22 µs 613.80 µs]
                        change: [+20.984% +22.717% +24.312%] (p = 0.00 < 0.05)
                        Performance has regressed.

AllocRingBuffer benchmark_various 1M capacity 4096
                        time:   [448.15 µs 448.42 µs 448.70 µs]
                        change: [+6.4816% +6.6414% +6.8085%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

AllocRingBuffer benchmark_various 1M capacity 8192
                        time:   [461.68 µs 461.97 µs 462.27 µs]
                        change: [+1.3473% +2.4754% +3.5285%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

Benchmarking AllocRingBuffer benchmark_push_dequeue 1M capacity 16: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 10.0s. You may wish to increase target time to 11.9s, enable flat sampling, or reduce sample count to 60.
AllocRingBuffer benchmark_push_dequeue 1M capacity 16
                        time:   [2.3352 ms 2.3378 ms 2.3418 ms]
                        change: [+13.344% +13.579% +13.810%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe

Benchmarking AllocRingBuffer benchmark_push_dequeue 1M capacity 1024: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 10.0s. You may wish to increase target time to 11.8s, enable flat sampling, or reduce sample count to 60.
AllocRingBuffer benchmark_push_dequeue 1M capacity 1024
                        time:   [2.3330 ms 2.3348 ms 2.3367 ms]
                        change: [-1.7375% -1.5389% -1.3548%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) low mild
  3 (3.00%) high severe

Benchmarking AllocRingBuffer benchmark_push_dequeue 1M capacity 4096: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 10.0s. You may wish to increase target time to 11.8s, enable flat sampling, or reduce sample count to 60.
AllocRingBuffer benchmark_push_dequeue 1M capacity 4096
                        time:   [2.3214 ms 2.3225 ms 2.3239 ms]
                        change: [-1.2033% -1.0586% -0.9162%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low severe
  3 (3.00%) high mild
  3 (3.00%) high severe

Benchmarking AllocRingBuffer benchmark_push_dequeue 1M capacity 8192: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 10.0s. You may wish to increase target time to 11.8s, enable flat sampling, or reduce sample count to 60.
AllocRingBuffer benchmark_push_dequeue 1M capacity 8192
                        time:   [2.3198 ms 2.3207 ms 2.3217 ms]
                        change: [+17.118% +17.316% +17.522%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high severe

AllocRingBuffer benchmark_various 1M capacity not power of two 16
                        time:   [483.27 µs 483.58 µs 484.00 µs]
                        change: [-33.932% -33.826% -33.724%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

AllocRingBuffer benchmark_various 1M capacity not power of two 17
                        time:   [500.21 µs 500.46 µs 500.70 µs]
                        change: [-41.461% -41.385% -41.309%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

AllocRingBuffer benchmark_various 1M capacity not power of two 1024
                        time:   [553.69 µs 556.03 µs 557.97 µs]
                        change: [-24.510% -24.243% -24.001%] (p = 0.00 < 0.05)
                        Performance has improved.

AllocRingBuffer benchmark_various 1M capacity not power of two 4096
                        time:   [438.10 µs 439.51 µs 440.70 µs]
                        change: [-43.446% -43.307% -43.187%] (p = 0.00 < 0.05)
                        Performance has improved.

AllocRingBuffer benchmark_various 1M capacity not power of two 8192
                        time:   [457.68 µs 459.10 µs 460.35 µs]
                        change: [-28.359% -27.975% -27.580%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

AllocRingBuffer benchmark_various 1M capacity not power of two 8195
                        time:   [446.68 µs 446.98 µs 447.30 µs]
                        change: [-42.842% -42.745% -42.652%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

Because this diff has a lot of type signature changes I'll add comments to point you to the important bits.

tertsdiepraam · 2023-06-09T10:57:47Z

src/with_alloc/alloc_ringbuffer.rs

+pub struct AllocRingBuffer<T> {
    buf: *mut T,
+    size: usize,
    capacity: usize,
    readptr: usize,
    writeptr: usize,
-    mode: PhantomData<SIZE>,
 }


I added a size field here and removed mode.

src/with_alloc/alloc_ringbuffer.rs

…stinction

jdonszelmann · 2023-09-15T09:18:51Z

I measure 57% speedup on non-powers of 2 and +1.2% (likely noise) on powers of two. As well as cleaning up the API a ton, I think this is a good change.

…size

tertsdiepraam · 2023-09-15T09:23:09Z

src/with_alloc/alloc_ringbuffer.rs

@@ -408,22 +296,25 @@ impl<T> AllocRingBuffer<T, PowerOfTwo> {
    /// Panics when capacity is zero or not a power of two


Docs out of date here (and probably in other places too)

jdonszelmann · 2023-09-15T09:39:52Z

and THAT'S why we have MIRI in the CI lol

tertsdiepraam · 2023-09-15T09:41:17Z

Memory leaks are safe, I don't see any issues here

tertsdiepraam commented Jun 9, 2023

View reviewed changes

AllocRingBuffer: separate size from capacity and remove power of 2 di…

4b0613e

…stinction

tertsdiepraam force-pushed the cap-is-not-size branch from 1867655 to 4b0613e Compare June 9, 2023 11:20

tertsdiepraam mentioned this pull request Jun 10, 2023

Hint the compiler that capacity is non-zero in mask_modulo #115

Closed

tertsdiepraam marked this pull request as ready for review August 20, 2023 16:35

Merge branch 'main' of github.com:NULLx76/ringbuffer into cap-is-not-…

42fa9ea

…size

tertsdiepraam commented Sep 15, 2023

View reviewed changes

jdonszelmann added 2 commits September 15, 2023 11:29

fix ci

d79e3bb

fix docs

169a34a

jdonszelmann force-pushed the cap-is-not-size branch from 4951949 to 169a34a Compare September 15, 2023 09:35

fix CI

be752ea

fix miri

d5edef3

jdonszelmann merged commit 13dc018 into NULLx76:main Sep 15, 2023
8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`AllocRingBuffer`: separate size from capacity and remove power of 2 types #114

`AllocRingBuffer`: separate size from capacity and remove power of 2 types #114

tertsdiepraam commented Jun 9, 2023 •

edited

Loading

tertsdiepraam Jun 9, 2023

jdonszelmann commented Sep 15, 2023

tertsdiepraam Sep 15, 2023

jdonszelmann commented Sep 15, 2023

tertsdiepraam commented Sep 15, 2023

		@@ -408,22 +296,25 @@ impl<T> AllocRingBuffer<T, PowerOfTwo> {
		/// Panics when capacity is zero or not a power of two

AllocRingBuffer: separate size from capacity and remove power of 2 types #114

AllocRingBuffer: separate size from capacity and remove power of 2 types #114

Conversation

tertsdiepraam commented Jun 9, 2023 • edited Loading

tertsdiepraam Jun 9, 2023

Choose a reason for hiding this comment

jdonszelmann commented Sep 15, 2023

tertsdiepraam Sep 15, 2023

Choose a reason for hiding this comment

jdonszelmann commented Sep 15, 2023

tertsdiepraam commented Sep 15, 2023

`AllocRingBuffer`: separate size from capacity and remove power of 2 types #114

`AllocRingBuffer`: separate size from capacity and remove power of 2 types #114

tertsdiepraam commented Jun 9, 2023 •

edited

Loading