
Commit

fix spelling
sivukhin committed Jan 14, 2024
1 parent 4c3395e commit c8f872d
Showing 2 changed files with 18 additions and 18 deletions.
18 changes: 9 additions & 9 deletions find-slice-element-position-in-rust.dj
@@ -1,15 +1,15 @@
{date="2024/01/13"}
# Find slice element position in Rust

-I started to learn `Rust` only recently and while exploring [slice methods][] I was a bit surprised that I didn't find any method for finding position of element in the slice:
+I started to learn `Rust` only recently and while exploring [slice methods][] I was a bit surprised that I didn't find any method for finding the position of element in the slice:

{.noline}
``` rust
fn find(haystack: &[u8], needle: u8) -> Option<usize> { ... }
```


-I had some experience with `Zig` and it has pretty cool [`std.mem`][zig stdmem] module with many generic functions including `indexOf`, which internally implements [Boyer-Moore-Horspool][] pattern matching algorithm against generic element type `T`:
+I had some experience with `Zig` which has a very useful [`std.mem`][zig stdmem] module with many generic functions including `indexOf`, which internally implements [Boyer-Moore-Horspool][] pattern matching algorithm against generic element type `T`:

{.noline}
``` zig
@@ -20,15 +20,15 @@ fn indexOf(comptime T: type, haystack: []const T, needle: []const T) ?usize { ..
[zig stdmem]: https://ziglang.org/documentation/master/std/#A;std:mem
[Boyer-Moore-Horspool]: https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore%E2%80%93Horspool_algorithm

-After discussion with `Rust` experts I quickly got the response that I can just use methods of `Iterator` traits:
+After discussing with `Rust` experts I quickly got the response that I can just use methods of `Iterator` traits:

```rust
fn find(haystack: &[u8], needle: u8) -> Option<usize> {
haystack.iter().position(|&x| x == needle)
}
```

-Nice! But what about performance of this method? At first, I was afraid that using lambda function with closure will lead to poor performance (coming from `Go` with non-`LLVM` based compiler which has pretty limited power of inlining optimization). But, non-surprisingly for most of the developers, `LLVM` (and `Rust`) can optimize this method very nicely and `rustc` produce [very clean][rustc iter] binary with `-C opt-level=3 -C target-cpu=native` release profile flags:
+Nice! But what about performance of this method? At first, I was afraid that using lambda function with closure will lead to poor performance (coming from `Go` with non-`LLVM` based compiler which has pretty limited power of inlining optimization). But, unsurprisingly for most of the developers, `LLVM` (and `Rust`) can optimize this method very nicely and `rustc` produce [very clean][rustc iter] binary with `-C opt-level=3 -C target-cpu=native` release profile flags:

[rustc iter]: https://godbolt.org/z/YrvjKfx1v

@@ -58,7 +58,7 @@ example::find:
ret
```

-Can we improve the performance of method?
+Can we improve the method's performance?

### Implementing `find` without early returns

@@ -78,7 +78,7 @@ pub fn find_branchless(haystack: &[u8], needle: u8) -> Option<usize> {
}
```

-Unfortunately, this doesn't help -- there is still to `SIMD` instructions in the output assembler. But wait, we can notice drastic changes in the [output binary][rustc rev] -- now it seems like compiler unrolled our main loop and compare bytes in chunks of size 8:
+Unfortunately, this doesn't help -- there are still to `SIMD` instructions in the output assembler. But wait, we can notice drastic changes in the [output binary][rustc rev] -- now it seems like compiler unrolled our main loop and compare bytes in chunks of size 8:

[rustc rev]: https://godbolt.org/z/5Eh5rfaW3

@@ -152,7 +152,7 @@ pub fn find(haystack: &[u8], needle: u8) -> Option<usize> {
}
```

-Unfortunately, this doesn't work -- compiler again produces boring assembly with only unrolling optimization on. But, if we stop and think about it, this is actually expected! Chunking logic make every chunk unpredictable in size -- because there is no guarantees about exact size of the last chunk (and every chunk can be the last one!).
+Unfortunately, this doesn't work -- the compiler again produces boring assembly with only unrolling optimization on. But, if we stop and think about it, this is actually expected! Chunking logic make every chunk unpredictable in size -- because there is no guarantees about exact size of the last chunk (and every chunk can be the last one!).

Luckily, `Rust` developer team thought about this and added method [`chunks_exact`][chunks_exact] specifically for such cases! This method split slice in equally sized chunks and provides access to the tail of potentially smaller size through additional method: `remainder`.

@@ -165,7 +165,7 @@ This final step allow us to make our dream come true: [vectorized `find` functio
```rust
// bonus: refactoring of find_branchless function to make it more elegant!
fn find_branchless(haystack: &[u8], needle: u8) -> Option<usize> {
-return chunk.iter().enumerate()
+return haystack.iter().enumerate()
.filter(|(_, &b)| b == needle)
.rfold(None, |_, (i, _)| Some(i))
}
@@ -181,7 +181,7 @@ fn find(haystack: &[u8], needle: u8) -> Option<usize> {

### Benchmarks

-You can find full benchmark source code here: [./rust-find-bench](https://github.com/sivukhin/sivukhin.github.io/tree/master/rust-find-bench)
+The full benchmark source code is available here: [./rust-find-bench](https://github.com/sivukhin/sivukhin.github.io/tree/master/rust-find-bench)

| method | time | speedup |
| :----- | ---: | --: |
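The `@@ -165,7 +165,7 @@` hunk fixes `find_branchless` so that it scans its own `haystack` parameter instead of an undefined `chunk` variable. As a minimal, self-contained sketch (not the post's exact code), the corrected function can be checked against the `Iterator::position` baseline. The name `find_position` is hypothetical, and the filter pattern is written as `&(_, &b)` so it matches the `&(usize, &u8)` item explicitly instead of relying on match ergonomics.

```rust
/// Baseline from the post: stops at the first matching byte.
fn find_position(haystack: &[u8], needle: u8) -> Option<usize> {
    haystack.iter().position(|&x| x == needle)
}

/// Branchless variant: every byte is compared and there is no early return.
/// `rfold` visits matches from the back and overwrites the accumulator,
/// so the value left at the end is the lowest matching index.
fn find_branchless(haystack: &[u8], needle: u8) -> Option<usize> {
    haystack
        .iter()
        .enumerate()
        .filter(|&(_, &b)| b == needle)
        .rfold(None, |_, (i, _)| Some(i))
}
```

Both functions should agree on every input; for example, both return `Some(2)` for `b"hello"` and `b'l'`, because `rfold` sees index 3 first and index 2 last.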
18 changes: 9 additions & 9 deletions find-slice-element-position-in-rust.html
@@ -17,18 +17,18 @@ <h2><a href="/">naming is hard</a></h2>
<div class="article">
<section id="Find-slice-element-position-in-Rust">
<h1 date="2024/01/13">Find slice element position in Rust</h1>
-<p>I started to learn <code>Rust</code> only recently and while exploring <a href="https://doc.rust-lang.org/std/primitive.slice.html">slice methods</a> I was a bit surprised that I didn&rsquo;t find any method for finding position of element in the slice:</p>
+<p>I started to learn <code>Rust</code> only recently and while exploring <a href="https://doc.rust-lang.org/std/primitive.slice.html">slice methods</a> I was a bit surprised that I didn&rsquo;t find any method for finding the position of element in the slice:</p>
<pre class="noline"><code><span class="keyword">fn</span> <span class="function">find</span>(<span class="identifier">haystack</span>: &[<span class="identifier">u8</span>], <span class="identifier">needle</span>: <span class="identifier">u8</span>) -> <span class="identifier">Option</span><<span class="identifier">usize</span>> { ... }</code>
</pre>
-<p>I had some experience with <code>Zig</code> and it has pretty cool <a href="https://ziglang.org/documentation/master/std/#A;std:mem"><code>std.mem</code></a> module with many generic functions including <code>indexOf</code>, which internally implements <a href="https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore%E2%80%93Horspool_algorithm">Boyer-Moore-Horspool</a> pattern matching algorithm against generic element type <code>T</code>:</p>
+<p>I had some experience with <code>Zig</code> which has a very useful <a href="https://ziglang.org/documentation/master/std/#A;std:mem"><code>std.mem</code></a> module with many generic functions including <code>indexOf</code>, which internally implements <a href="https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore%E2%80%93Horspool_algorithm">Boyer-Moore-Horspool</a> pattern matching algorithm against generic element type <code>T</code>:</p>
<pre class="noline"><code><span class="keyword">fn</span> <span class="function">indexOf</span>(<span class="keyword">comptime</span> <span class="identifier">T</span>: <span class="identifier">type</span>, <span class="identifier">haystack</span>: []<span class="keyword">const</span> <span class="identifier">T</span>, <span class="identifier">needle</span>: []<span class="keyword">const</span> <span class="identifier">T</span>) ?<span class="identifier">usize</span> { ... }</code>
</pre>
-<p>After discussion with <code>Rust</code> experts I quickly got the response that I can just use methods of <code>Iterator</code> traits:</p>
+<p>After discussing with <code>Rust</code> experts I quickly got the response that I can just use methods of <code>Iterator</code> traits:</p>
<pre><code><span class="keyword">fn</span> <span class="function">find</span>(<span class="identifier">haystack</span>: &[<span class="identifier">u8</span>], <span class="identifier">needle</span>: <span class="identifier">u8</span>) -> <span class="identifier">Option</span><<span class="identifier">usize</span>> {</code>
<code> <span class="identifier">haystack</span>.<span class="function">iter</span>().<span class="function">position</span>(|&<span class="identifier">x</span>| <span class="identifier">x</span> == <span class="identifier">needle</span>)</code>
<code>}</code>
</pre>
-<p>Nice! But what about performance of this method? At first, I was afraid that using lambda function with closure will lead to poor performance (coming from <code>Go</code> with non-<code>LLVM</code> based compiler which has pretty limited power of inlining optimization). But, non-surprisingly for most of the developers, <code>LLVM</code> (and <code>Rust</code>) can optimize this method very nicely and <code>rustc</code> produce <a href="https://godbolt.org/z/YrvjKfx1v">very clean</a> binary with <code>-C opt-level=3 -C target-cpu=native</code> release profile flags:</p>
+<p>Nice! But what about performance of this method? At first, I was afraid that using lambda function with closure will lead to poor performance (coming from <code>Go</code> with non-<code>LLVM</code> based compiler which has pretty limited power of inlining optimization). But, unsurprisingly for most of the developers, <code>LLVM</code> (and <code>Rust</code>) can optimize this method very nicely and <code>rustc</code> produce <a href="https://godbolt.org/z/YrvjKfx1v">very clean</a> binary with <code>-C opt-level=3 -C target-cpu=native</code> release profile flags:</p>
<pre><code><span class="comment"># input : rdi=haystack.ptr, rsi=haystack.size, rdx=needle</span></code>
<code><span class="comment"># output: rax=None/Some, rdx=Some(v)</span></code>
<code>example::find:</code>
@@ -53,7 +53,7 @@ <h1 date="2024/01/13">Find slice element position in Rust</h1>
<code> <span class="keyword">mov</span> eax, <span class="number">1</span></code>
<code> <span class="keyword">ret</span></code>
</pre>
-<p>Can we improve the performance of method?</p>
+<p>Can we improve the method&rsquo;s performance?</p>
</section>
<section id="Implementing-find-without-early-returns">
<h3>Implementing <code>find</code> without early returns</h3>
@@ -69,7 +69,7 @@ <h3>Implementing <code>find</code> without early returns</h3>
<code> <span class="identifier">position</span></code>
<code>}</code>
</pre>
-<p>Unfortunately, this doesn&rsquo;t help &ndash; there is still to <code>SIMD</code> instructions in the output assembler. But wait, we can notice drastic changes in the <a href="https://godbolt.org/z/5Eh5rfaW3">output binary</a> &ndash; now it seems like compiler unrolled our main loop and compare bytes in chunks of size 8:</p>
+<p>Unfortunately, this doesn&rsquo;t help &ndash; there are still to <code>SIMD</code> instructions in the output assembler. But wait, we can notice drastic changes in the <a href="https://godbolt.org/z/5Eh5rfaW3">output binary</a> &ndash; now it seems like compiler unrolled our main loop and compare bytes in chunks of size 8:</p>
<pre><code><span class="comment"># there is just a part of the assembler, you can find full output by the godbolt link</span></code>
<code>.LBB0_11:</code>
<code> <span class="keyword">cmp</span> byte ptr [r8 + r11 - <span class="number">1</span>], dl</code>
@@ -127,12 +127,12 @@ <h3>Vectorized version of <code>find</code></h3>
<code> .<span class="function">find_map</span>(|(<span class="identifier">i</span>, <span class="identifier">chunk</span>)| <span class="function">find_branchless</span>(<span class="identifier">chunk</span>, <span class="identifier">needle</span>).<span class="function">map</span>(|<span class="identifier">x</span>| <span class="number">32 </span>* <span class="identifier">i</span> + <span class="identifier">x</span>) )</code>
<code>}</code>
</pre>
-<p>Unfortunately, this doesn&rsquo;t work &ndash; compiler again produces boring assembly with only unrolling optimization on. But, if we stop and think about it, this is actually expected! Chunking logic make every chunk unpredictable in size &ndash; because there is no guarantees about exact size of the last chunk (and every chunk can be the last one!).</p>
+<p>Unfortunately, this doesn&rsquo;t work &ndash; the compiler again produces boring assembly with only unrolling optimization on. But, if we stop and think about it, this is actually expected! Chunking logic make every chunk unpredictable in size &ndash; because there is no guarantees about exact size of the last chunk (and every chunk can be the last one!).</p>
<p>Luckily, <code>Rust</code> developer team thought about this and added method <a href="https://doc.rust-lang.org/std/primitive.slice.html#method.chunks_exact"><code>chunks_exact</code></a> specifically for such cases! This method split slice in equally sized chunks and provides access to the tail of potentially smaller size through additional method: <code>remainder</code>.</p>
<p>This final step allow us to make our dream come true: <a href="https://godbolt.org/z/n3b7dbWoW">vectorized <code>find</code> function</a> with only safe <code>Rust</code>!</p>
<pre><code><span class="comment">// bonus: refactoring of find_branchless function to make it more elegant!</span></code>
<code><span class="keyword">fn</span> <span class="function">find_branchless</span>(<span class="identifier">haystack</span>: &[<span class="identifier">u8</span>], <span class="identifier">needle</span>: <span class="identifier">u8</span>) -> <span class="identifier">Option</span><<span class="identifier">usize</span>> {</code>
-<code> <span class="keyword">return</span> <span class="identifier">chunk</span>.<span class="function">iter</span>().<span class="function">enumerate</span>()</code>
+<code> <span class="keyword">return</span> <span class="identifier">haystack</span>.<span class="function">iter</span>().<span class="function">enumerate</span>()</code>
<code> .<span class="function">filter</span>(|(_, &<span class="identifier">b</span>)| <span class="identifier">b</span> == <span class="identifier">needle</span>)</code>
<code> .<span class="function">rfold</span>(<span class="identifier">None</span>, |_, (<span class="identifier">i</span>, _)| <span class="function">Some</span>(<span class="identifier">i</span>))</code>
<code>}</code>
@@ -148,7 +148,7 @@ <h3>Vectorized version of <code>find</code></h3>
</section>
<section id="Benchmarks">
<h3>Benchmarks</h3>
-<p>You can find full benchmark source code here: <a href="https://github.com/sivukhin/sivukhin.github.io/tree/master/rust-find-bench">./rust-find-bench</a></p>
+<p>The full benchmark source code is available here: <a href="https://github.com/sivukhin/sivukhin.github.io/tree/master/rust-find-bench">./rust-find-bench</a></p>
<table>
<tr>
<th style="text-align: left;">method</th>
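Several hunks quote the post's argument that `chunks` blocks vectorization because any chunk may be the shorter last one, while `chunks_exact` yields only equal-sized chunks and exposes the tail through `remainder`. A minimal sketch of that shape follows. It assumes a chunk width of 32 (taken from the `32 * i` in the quoted `find_map` line) and uses the hypothetical names `find_chunked` and `CHUNK`. Whether LLVM actually emits SIMD for it depends on the compiler version and flags such as `-C target-cpu=native`, so the generated assembly is worth checking on godbolt.

```rust
/// Chunk width (assumption): 32 mirrors the `32 * i` offset in the quoted `find_map` call.
const CHUNK: usize = 32;

/// Search full CHUNK-sized slices with a branchless scan, then fall back to the tail.
fn find_chunked(haystack: &[u8], needle: u8) -> Option<usize> {
    // Branchless scan of one slice: compares every byte, keeps the lowest matching index.
    let scan = |s: &[u8]| {
        s.iter()
            .enumerate()
            .filter(|&(_, &b)| b == needle)
            .rfold(None, |_, (i, _)| Some(i))
    };

    // chunks_exact yields only slices of exactly CHUNK bytes;
    // the shorter leftover part is available separately through remainder().
    let chunks = haystack.chunks_exact(CHUNK);
    let tail = chunks.remainder();

    chunks
        .enumerate()
        // Translate the index inside chunk i back to an index in `haystack`.
        .find_map(|(i, chunk)| scan(chunk).map(|x| CHUNK * i + x))
        // The tail starts right after the last full chunk.
        .or_else(|| scan(tail).map(|x| haystack.len() - tail.len() + x))
}
```

Keeping the remainder out of the main loop is the point of this design: every call made inside the `find_map` closure receives a slice of exactly `CHUNK` bytes, which is the property the post says the compiler needs before it can vectorize the inner scan.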

