Skip to content

Faster reverse() string function for ASCII-only case #14195

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jan 22, 2025

Conversation

UBarney
Copy link
Contributor

@UBarney UBarney commented Jan 19, 2025

Which issue does this PR close?

Closes #12445.

Rationale for this change

See the issue.

What changes are included in this PR?

  • If all characters in this string are within the ASCII range, reverse bytes directly
  • Move gen_string_array in benches/character_length.rs to helper.rs

Are these changes tested?

Yes. Using existing unit test

bench result

reverse_StringArray_ascii_str_len_8
                        time:   [93.762 µs 94.463 µs 95.301 µs]
                        change: [-38.347% -37.726% -37.143%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  7 (7.00%) high mild
  2 (2.00%) high severe

reverse_StringArray_utf8_density_0.8_str_len_8
                        time:   [491.88 µs 496.57 µs 501.74 µs]
                        change: [-1.0373% +0.1648% +1.4032%] (p = 0.79 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe

reverse_StringViewArray_ascii_str_len_8
                        time:   [87.226 µs 87.666 µs 88.147 µs]
                        change: [-34.737% -34.039% -33.350%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

reverse_StringViewArray_utf8_density_0.8_str_len_8
                        time:   [502.54 µs 505.02 µs 507.66 µs]
                        change: [-0.5734% +0.3619% +1.3269%] (p = 0.49 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

reverse_StringArray_ascii_str_len_32
                        time:   [98.021 µs 98.794 µs 99.785 µs]
                        change: [-70.309% -70.056% -69.817%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) high mild
  2 (2.00%) high severe

reverse_StringArray_utf8_density_0.8_str_len_32
                        time:   [1.6026 ms 1.6100 ms 1.6181 ms]
                        change: [-0.0304% +1.0048% +2.0155%] (p = 0.07 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

reverse_StringViewArray_ascii_str_len_32
                        time:   [102.37 µs 102.89 µs 103.44 µs]
                        change: [-68.935% -68.690% -68.454%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

reverse_StringViewArray_utf8_density_0.8_str_len_32
                        time:   [1.6199 ms 1.6269 ms 1.6345 ms]
                        change: [+0.1269% +1.4123% +3.0235%] (p = 0.04 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

reverse_StringArray_ascii_str_len_128
                        time:   [166.01 µs 167.09 µs 168.33 µs]
                        change: [-84.421% -84.299% -84.176%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild

reverse_StringArray_utf8_density_0.8_str_len_128
                        time:   [6.1278 ms 6.1569 ms 6.1883 ms]
                        change: [-1.0026% -0.2737% +0.5242%] (p = 0.48 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild

reverse_StringViewArray_ascii_str_len_128
                        time:   [173.15 µs 174.16 µs 175.26 µs]
                        change: [-83.918% -83.731% -83.543%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) high mild
  4 (4.00%) high severe

reverse_StringViewArray_utf8_density_0.8_str_len_128
                        time:   [6.1279 ms 6.1637 ms 6.2028 ms]
                        change: [-1.3879% -0.5410% +0.3269%] (p = 0.22 > 0.05)
                        No change in performance detected.
Found 9 outliers among 100 measurements (9.00%)
  7 (7.00%) high mild
  2 (2.00%) high severe

reverse_StringArray_ascii_str_len_4096
                        time:   [15.579 ms 15.687 ms 15.800 ms]
                        change: [-63.915% -63.620% -63.301%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

reverse_StringArray_utf8_density_0.8_str_len_4096
                        time:   [215.98 ms 217.47 ms 219.18 ms]
                        change: [-0.0524% +0.8008% +1.6803%] (p = 0.07 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

reverse_StringViewArray_ascii_str_len_4096
                        time:   [8.6421 ms 8.8760 ms 9.1174 ms]
                        change: [-75.462% -74.835% -74.087%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

reverse_StringViewArray_utf8_density_0.8_str_len_4096
                        time:   [217.17 ms 218.29 ms 219.58 ms]
                        change: [+0.9451% +1.5937% +2.2792%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

Are there any user-facing changes?

No

@github-actions github-actions bot added the functions Changes to functions implementation label Jan 19, 2025
@UBarney UBarney force-pushed the ascii_fast_path_for_reverse branch from 1010906 to 7e91bc8 Compare January 19, 2025 07:18
@UBarney UBarney marked this pull request as ready for review January 19, 2025 08:18
// SAFETY: Since the original string was ASCII, reversing the bytes still results in valid UTF-8.
let reversed = unsafe {
// reverse bytes directly since ASCII characters are single bytes
String::from_utf8_unchecked(s.bytes().rev().collect::<Vec<u8>>())
Copy link
Contributor

@Dandandan Dandandan Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could avoid a allocation here by copying into a Vec<u8> buffer

// slightly faster than using iterator `bytes`
bytes_buf.extend(s.as_bytes())
bytes_buf.reverse()
...
bytes_buf.clear()

Copy link
Contributor Author

@UBarney UBarney Jan 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got an improvement of ~30% at most after add bytes_buf

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

@UBarney UBarney force-pushed the ascii_fast_path_for_reverse branch from 028333c to e8b44b5 Compare January 21, 2025 14:24
@UBarney UBarney requested a review from Dandandan January 21, 2025 14:47
@Dandandan Dandandan closed this Jan 21, 2025
@Dandandan Dandandan reopened this Jan 21, 2025
@alamb
Copy link
Contributor

alamb commented Jan 22, 2025

change: [-83.918% -83.731% -83.543%] (p = 0.00 < 0.05)

That is some sweet sweet performance

@alamb alamb merged commit 2ac20e3 into apache:main Jan 22, 2025
49 checks passed
@alamb
Copy link
Contributor

alamb commented Jan 22, 2025

Thanks @UBarney and @Dandandan

@UBarney UBarney deleted the ascii_fast_path_for_reverse branch January 23, 2025 01:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
functions Changes to functions implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize reverse() string function with ASCII fast path
3 participants