Use divide and conquer in to_radix_digits #316

Open
wants to merge 3 commits into
base: master

HKalbasi

This implements the algorithm mentioned in #315

Benchmark:

// prev
running 4 tests
test to_str_radix_10      ... bench:       4,242.41 ns/iter (+/- 712.43)
test to_str_radix_10_2    ... bench:      82,360.73 ns/iter (+/- 13,304.91)
test to_str_radix_10_3    ... bench:   3,929,829.90 ns/iter (+/- 514,647.85)
test to_str_radix_10_4    ... bench: 243,146,081.10 ns/iter (+/- 44,213,689.70)
// now
running 4 tests
test to_str_radix_10      ... bench:       4,261.38 ns/iter (+/- 522.98)
test to_str_radix_10_2    ... bench:      83,358.54 ns/iter (+/- 11,170.02)
test to_str_radix_10_3    ... bench:   2,623,301.20 ns/iter (+/- 279,505.89)
test to_str_radix_10_4    ... bench: 195,073,240.50 ns/iter (+/- 17,025,687.08)

Both versions still grow as O(n^2). To get an asymptotic improvement, we need faster multiplication and division algorithms.
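For readers unfamiliar with the approach, the divide-and-conquer idea can be sketched on a plain `u128` instead of a `BigUint`. This is only an illustration under that assumption, not the code in this PR: the names `power_table`, `emit`, and `to_radix_digits_le_sketch` are hypothetical. The number is split by the largest precomputed power `radix^(2^k)`, and each half is converted recursively.

```rust
/// Builds [radix, radix^2, radix^4, ...], stopping once the next square
/// would exceed `n` (or overflow u128). Hypothetical helper for illustration.
fn power_table(radix: u128, n: u128) -> Vec<u128> {
    let mut table = vec![radix];
    loop {
        let last = table[table.len() - 1];
        match last.checked_mul(last) {
            Some(sq) if sq <= n => table.push(sq),
            _ => break,
        }
    }
    table
}

/// Appends exactly 2^level little-endian digits of `n`.
/// Requires n < radix^(2^level); table[k] holds radix^(2^k).
fn emit(n: u128, level: usize, table: &[u128], out: &mut Vec<u8>) {
    if level == 0 {
        // n < radix here, so it is a single digit.
        out.push(n as u8);
        return;
    }
    let base = table[level - 1];
    // Low half first, because the output is little-endian.
    emit(n % base, level - 1, table, out);
    emit(n / base, level - 1, table, out);
}

/// Divide-and-conquer radix conversion on a u128 stand-in for a big integer.
pub fn to_radix_digits_le_sketch(n: u128, radix: u32) -> Vec<u8> {
    assert!((2..=256).contains(&radix));
    let table = power_table(u128::from(radix), n);
    let mut out = Vec::new();
    // n < (last table entry)^2 = radix^(2^table.len()).
    emit(n, table.len(), &table, &mut out);
    // The recursion pads with zeros at the high end; strip them.
    while out.len() > 1 && out[out.len() - 1] == 0 {
        out.pop();
    }
    out
}
```

On a real big integer the `%` and `/` by `base` are the expensive steps, which is why the overall cost is tied to the speed of multiplication and division.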

@@ -701,34 +701,48 @@ pub(super) fn to_radix_digits_le(u: &BigUint, radix: u32) -> Vec<u8> {
// The threshold for this was chosen by anecdotal performance measurements to
// approximate where this starts to make a noticeable difference.
if digits.data.len() >= 64 {
Member

Did you re-evaluate this threshold at all? Notably, it's different than the one you used in to_radix_digits_le_divide_and_conquer. Maybe that does make sense since the inner part doesn't have to pay for creating big_bases, but I'm not sure.

Author

HKalbasi commented Dec 16, 2024

These are new results relevant for the threshold:

simple:
test 1009 bit    ... bench:       4,169.26 ns/iter (+/- 470.97)
test 2009 bit    ... bench:      14,735.97 ns/iter (+/- 1,819.63)
test 3009 bit    ... bench:      32,522.20 ns/iter (+/- 2,949.82)
test 4009 bit    ... bench:      56,441.64 ns/iter (+/- 6,354.65)
divide and conquer:
test 1009 bit    ... bench:       5,955.14 ns/iter (+/- 859.07)
test 2009 bit    ... bench:      12,731.82 ns/iter (+/- 1,780.59)
test 3009 bit    ... bench:      18,701.03 ns/iter (+/- 2,284.40)
test 4009 bit    ... bench:      27,605.41 ns/iter (+/- 5,229.87)

So 2000/64 ≈ 32 probably makes sense as the new threshold?

> since the inner part doesn't have to pay for creating big_bases, but I'm not sure

If I understand correctly, the main difference for small numbers is that the recursive algorithm loses constant propagation for the radix 10. If that weren't the case, I'd expect a threshold near 8.
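The threshold arithmetic can be checked against the measurements above. The sketch below copies the eight timings from this comment and assumes 64-bit limbs, as the `2000/64` estimate does. The names `RESULTS`, `limbs_for_bits`, and `crossover_bits` are made up for this example.

```rust
/// (bits, simple ns/iter, divide-and-conquer ns/iter), copied from the
/// benchmark results quoted in this comment.
const RESULTS: [(u64, f64, f64); 4] = [
    (1009, 4169.26, 5955.14),
    (2009, 14735.97, 12731.82),
    (3009, 32522.20, 18701.03),
    (4009, 56441.64, 27605.41),
];

/// Number of limbs needed to hold `bits` bits, assuming 64-bit limbs.
fn limbs_for_bits(bits: u64) -> u64 {
    (bits + 63) / 64
}

/// First measured size at which divide and conquer beats the simple loop.
fn crossover_bits(results: &[(u64, f64, f64)]) -> Option<u64> {
    results
        .iter()
        .find(|&&(_, simple, dc)| dc < simple)
        .map(|&(bits, _, _)| bits)
}
```

With only four sample points, the true crossover lies somewhere between the 1009-bit and 2009-bit measurements, so a limb count near 32 is an upper estimate rather than a tuned value.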

Member

What exactly did you change for your new results?

Author

In benchmarks? I changed them to this:


#[bench]
fn to_str_radix_10(b: &mut Bencher) {
    to_str_radix_bench(b, 10, 1009);
}

#[bench]
fn to_str_radix_10_2(b: &mut Bencher) {
    to_str_radix_bench(b, 10, 2009);
}

#[bench]
fn to_str_radix_10_3(b: &mut Bencher) {
    to_str_radix_bench(b, 10, 3009);
}

#[bench]
fn to_str_radix_10_4(b: &mut Bencher) {
    to_str_radix_bench(b, 10, 4009);
}

And I changed `if digits.data.len() >= 64 {` to `if digits.data.len() >= 1 {` and to `if digits.data.len() >= 1000 {`.

Author

HKalbasi commented Dec 18, 2024

I would like to implement Burnikel-Ziegler fast division to make to_radix even faster. Would you rather have it in this PR, or merge this one on its own first?

Author

HKalbasi commented Jan 2, 2025

I implemented Burnikel-Ziegler division, and the result is no longer quadratic:

running 4 tests
test to_str_radix_10      ... bench:       4,447.95 ns/iter (+/- 504.38)
test to_str_radix_10_2    ... bench:      88,899.42 ns/iter (+/- 14,242.49)
test to_str_radix_10_3    ... bench:   1,968,791.90 ns/iter (+/- 231,474.81)
test to_str_radix_10_4    ... bench:  58,370,762.50 ns/iter (+/- 3,103,897.08)

Now the time complexity is O(n^(log2 3)), roughly O(n^1.585). Further improvement needs a faster multiplication algorithm.
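As a rough sketch of where that exponent can come from: assume multiplication is Karatsuba-style with cost $M(n)$, and that division of $n$-limb operands costs $O(M(n))$. Both are assumptions made for illustration, not statements about the exact implementation. The two recurrences then are:

```latex
\begin{align*}
% Assumed Karatsuba-style multiplication: three half-size products plus linear work.
M(n) &= 3\,M(n/2) + O(n)
  \;\Longrightarrow\; M(n) = O\!\left(n^{\log_2 3}\right) \approx O\!\left(n^{1.585}\right) \\
% Divide-and-conquer conversion: one division by radix^(2^k), then two half-size subproblems.
T(n) &= 2\,T(n/2) + O(M(n))
  \;\Longrightarrow\; T(n) = O\!\left(\sum_{k \ge 0} 2^{k} \left(\frac{n}{2^{k}}\right)^{\log_2 3}\right)
  = O\!\left(n^{\log_2 3}\right)
\end{align*}
```

The sum is geometric with ratio $2^{1-\log_2 3} < 1$, so the top-level division dominates and the conversion inherits the exponent of the multiplication.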
