Fast isSquare / Legendre symbol / Jacobi symbol - 16.8x acceleration #95

mratsim · 2023-10-13T15:19:29Z

Upstreaming on behalf of @AlekseiVambol - original: taikoxyz#3

This implements optimized isSquare/legendre symbol as suggested in #73 (comment).

Performance

The performance improvement is an incredible 16.8x acceleration over exponentiation by (p-1)/2.

API direction

At the moment the procedure is named jacobi.

PR #77 introduced the Legendre trait.

AFAIK it was because exponentiation by (p-1)/2 needed to abstract away the usage of curve-specific modulus constants.
In that case, the trait can be deleted, it should be recent enough that breakage is minimal if any (cc @huitseeker).
Alternatively, the legendre procedure can call jacobi instead and the Legendre trait can be preserved. Though I'm curious what's the use-case of a trait instead of a function in that case.

Implementation

After extensive testing between Bernstein-Yang and Pornin's algorithms,m it turns out that:

Bernstein-Yang's is faster for modular inverse, hence we keep Fast modular inverse - 9.4x acceleration #83
Pornin's is faster for Legendre, hence this PR uses the algorithm.

References:

mratsim · 2023-10-13T20:14:15Z

Actually speedup may not be 16.8x but only 10x, quote from original PR

The Legendre symbol is computed by the added method 10.1 times faster than by the previously existing one based on the Euler's criterion.

Note: According to the test bench in /benches/bn256_field.rs, the aforesaid performance advantage is more than 20 times, which is incorrect. The reason for this behavior of the test bench has not been found yet.

huitseeker · 2023-10-18T19:05:03Z

PR #77 introduced the Legendre trait.

AFAIK it was because exponentiation by (p-1)/2 needed to abstract away the usage of curve-specific modulus constants.
In that case, the trait can be deleted, it should be recent enough that breakage is minimal if any (cc @huitseeker).

To answer this part specifically: there should be no breakage, since PR #77 was not part of any released version of halo2curves (0.4.0 at the latest).

Though that should not be taken as a sign of support for a slow release schedule on my part!

* Bernstein yang modular multiplicative inverter (#2) * rename similar to #95 --------- Co-authored-by: Aleksei Vambol <[email protected]>

han0110

LGTM! Thanks again for improving the library.

Although I still don't fully understand the reason of i being 30, but I don't want to block the PR. Other parts could be referenced from Computing the Jacobi symbol using Bernstein-Yang and Optimized Binary GCD for Modular Inversion.

Additional small notes for perhaps other reviewers:

(b ^ b >> 1) & 2 == 2 is equal to b % 8 == 3 || b % 8 == 5 when b is odd.
This python version might be the most unoptmized one to understand the algorithm https://gist.github.com/robot-dreams/ceb00162b80384f2ae1913aaa2b35e75 (also quoted in Perf: isSquare - constant-time Jacobi/Kronecker/Legendre symbol using fast GCD mratsim/constantine#199)

AlekseiVambol · 2023-10-19T10:03:20Z

LGTM! Thanks again for improving the library.

Although I still don't fully understand the reason of i being 30, but I don't want to block the PR. Other parts could be referenced from Computing the Jacobi symbol using Bernstein-Yang and Optimized Binary GCD for Modular Inversion.

Additional small notes for perhaps other reviewers:
1. `(b ^ b >> 1) & 2 == 2` is equal to `b % 8 == 3 || b % 8 == 5` when `b` is odd.

2. This python version might be the most unoptmized one to understand the algorithm https://gist.github.com/robot-dreams/ceb00162b80384f2ae1913aaa2b35e75 (also quoted in [Perf: `isSquare` - constant-time Jacobi/Kronecker/Legendre symbol using fast GCD mratsim/constantine#199](https://github.com/mratsim/constantine/issues/199))

Thank you for the review!

The python code you mentioned is not related to my modification of the Pornin's method. It seems to be the Wuille’s Bernstein-Yang-based method of Jacobi symbol computation. It is somewhat slower than the proposed method and lacks the proof of convergence. More information about this method can be found here: https://github.com/bitcoin-core/secp256k1/blob/master/doc/safegcd_implementation.md (section 8).

Regarding the number of the inner loop's iterations, which is 30 instead of 31:

The initial Pornin's method for modular inversion performs 31 iterations of the inner loop and uses "approximations" of the variables, which contain the arguments at the beginning of the algorithm's execution. These "approximations" are defined in the following way. Let n be the bit length of the largest argument without leading zeros. For n > 64 the "approximation" of the argument, which equals v, is (v div 2 ^ (n - 33)) * 2 ^ 33 + (v mod 2 ^ 31), i.e. it retains the 33 high and 31 low bits of the n-bit representation of v. If n does not exceed 64, an argument and its "approximation" are equal.

It is not hard to see, that before each of the 31 iterations of the inner loop the least significant bit of each "approximation" is the same as the least significant bit of the value that the corresponding variable would have had if it had been updated in the inner loop along with the "approximations". Let us call this value the "potential precise value". Before the first iteration of the inner loop an "approximation" and the corresponding "potential precise value" are equal modulo 2^31, before the second iteration they are equal at least modulo 2^30 and so on. the This allows the initial Pornin's method to make the correct decisions, which are related to parity checks.

However, we need an "approximation" to be equal to the corresponding "potential precise value" at least modulo 8 before each iteration to use the properties of the modified Jacobi symbol (x / |y|). The naive approach lies in performing just 29 iterations in the inner loop. However, I decided to change the "approximation" method in the terms of the used formula: here it is (v div 2 ^ (n - 32)) * 2 ^ 32 + (v mod 2 ^ 32), i.e. it retains the 32 high and 32 low bits of the n-bit representation of v. This modification allows to perform 30 iterations instead of 29. The admissibility of this modification I have proven using the appropriately modified Pornin's theorems.

PS I think I have to make a part of comments more accurate: replace

"corresponding "precise" variables' values, which would be computed, if the "precise" variables were used in the inner loop instead of the "approximation" ones."

with

"corresponding "precise" variables' values, which would have been computed, if the "precise" variables had been updated in the inner loop along with the "approximations"."

Should I do this after merging or now?

PPS Among the several improvements the most practically valuable one lies in using the short-arithmetic binary Euclidean algorithm to finish the computation, when the modified Pornin method’s "approximations" become absolutely precise.

han0110 · 2023-10-19T10:43:08Z

The python code you mentioned is not related to my modification of the Pornin's method. It seems to be the Wuille’s Bernstein-Yang-based method of Jacobi symbol computation. It is somewhat slower than the proposed method and lacks the proof of convergence. More information about this method can be found here: https://github.com/bitcoin-core/secp256k1/blob/master/doc/safegcd_implementation.md (section 8).

Yeah I mean to understand Computing the Jacobi symbol using Bernstein-Yang the script might help, and from it to your optmized implementation is easier for myself, since I couldn't find other implementation taking the same strategy.

Before the first iteration of the inner loop an "approximation" and the corresponding "potential precise value" are equal modulo 2^31, before the second iteration they are equal at least modulo 2^30 and so on. the This allows the initial Pornin's method to make the correct decisions, which are related to parity checks. However, we need an "approximation" to be equal to the corresponding "potential precise value" at least modulo 8 before each iteration to use the properties of the modified Jacobi symbol (x / |y|).

I see it now, thanks for clarifying, I think this is what I was missing.

Should I do this after merging or now?

Feel free to do so, besides the PR needs a rebase.

…s#83) * Bernstein yang modular multiplicative inverter (#2) * rename similar to privacy-scaling-explorations#95 --------- Co-authored-by: Aleksei Vambol <[email protected]>

CPerezz

LGTM!

Thanks so much for this work!

* Add field conversion to/from `[u64;4]` (privacy-scaling-explorations#80) * feat: add field conversion to/from `[u64;4]` * Added conversion tests * Added `montgomery_reduce_short` for no-asm * For bn256, uses assembly conversion when asm feature is on * fix: remove conflict for asm * chore: bump rust-toolchain to 1.67.0 * Compute Legendre symbol for `hash_to_curve` (privacy-scaling-explorations#77) * Add `Legendre` trait and macro - Add Legendre macro with norm and legendre symbol computation - Add macro for automatic implementation in prime fields * Add legendre macro call for prime fields * Remove unused imports * Remove leftover * Add `is_quadratic_non_residue` for hash_to_curve * Add `legendre` function * Compute modulus separately * Substitute division for shift * Update modulus computation * Add quadratic residue check func * Add quadratic residue tests * Add hash_to_curve bench * Implement Legendre trait for all curves * Move misplaced comment * Add all curves to hash bench * fix: add suggestion for legendre_exp * fix: imports after rebase * Add simplified SWU method (privacy-scaling-explorations#81) * Fix broken link * Add simple SWU algorithm * Add simplified SWU hash_to_curve for secp256r1 * add: sswu z reference * update MAP_ID identifier Co-authored-by: Han <[email protected]> --------- Co-authored-by: Han <[email protected]> * Bring back curve algorithms for `a = 0` (privacy-scaling-explorations#82) * refactor: bring back curve algorithms for `a = 0` * fix: clippy warning * fix: Improve serialization for prime fields (privacy-scaling-explorations#85) * fix: Improve serialization for prime fields Summary: 256-bit field serialization is currently 4x u64, ie. the native format. This implements the standard of byte-serialization (corresponding to the PrimeField::{to,from}_repr), and an hex-encoded variant of that for (de)serializers that are human-readable (concretely, json). - Added a new macro `serialize_deserialize_32_byte_primefield!` for custom serialization and deserialization of 32-byte prime field in different struct (Fq, Fp, Fr) across the secp256r, bn256, and derive libraries. - Implemented the new macro for serialization and deserialization in various structs, replacing the previous `serde::{Deserialize, Serialize}` direct use. - Enhanced error checking in the custom serialization methods to ensure valid field elements. - Updated the test function in the tests/field.rs file to include JSON serialization and deserialization tests for object integrity checking. * fixup! fix: Improve serialization for prime fields --------- Co-authored-by: Carlos Pérez <[email protected]> * refactor: (De)Serialization of points using `GroupEncoding` (privacy-scaling-explorations#88) * refactor: implement (De)Serialization of points using the `GroupEncoding` trait - Updated curve point (de)serialization logic from the internal representation to the representation offered by the implementation of the `GroupEncoding` trait. * fix: add explicit json serde tests * Insert MSM and FFT code and their benchmarks. (privacy-scaling-explorations#86) * Insert MSM and FFT code and their benchmarks. Resolves taikoxyz/zkevm-circuits#150. * feedback * Add instructions * feeback * Implement feedback: Actually supply the correct arguments to `best_multiexp`. Split into `singlecore` and `multicore` benchmarks so Criterion's result caching and comparison over multiple runs makes sense. Rewrite point and scalar generation. * Use slicing and parallelism to to decrease running time. Laptop measurements: k=22: 109 sec k=16: 1 sec * Refactor msm * Refactor fft * Update module comments * Fix formatting * Implement suggestion for fixing CI * Re-export also mod `pairing` and remove flag `reexport` to alwasy re-export (privacy-scaling-explorations#93) fix: re-export also mod `pairing` and remove flag `reexport` to alwasy re-export * fix regression in privacy-scaling-explorations#93 reexport field benches aren't run (privacy-scaling-explorations#94) fix regression in privacy-scaling-explorations#93, field benches aren't run * Fast modular inverse - 9.4x acceleration (privacy-scaling-explorations#83) * Bernstein yang modular multiplicative inverter (#2) * rename similar to privacy-scaling-explorations#95 --------- Co-authored-by: Aleksei Vambol <[email protected]> * Fast isSquare / Legendre symbol / Jacobi symbol - 16.8x acceleration (privacy-scaling-explorations#95) * Derivatives of the Pornin's method (taikoxyz#3) * renaming file * make cargo fmt happy * clarifications from privacy-scaling-explorations#95 (comment) [skip ci] * Formatting and slightly changing a comment --------- Co-authored-by: Aleksei Vambol <[email protected]> * chore: delete bernsteinyang module (replaced by ff_inverse) * Bump version to 0.4.1 --------- Co-authored-by: David Nevado <[email protected]> Co-authored-by: Han <[email protected]> Co-authored-by: François Garillot <[email protected]> Co-authored-by: Carlos Pérez <[email protected]> Co-authored-by: einar-taiko <[email protected]> Co-authored-by: Mamy Ratsimbazafy <[email protected]> Co-authored-by: Aleksei Vambol <[email protected]>

…s#83) * Bernstein yang modular multiplicative inverter (#2) * rename similar to privacy-scaling-explorations#95 --------- Co-authored-by: Aleksei Vambol <[email protected]>

…rivacy-scaling-explorations#95) * Derivatives of the Pornin's method (taikoxyz#3) * renaming file * make cargo fmt happy * clarifications from privacy-scaling-explorations#95 (comment) [skip ci] * Formatting and slightly changing a comment --------- Co-authored-by: Aleksei Vambol <[email protected]>

mratsim added a commit to taikoxyz/halo2curves that referenced this pull request Oct 13, 2023

rename similar to privacy-scaling-explorations#95

98cf974

mratsim mentioned this pull request Oct 13, 2023

Fast modular inverse - 9.4x acceleration #83

Merged

han0110 self-requested a review October 16, 2023 08:18

CPerezz requested review from CPerezz and kilic October 17, 2023 09:39

github-merge-queue bot pushed a commit that referenced this pull request Oct 19, 2023

Fast modular inverse - 9.4x acceleration (#83)

a835096

* Bernstein yang modular multiplicative inverter (#2) * rename similar to #95 --------- Co-authored-by: Aleksei Vambol <[email protected]>

han0110 approved these changes Oct 19, 2023

View reviewed changes

AlekseiVambol and others added 3 commits October 19, 2023 13:38

Derivatives of the Pornin's method (#3)

8000162

renaming file

830cd74

make cargo fmt happy

33b23d4

mratsim force-pushed the pse-pr-legendre branch from 70f16ce to 33b23d4 Compare October 19, 2023 11:39

mratsim and others added 2 commits October 19, 2023 13:42

clarifications from privacy-scaling-explorations#95 (comment) [skip ci]

d479820

Formatting and slightly changing a comment

1cbc80a

huitseeker mentioned this pull request Oct 23, 2023

New crate release #96

Closed

CPerezz approved these changes Oct 31, 2023

View reviewed changes

CPerezz added this pull request to the merge queue Oct 31, 2023

Merged via the queue into privacy-scaling-explorations:main with commit 81a0782 Oct 31, 2023
7 checks passed

mratsim deleted the pse-pr-legendre branch November 2, 2023 16:24

mratsim mentioned this pull request Nov 17, 2023

Legendre trait now uses accelerated Jacobi #102

Merged

CPerezz mentioned this pull request Dec 18, 2023

chore: Bump to 0.5.0 for release #114

Merged

mratsim mentioned this pull request Jul 17, 2024

feat(taiko impl): Add Taiko Rust implementation of Jacobi symbol jonas-lj/jacobi-benchmarks#3

Open

Velaciela mentioned this pull request Aug 14, 2024

[sync] upstream inverse acceleration scroll-tech/halo2curves#10

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fast isSquare / Legendre symbol / Jacobi symbol - 16.8x acceleration #95

Fast isSquare / Legendre symbol / Jacobi symbol - 16.8x acceleration #95

mratsim commented Oct 13, 2023 •

edited

Loading

mratsim commented Oct 13, 2023

huitseeker commented Oct 18, 2023 •

edited

Loading

han0110 left a comment

AlekseiVambol commented Oct 19, 2023 •

edited

Loading

han0110 commented Oct 19, 2023

CPerezz left a comment

Fast isSquare / Legendre symbol / Jacobi symbol - 16.8x acceleration #95

Fast isSquare / Legendre symbol / Jacobi symbol - 16.8x acceleration #95

Conversation

mratsim commented Oct 13, 2023 • edited Loading

Performance

API direction

Implementation

mratsim commented Oct 13, 2023

huitseeker commented Oct 18, 2023 • edited Loading

han0110 left a comment

Choose a reason for hiding this comment

AlekseiVambol commented Oct 19, 2023 • edited Loading

han0110 commented Oct 19, 2023

CPerezz left a comment

Choose a reason for hiding this comment

mratsim commented Oct 13, 2023 •

edited

Loading

huitseeker commented Oct 18, 2023 •

edited

Loading

AlekseiVambol commented Oct 19, 2023 •

edited

Loading