-
Notifications
You must be signed in to change notification settings - Fork 144
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement faster inversion #28
Comments
The current state-of-the-art is based on either Bernstein-Yang or Pornin's GCD. OverviewModular inverse can be computed either via Fermat's little theorem or a variant of Extended Euclid Algorithm to solve GCD (for example using binary shift/add/sub instead the canonical with quotients/division) Binary Extended Euclid Algorithm (bEEA)I'll use the algorithm by Niels Möller as the basis of my analysis (Niles is author of the GNU Nettle cryptographic library and a significant contributor to GMP, in particular the constant-time modular inverse). Algorithm 5 in
Constant-time algorithms make it easier to analyze speed because they always take the worst case time. As you can see, the worst case grows with iteration of Fermat's Little Theorem (FLT)Using Little Fermat's Theorem, we can use exponentiation by p-2 and rely on a highly optimized modular multiplication. Addition chains can reduce the number of multiplications but not of squaring, In particular BN254 has a quite high hamming weight for a cryptographic prime. Benching my old transition (2020) to addition chains, the speedup was only 30% (mratsim/constantine#80 ) The bottleneckBoth traditional EEA and FLT have a bottleneck, each operations are done on the full integer width (254 bits). The big innovation in Bernstein-Yang and Pornin is that instead of working on the full integer width, you only work on:
Then you propagate those transitions based on 62 or 31/31 on the full integer width (254 bits) once every 31 or 62 iterations. ReferencesFor implementation details and gotchas, see mratsim/constantine#172 (comment) Pornin's writeup: https://research.nccgroup.com/2020/09/28/faster-modular-inversion-and-legendre-symbol-and-an-x25519-speed-record/ Bernstein-Yang inversion:
Pornin's inversion:
BenchmarksBoth 255-bit Pornin's constant-time inversion from BLST and 255-bit BY's inversion from Constantine are in the 1100 to 1300ns on my machine: For BN254 specifically, Bernstein-Yang variable-time And Gnark's Pornin inversion (variable-time): Note that BLST uses Assembly and Constantine/Gnark do not. Legendre symbolBoth BY and Pornin's GCD method can be adapted to compute the Legendre symbol, see:
Legendre symbol is a bottleneck in Hashing to curve of BN254 (#47) |
Upcoming optimized msm method privacy-scaling-explorations/halo2#40 heavily uses base field inversion. Currently inversion is an exponentiation with
(p-2
). It can be optimized by replacing with one of theseThe text was updated successfully, but these errors were encountered: