-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
einar/pr/extended.jacobian.coordinates #1
base: taiko/unstable
Are you sure you want to change the base?
einar/pr/extended.jacobian.coordinates #1
Conversation
}); | ||
sum | ||
}) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the plan to remove this after the review? This duplicates halo2_proofs https://github.com/privacy-scaling-explorations/halo2/blob/f3487575f00e627705fcb7778d7d1049eed79601/halo2_proofs/src/arithmetic.rs#L28-L174
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The EC implementation is correct with a small issue on inline usage that will lead to code explosion.
The multiexp implementation should be done by just modifying the buckets accumulation scheme in https://github.com/privacy-scaling-explorations/halo2/blob/f3487575f00e627705fcb7778d7d1049eed79601/halo2_proofs/src/arithmetic.rs#L67-L71
Also, there are 2 passes over the coeffs+base instead of 1 in this PR which should explain most of the slowness compared to theoretical (15%) and conservative estimate (10% speedup).
|
||
// Jacobian implementations | ||
|
||
// From affine to homogeneous |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it from affine to homogeneous or from affine to Jacobian?
This seems to contradict the comment above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I added the comment because I found it confusing too, since $name
refers to homogeneous coordinates and $name_affine
to affine coordinates. Alternatively, I can remove the first comment, since non-extended Jacobian coordinates should be gone anyway.
src/derive/curve.rs
Outdated
let zzz1 = p.zzz; | ||
|
||
// curve constants | ||
let a = Self::curve_constant_a(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a is always equal to 0 for pairing curves (those needed for Snarks) and it's also equal to 0 for secp256k1.
If Rust provides compile time evaluation, it would be extremely useful to specialize for a == 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the very least a comment is needed to point out an future easy optimization opportunity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a is always equal to 0 for pairing curves (those needed for Snarks) and it's also equal to 0 for secp256k1.
If Rust provides compile time evaluation, it would be extremely useful to specialize for a == 0
I have changed it to:
// curve constants
const A: $base = $name_jac_ext::curve_constant_a();
This should enforce compile time evaluation and hopefully imply that
let m = (x1_sqr.double()+x1_sqr) + A*zz1.square();
gets optimized to
let m = (x1_sqr.double()+x1_sqr)
for the curves where
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should use if A == 0 and if A != 0.
The multiplication and addition operations are pure assembly and will not be optimized away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can make it that way, no problem, but I politely remain skeptical of the argument. When the value is known to be 0 at compile time, like in our case, the assembly will indeed be optimized away. I tested it with -opt-level=2
here https://godbolt.org/z/4Gr41o8Kr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can make it that way, no problem, but I politely remain skeptical of the argument. When the value is known to be 0 at compile time, like in our case, the assembly will indeed be optimized away. I tested it with
-opt-level=2
here https://godbolt.org/z/4Gr41o8Kr
I realise that *
is overloaded for field arithmetic, so my argument may or may not be applicable.
src/derive/curve.rs
Outdated
let w = u*v; | ||
let s = x1 * v; | ||
let x1_sqr = x1.square(); | ||
let m = (x1_sqr.double()+x1_sqr) + a*zz1.square(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with a = 0, a*zz1.square() can be skipped
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried commenting out the last term and then ran cargo test
. All tests pass.
I put in a debug_assert_eq!(a, 0);
, so if we can catch future curves where a != 0
.
} | ||
|
||
/// <http://www.hyperelliptic.org/EFD/g1p/auto-shortw-xyzz.html#doubling-dbl-2008-s-1> | ||
#[inline] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#[inline] is likely not worth it here. Especially because the underlying field operations are already inline so it will lead to code size explosion:
halo2curves/src/bn256/assembly.rs
Lines 11 to 12 in 2264867
#[inline] | |
pub fn double(&self) -> $field { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. I will remove it from the add
as well then.
let num_threads = num_cpus::get(); | ||
if n > num_threads && n > 32 { | ||
let chunk = n / num_threads; | ||
let results: Vec<C::ExtendedJacobianCoordinates> = coeffs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only the inner buckets result should be collected in Extended Jacobian.
Extended Jacobian mixed addition is faster than projective, but for plain addition, projective is faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the signature of best_multiexp should stay with projective coordinates
benches/multiexp.rs
Outdated
}) | ||
.sample_size(30); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To measure perf of this PR without multithreading interference, a serial benchmark is needed as well.
} | ||
} | ||
|
||
pub(crate) fn multiexp_serial<C: CurveJacExt>(coeffs: &[C::Scalar], bases: &[C]) -> C::ExtendedJacobianCoordinates { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similarly, the signature of multiexp_serial should stay homogeneous projective
.enumerate() | ||
.rev() | ||
.map(|(i, bucket)| { | ||
for (coeff, base) in coeffs.iter().zip(bases.iter()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The map and for are inefficient, you loop twice over the data.
You should loop over the coeff+base and directly mutate/add_assign in the matching bucket
} | ||
bucket | ||
}) | ||
.fold(C::jac_ext_identity(), |mut sum, bucket| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The final reduction should use projective coordinates (and explicitly convert the bucket sum to projective)
Note on our discussion on constant_a, some more changes upstream privacy-scaling-explorations#82. |
…_curve()` is implemented
d52078b
to
747c6e2
Compare
2239b6a
to
81a0782
Compare
This resolves taikoxyz/zkevm-circuits#121
To measure the performance impact: