Implement CIOS for ARM F::mul #134
Thanks for the PR! @sragss I think we should add the input range check in the (Normally, I use the What do you think? @CPerezz @davidnevadoc
Really clean and easy to follow, thanks for the improvement!
LGTM 👍
Forgot to check the bit discrepancy for operands outside the normal range.
Fixed comments @davidnevadoc – Let me know what you think about the operands outside the normal range. We can add the reduction before, or can update the
In regards to the outside of range operands, the approach I like is controlling the ways in which we create field elements and then assuming they are in the appropriate ranges in all operations. halo2curves/src/derive/field.rs Line 159 in 3c43d3c
The asm version on the other hand, was using I have modified the Let me know what you think and feel free to add the change if you like it.
Agree with your approach – added those changes. All tests are passing now and we don't have the slowdown from adding the check to
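To illustrate the "control how field elements are created" approach discussed above, here is a minimal sketch. Everything in it is an assumption for illustration: the `Fe` type, the `from_limbs` constructor, and the placeholder modulus 2^255 - 19 are hypothetical and are not the halo2curves API or the bn256 modulus. It also ignores the conversion to Montgomery form. The point is only that, once every constructor rejects (or reduces) out-of-range limbs, arithmetic can assume its operands are already below the modulus.

```rust
/// Placeholder modulus 2^255 - 19, little-endian 64-bit limbs.
/// Assumption for illustration only; this is NOT the bn256 modulus.
const MODULUS: [u64; 4] = [
    0xffff_ffff_ffff_ffed,
    0xffff_ffff_ffff_ffff,
    0xffff_ffff_ffff_ffff,
    0x7fff_ffff_ffff_ffff,
];

/// Hypothetical field element. The limbs are private, so outside this module
/// a value can only be obtained through a constructor that checks the range.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Fe([u64; 4]);

impl Fe {
    /// Accepts the limbs only if they encode a value strictly below MODULUS.
    pub fn from_limbs(limbs: [u64; 4]) -> Option<Fe> {
        if is_less_than(&limbs, &MODULUS) {
            Some(Fe(limbs))
        } else {
            None
        }
    }

    /// Read-only access to the limbs.
    pub fn limbs(&self) -> [u64; 4] {
        self.0
    }
}

/// Returns true when a < b, comparing from the most significant limb down.
fn is_less_than(a: &[u64; 4], b: &[u64; 4]) -> bool {
    for i in (0..4).rev() {
        if a[i] != b[i] {
            return a[i] < b[i];
        }
    }
    false // equal values are not "less than"
}
```

With a constructor like this, the range check is paid once at creation time rather than on every multiplication.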
9fff22c
(t[N - 1], _) = adc(c_2, c, 0);
}

if bigint_geq(&t, &$modulus.0) {
It seems the `bigint_geq` procedure and its usage may be suboptimal. First of all, you can notice that
Then you can notice that you are actually computing the
(tmp, borrow) = t - m
t = borrow ? t : tmp
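The two-line pseudocode above could be spelled out roughly as follows. This is a sketch under assumptions, not the code in the PR: the helper names `sbb` and `reduce_once` are hypothetical, limbs are little-endian `u64`, and the input is assumed to satisfy `t < 2m`. The subtraction is always performed, and the borrow out of the top limb decides which value is kept.

```rust
/// Subtract with borrow: returns (a - b - borrow_in, borrow_out),
/// where both borrows are 0 or 1.
fn sbb(a: u64, b: u64, borrow: u64) -> (u64, u64) {
    let (d1, b1) = a.overflowing_sub(b);
    let (d2, b2) = d1.overflowing_sub(borrow);
    (d2, (b1 | b2) as u64)
}

/// Brings `t` into [0, m), assuming t < 2m (hypothetical helper).
fn reduce_once(t: [u64; 4], m: [u64; 4]) -> [u64; 4] {
    // (tmp, borrow) = t - m
    let mut tmp = [0u64; 4];
    let mut borrow = 0u64;
    for i in 0..4 {
        let (d, b) = sbb(t[i], m[i], borrow);
        tmp[i] = d;
        borrow = b;
    }

    // t = borrow ? t : tmp
    // A final borrow of 1 means t < m, so the original t is kept.
    let mask = 0u64.wrapping_sub(borrow); // all ones if borrow == 1, else zero
    let mut out = [0u64; 4];
    for i in 0..4 {
        out[i] = (t[i] & mask) | (tmp[i] & !mask);
    }
    out
}
```

The mask-based select avoids a branch; whether that is actually faster than a compare followed by a conditional subtraction on a given ARM core would need to be measured.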
Nice catch!
I believe this strategy would only save instructions in the case that `bigint_geq == true`. It would cost some instructions in the case that `bigint_geq == false`, as `sbb` gets broken out into a few instructions on ARM, whereas the 4 u64 LTs should be a single instruction each.
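For comparison, the compare-then-subtract strategy being defended here might look like the sketch below. The function bodies are a reconstruction for illustration, not the PR's code: `bigint_geq` reuses the name from the diff, while `sub_limbs` and `reduce_with_compare` are hypothetical helpers. Limbs are little-endian `u64`.

```rust
/// Returns true when a >= b. Comparison starts at the most significant limb,
/// so most inputs are decided after looking at a single limb.
fn bigint_geq(a: &[u64; 4], b: &[u64; 4]) -> bool {
    for i in (0..4).rev() {
        if a[i] != b[i] {
            return a[i] > b[i];
        }
    }
    true // all limbs equal
}

/// Computes a - b limb by limb, assuming a >= b (hypothetical helper).
fn sub_limbs(a: &[u64; 4], b: &[u64; 4]) -> [u64; 4] {
    let mut out = [0u64; 4];
    let mut borrow = false;
    for i in 0..4 {
        let (d, b1) = a[i].overflowing_sub(b[i]);
        let (d, b2) = d.overflowing_sub(borrow as u64);
        out[i] = d;
        borrow = b1 || b2;
    }
    out
}

/// Subtracts the modulus only when the comparison says it is needed.
fn reduce_with_compare(t: [u64; 4], m: [u64; 4]) -> [u64; 4] {
    if bigint_geq(&t, &m) {
        sub_limbs(&t, &m)
    } else {
        t
    }
}
```

In this form the subtraction chain only runs when `t >= m`; the trade-off against always subtracting and selecting on the borrow depends on how the target lowers the borrow chain.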
* impl CIOS * more details * add Fast CIOS for bn256 * rolled Fast CIOS * clean comment * geq for last line in bigint_geq * update comment to include WORD_SIZE * mod in montomgery * cargo fmt * cargo clippy --------- Co-authored-by: sragss <[email protected]>
Implements CIOS for Montgomery 256-bit field multiplication, specifically the fast variant (algorithm 2). These changes are particularly relevant on ARM, where we do not have x86 / BMI2 / AVX512 and the associated assembly backend. There does not appear to be a NEON equivalent for MULX / ADOX / ADCX.
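For readers unfamiliar with CIOS (Coarsely Integrated Operand Scanning), here is a minimal sketch of the plain textbook variant for four 64-bit limbs. It is not the fast variant this PR implements and not the PR's code: the helper names (`mac`, `adc`, `neg_inv`, `mont_mul`) are chosen for illustration, the Montgomery constant is recomputed on every call for simplicity, and inputs are assumed to be below an odd modulus `m`.

```rust
const N: usize = 4;

/// a + b * c + carry as (low word, high word); the sum always fits in 128 bits.
fn mac(a: u64, b: u64, c: u64, carry: u64) -> (u64, u64) {
    let wide = a as u128 + (b as u128) * (c as u128) + carry as u128;
    (wide as u64, (wide >> 64) as u64)
}

/// a + b + carry as (low word, high word).
fn adc(a: u64, b: u64, carry: u64) -> (u64, u64) {
    let wide = a as u128 + b as u128 + carry as u128;
    (wide as u64, (wide >> 64) as u64)
}

/// -m0^{-1} mod 2^64 for odd m0, via Newton iteration (precision doubles per step).
fn neg_inv(m0: u64) -> u64 {
    let mut x = 1u64; // correct modulo 2 because m0 is odd
    for _ in 0..6 {
        x = x.wrapping_mul(2u64.wrapping_sub(m0.wrapping_mul(x)));
    }
    x.wrapping_neg()
}

/// Returns a * b * R^{-1} mod m with R = 2^256, for a, b < m and m odd.
fn mont_mul(a: &[u64; N], b: &[u64; N], m: &[u64; N]) -> [u64; N] {
    let inv = neg_inv(m[0]);
    let mut t = [0u64; N + 2];
    for i in 0..N {
        // Multiplication step: t += a * b[i]
        let mut c = 0u64;
        for j in 0..N {
            let (lo, hi) = mac(t[j], a[j], b[i], c);
            t[j] = lo;
            c = hi;
        }
        let (lo, hi) = adc(t[N], c, 0);
        t[N] = lo;
        t[N + 1] = hi;

        // Reduction step: add k * m so the lowest word becomes zero,
        // then shift t down by one word.
        let k = t[0].wrapping_mul(inv);
        let (_, mut c) = mac(t[0], k, m[0], 0);
        for j in 1..N {
            let (lo, hi) = mac(t[j], k, m[j], c);
            t[j - 1] = lo;
            c = hi;
        }
        let (lo, hi) = adc(t[N], c, 0);
        t[N - 1] = lo;
        t[N] = t[N + 1] + hi;
    }

    // Final conditional subtraction: t < 2m here, so one subtraction suffices.
    let mut r = [0u64; N];
    let mut borrow = 0u64;
    for j in 0..N {
        let (d, b1) = t[j].overflowing_sub(m[j]);
        let (d, b2) = d.overflowing_sub(borrow);
        r[j] = d;
        borrow = (b1 | b2) as u64;
    }
    if t[N] != 0 || borrow == 0 {
        r // t >= m
    } else {
        let mut out = [0u64; N];
        out.copy_from_slice(&t[..N]);
        out // t < m
    }
}
```

The fast variant referenced in the description avoids some of the carry handling shown here when the modulus leaves enough headroom in its top limb; the plain version is used in this sketch because it works for any odd modulus that fits in four limbs.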
Accelerates `F::mul` by 7-15%, roughly reaching parity with Arkworks on ARM. See sragss/speedy-fields to benchmark.

`cargo run --release` is more consistent than `cargo bench` due to matching elements.

Currently `bn256::test::test_consistent_hash_to_curve` fails due to an attempt to multiply a number larger than both the Montgomery radix and the field modulus. This results in a 1-bit difference between implementations. Specifically:

I'm not sure if this was intended to be supported. Other libraries (such as Arkworks) do not handle numbers outside of the range of the modulus. Any elements created via `F::new()` undergo field multiplication (by `R2`), where they're brought into the proper range.

This can be handled by checking if the inputs are greater than the field modulus and subtracting in advance, but there's some non-zero cost (2-5%) to performing the check.