-
Notifications
You must be signed in to change notification settings - Fork 814
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[crypto] Trial division for RSA key generation. #21086
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just had a really enjoyable few minutes reading through some of this code! (I must say, I hadn't thought of using a ones-register to do masking like this: it's neat!)
A couple of nitty comments on the bits that I'd read.
sw/otbn/crypto/rsa_keygen.s
Outdated
@@ -1003,8 +1004,7 @@ check_p: | |||
/* Get the FG0.Z flag into a register. | |||
x2 <= (CSRs[FG0] >> 3) & 1 = FG0.Z */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might need tweaking to match the "& 8" that's now in the code?
Looking at the rest of the file, it looks like there are several other places where we could make the corresponding change. Maybe that would be worth doing uniformly as a separate commit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe so, yes. I think I was looking to save code size here.
sw/otbn/crypto/rsa_keygen.s
Outdated
|
||
/* Add the lower 16 bits of the sum to the highest 3 bits to get a 17-bit | ||
result. | ||
w22 <= w22 + (w23 >> 32) */ | ||
bn.add w22, w22, w23 >> 32 | ||
|
||
/* The sum from the previous addition is < 2 * F4, so a modular addition with | ||
zero is sufficient to fully reduce. | ||
/* The sum from the previous addition is <= 2^16 - 1 + 2^3 - 1 < 2 * F4, so a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe "at most" instead of "<=", since those symbols are also being used for assignment?
sw/otbn/crypto/rsa_keygen.s
Outdated
* 35-bit result and then fold the number a few more times to get a 9-bit | ||
* result. | ||
* | ||
* Testing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Testing "??
(I think the rest of the paragraph appears in the next commit?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what was going on there! Anyway, I double-checked and it does seem like it's filled in on the next commit. The full paragraph is:
* Testing for these primes will catch approximately:
* 1 - ((1 - 1/3) * (1 - 1/5) * ... * (1 - 1/31))
* = 62.1% of composite numbers.
Maybe I just hadn't done the calculation yet 🙂
50891e0
to
5d80df5
Compare
I was searching for my own open PRs recently and noticed this one was never merged! Looks like it was approved the day before I changed employers and lost my open tabs so I think I know how that happened 😉 Luckily, the code hasn't changed much in the meantime, so I've rebased it and unless there are objections I'll merge once it passes CI. |
These small primes have the convenient property that 2^8 mod p = 1, which significantly speeds up the check and allows it to share code with relprime_f4. Signed-off-by: Jade Philipoom <[email protected]>
4599b4d
to
689ba1e
Compare
All of these small primes have the nice property that 2^32 mod p = 4. Signed-off-by: Jade Philipoom <[email protected]>
689ba1e
to
a2e535a
Compare
Since we now have nicer profiling targets than we did when I first made this PR, I also ran some quick benchmarks. I modified the RSA-2048 keygen test to generate keys 10 times and continue recording the OTBN instruction count (a good proxy for the cycle count, and more accurate on FPGA tests given RSA keygen is too slow for Verilator). The results were:
In short, I observed a 62% speedup, a nice confirmation of the 61% estimate. I think I just got lucky with how exact it is; the standard deviation on these measurements is large because RSA timing is so variable and I only had the patience for 10 samples. But still -- at least it confirms that we see the significant average speedups we expect. |
Filter out 62.1% of composite numbers by using arithmetic tricks specific to certain small primes. This should speed up RSA key generation by something close to 60% in expectation, because primality testing is the vast majority of the runtime and this lets us skip it in 60% of cases where it will eventually fail.