Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[crypto] Trial division for RSA key generation. #21086

Merged
merged 2 commits into from
Feb 24, 2025

Conversation

jadephilipoom
Copy link
Contributor

Filter out 62.1% of composite numbers by using arithmetic tricks specific to certain small primes. This should speed up RSA key generation by something close to 60% in expectation, because primality testing is the vast majority of the runtime and this lets us skip it in 60% of cases where it will eventually fail.

@jadephilipoom jadephilipoom requested a review from moidx January 29, 2024 11:07
Copy link
Contributor

@rswarbrick rswarbrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just had a really enjoyable few minutes reading through some of this code! (I must say, I hadn't thought of using a ones-register to do masking like this: it's neat!)

A couple of nitty comments on the bits that I'd read.

@@ -1003,8 +1004,7 @@ check_p:
/* Get the FG0.Z flag into a register.
x2 <= (CSRs[FG0] >> 3) & 1 = FG0.Z */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might need tweaking to match the "& 8" that's now in the code?

Looking at the rest of the file, it looks like there are several other places where we could make the corresponding change. Maybe that would be worth doing uniformly as a separate commit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe so, yes. I think I was looking to save code size here.


/* Add the lower 16 bits of the sum to the highest 3 bits to get a 17-bit
result.
w22 <= w22 + (w23 >> 32) */
bn.add w22, w22, w23 >> 32

/* The sum from the previous addition is < 2 * F4, so a modular addition with
zero is sufficient to fully reduce.
/* The sum from the previous addition is <= 2^16 - 1 + 2^3 - 1 < 2 * F4, so a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe "at most" instead of "<=", since those symbols are also being used for assignment?

* 35-bit result and then fold the number a few more times to get a 9-bit
* result.
*
* Testing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Testing "??

(I think the rest of the paragraph appears in the next commit?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what was going on there! Anyway, I double-checked and it does seem like it's filled in on the next commit. The full paragraph is:

 * Testing for these primes will catch approximately:
 *   1 - ((1 - 1/3) * (1 - 1/5) * ... * (1 - 1/31))
 *   = 62.1% of composite numbers.

Maybe I just hadn't done the calculation yet 🙂

@jadephilipoom
Copy link
Contributor Author

jadephilipoom commented Feb 21, 2025

I was searching for my own open PRs recently and noticed this one was never merged! Looks like it was approved the day before I changed employers and lost my open tabs so I think I know how that happened 😉 Luckily, the code hasn't changed much in the meantime, so I've rebased it and unless there are objections I'll merge once it passes CI.

These small primes have the convenient property that 2^8 mod p = 1,
which significantly speeds up the check and allows it to share code with
relprime_f4.

Signed-off-by: Jade Philipoom <[email protected]>
@jadephilipoom jadephilipoom force-pushed the trial-div branch 4 times, most recently from 4599b4d to 689ba1e Compare February 21, 2025 10:12
All of these small primes have the nice property that 2^32 mod p = 4.

Signed-off-by: Jade Philipoom <[email protected]>
@jadephilipoom
Copy link
Contributor Author

jadephilipoom commented Feb 21, 2025

Since we now have nicer profiling targets than we did when I first made this PR, I also ran some quick benchmarks. I modified the RSA-2048 keygen test to generate keys 10 times and continue recording the OTBN instruction count (a good proxy for the cycle count, and more accurate on FPGA tests given RSA keygen is too slow for Verilator). The results were:

>>> master = [728920834,626858263,844018075,212064556,1295722633,1265332333,1395617782,798422326,1964599999,225096775]
>>> trial_div = [518505838,279511222,642358674,344699495,1109487354,1467981868,223009163,527194053,118732085,535880793]
>>> avg(master)
935665357.6
>>> avg(trial_div)
576736054.5
>>> avg(master) / avg(trial_div)
1.6223458726040163

In short, I observed a 62% speedup, a nice confirmation of the 61% estimate. I think I just got lucky with how exact it is; the standard deviation on these measurements is large because RSA timing is so variable and I only had the patience for 10 samples. But still -- at least it confirms that we see the significant average speedups we expect.

@jadephilipoom jadephilipoom merged commit 3a5cca7 into lowRISC:master Feb 24, 2025
42 checks passed
@jadephilipoom jadephilipoom deleted the trial-div branch February 24, 2025 07:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants