Add inverse NTTs for Kyber & Dilithium #37

dop-amin · 2024-03-15T08:41:31Z

This PR introduces inverse NTTs for Kyber and Dilithium.
The type of transposition and reduction is supposed to match the code from PQClean [1,2].

TODO:

Add optimized Dilithium invNTTs
~~Simplify syntax as in NTTs: remove ldr/str macros that are no longer needed #36~~ (do this separately, we want to re-run every example for this)
~~Optional: Add intt_kyber_1234_567~~ (from experience, these variants perform not as well as the ones that merge more layers in the second merge)

[1] https://github.com/PQClean/PQClean/tree/8e221ae797b229858a0b0d784577a8cb149d5789/crypto_sign/dilithium3/aarch64
[2] https://github.com/PQClean/PQClean/tree/8e221ae797b229858a0b0d784577a8cb149d5789/crypto_kem/kyber768/aarch64

examples/naive/aarch64/intt_dilithium_1234_5678_manual_ld4.s

slothy/targets/aarch64/aarch64_neon.py

* Fix reductions

hanno-becker · 2024-04-02T19:47:46Z

@dop-amin I cancelled the CI which was spinning indefinitely on the example dry run.

hanno-becker · 2024-04-03T03:09:03Z

slothy/targets/aarch64/apple_m1_firestorm_experimental.py

@@ -120,7 +120,7 @@ def get_min_max_objective(slothy):
     vqdmulh_lane,
     vmull, vmlal,
     vsrshr, vushr, vusra, vshl,
-     vand, vbic): ExecutionUnit.V(),
+     vand, vbic, cmge): ExecutionUnit.V(),


Here and below you can now refer to cmge as ASimdCompare instead (assuming all compare instructions have similar performance characteristics). That makes the models more readable and simplifies the addition of further comparison instructions, because you only need to tweak the arch model.

Thanks, I modified the models. However, if we have something like is_qform_form_of, then passing the parent class won't work. Do we want to keep it this way or do we want the function returned by is_qform_form_of to also return True on the child classes?

@dop-amin Good point. I think is_qform_of should be able to handle parent classes, yes. I bet the way I did this in aarch64_neon.py is not terribly pythonic, but you could use all_subclass_leaves() here.
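To make the parent-class idea concrete, here is a minimal sketch of a predicate factory that expands a parent class into its leaf subclasses before matching. All class and function names below are hypothetical stand-ins for illustration; this is not SLOTHY's actual hierarchy, nor the signature of its `all_subclass_leaves()` helper.

```python
# Hypothetical stand-ins for instruction classes; not SLOTHY's real hierarchy.
class Instruction: ...
class ASimdCompare(Instruction): ...
class cmge(ASimdCompare): ...
class cmgt(ASimdCompare): ...
class vand(Instruction): ...


def subclass_leaves(cls):
    """Return every class at or below `cls` that has no subclasses itself."""
    subs = cls.__subclasses__()
    if not subs:
        return [cls]
    leaves = []
    for sub in subs:
        leaves.extend(subclass_leaves(sub))
    return leaves


def make_class_predicate(classes):
    """Build a predicate that accepts instances of any leaf below `classes`.

    `classes` may be a single class or a list/tuple of classes, so a parent
    class such as ASimdCompare can be passed in place of each child.
    """
    if not isinstance(classes, (list, tuple)):
        classes = [classes]
    leaves = tuple(leaf for c in classes for leaf in subclass_leaves(c))

    def predicate(inst):
        return isinstance(inst, leaves)

    return predicate


is_compare = make_class_predicate(ASimdCompare)
```

With this shape, adding another compare instruction only requires a new subclass of the parent; the predicate built from `ASimdCompare` picks it up as long as it is constructed after the subclass is defined.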

hanno-becker · 2024-04-05T07:57:44Z

@dop-amin Are you going to investigate the CI failure or do you need help?

hanno-becker · 2024-04-05T07:58:22Z

@dop-amin Is there [going to be] a sibling PR to PQAX as well adding tests for the inverse NTT?

dop-amin · 2024-04-05T10:36:48Z

> @dop-amin Are you going to investigate the CI failure or do you need help?

Hi Hanno, I think the CI just times out because it takes too long to go through all the examples. Especially the ones using heuristics seem to take long because it involves so many individual calls to the solver. Do you have a suggestion on how to go about this? We could disable the CI for examples using heuristics.

dop-amin · 2024-04-05T10:37:41Z

> @dop-amin Is there [going to be] a sibling PR to PQAX as well adding tests for the inverse NTT?

Yes, I've been planning to submit it for a couple of days but now I finally did so. Thanks for reminding me.

hanno-becker · 2024-04-05T11:00:09Z

> Hi Hanno, I think the CI just times out because it takes too long to go through all the examples. Especially the ones using heuristics seem to take long because it involves so many individual calls to the solver. Do you have a suggestion on how to go about this? We could disable the CI for examples using heuristics.

I am surprised by this because the dry run sets functional_only=True, allow_renaming=False and allow_reordering=False if I remember correctly -- this should not take long. Can you double-check that your scripts in example.py do not overwrite this?
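As a hypothetical illustration of how a per-example script could lose the dry-run settings, the sketch below contrasts building a fresh configuration with copying the one that was passed in. The `Config` class, its `unroll` field, and the function names are assumptions made up for this example; only the three flag names come from the comment above.

```python
import copy
from dataclasses import dataclass


@dataclass
class Config:
    """Hypothetical config object modelling only the flags discussed here."""
    functional_only: bool = False
    allow_renaming: bool = True
    allow_reordering: bool = True
    unroll: int = 1  # made-up per-example setting


def dry_run_config():
    """Settings a dry run is expected to use, per the comment above."""
    return Config(functional_only=True,
                  allow_renaming=False,
                  allow_reordering=False)


def configure_fresh(base):
    """Problematic pattern: a fresh config silently drops the dry-run flags."""
    cfg = Config()
    cfg.unroll = 2
    return cfg


def configure_copy(base):
    """Safer pattern: copy the incoming config, then adjust per-example knobs."""
    cfg = copy.deepcopy(base)
    cfg.unroll = 2
    return cfg


base = dry_run_config()
fresh = configure_fresh(base)
copied = configure_copy(base)
```

In this sketch `copied` keeps `functional_only=True` while `fresh` falls back to the full-optimization defaults, which is the kind of override that would make a dry run slow.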

example.py

examples/naive/aarch64/intt_dilithium_1234_5678_manual_ld4.s

hanno-becker · 2024-04-07T04:30:20Z

examples/naive/aarch64/intt_dilithium_1234_5678_manual_ld4.s

+modulus_addr:   .quad 8380417
+ninv_addr:      .quad 16382
+ninv_tw_addr:   .quad 4197891
+intt_dilithium_1234_5678_manual_ld4:


Again an optional improvement, but better to do while we're at it:

Could you go through the clean [inv]NTTs and add a comment at their entry symbol describing in what way (reduction / ordering) they deviate from a standard, fully-reduced NTT?

That's a good point in general. I followed the way the current PQClean implementations handle the reductions such that we can have a fair comparison.

Regarding the ordering: All of our NTTs on aarch64 (should) output in the canonical ordering.

hanno-becker · 2024-04-07T05:47:58Z

examples/naive/aarch64/intt_dilithium_123_45678.s

+.endm
+
+.macro montg_reduce a
+        srshr tmp.4S,  \a\().4S, #23


Is this really a Montgomery reduction? It looks more like Barrett, leveraging somehow that q is very close to 2^23.

In more detail: It looks like the absolute value |a/q - a/2**23| is at most 2**31 * |1/q - 1/2**23| ~ 0.25, so the rounding is off by at most one. This neatly uses that the 'buffer' from q to the word boundary 2^32 has about the same bit length as the approximation q~2^23 (since q=2^23-2^13+1).
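The bound can be checked numerically. The sketch below models the rounding shift by 23 and assumes it is followed by a multiply-subtract with q (the diff above only shows the `srshr` line, so that second step is an assumption). The function names are made up for this example.

```python
from fractions import Fraction

Q = 8380417  # Dilithium modulus, 2**23 - 2**13 + 1


def shift_reduce(a):
    """Model of `srshr #23` followed by an assumed multiply-subtract with Q.

    Returns the reduced value and the quotient estimate.
    """
    t = (a + (1 << 22)) >> 23  # round(a / 2**23), ties rounded up
    return a - t * Q, t


def round_div(a, q):
    """Exact round-half-up of a / q using integer arithmetic only."""
    return (2 * a + q) // (2 * q)


# Worst-case distance between a/Q and a/2**23 for |a| <= 2**31.
bound = 2**31 * (Fraction(1, Q) - Fraction(1, 2**23))


def check(step=1000003):
    """Sample 32-bit inputs and check congruence and the off-by-one claim."""
    for a in range(-2**31, 2**31 - 2**22, step):
        r, t = shift_reduce(a)
        assert (a - r) % Q == 0                 # r is congruent to a mod Q
        assert abs(t - round_div(a, Q)) <= 1    # quotient is off by at most one
    return True
```

Here `float(bound)` evaluates to roughly 0.2502, which matches the "~ 0.25" estimate: two real numbers that close together cannot round to integers more than one apart.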

To me, this appears to be a shortened version of reduce32 from the reference implementation. As mentioned above, I followed the current state in PQClean.
Anyhow, the naming is off.

Could you adjust the macro names?

hanno-becker · 2024-04-07T06:26:55Z

examples/naive/aarch64/intt_dilithium_123_45678.s

+        str_vo data6, in, (6*(1024/8))
+        str_vo data7, in, (7*(1024/8))        
+
+        mul_ninv data4, data5, data6, data7, data0, data1, data2, data3


It would be interesting to double-check if the mul_ninv and the canonical_reduce can be replaced by a refined Barrett multiplication in the sense of https://eprint.iacr.org/2022/439.pdf.
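As a baseline for that comparison, here is a sketch of plain Barrett multiplication by a precomputed constant, using the `ninv_addr` and `ninv_tw_addr` values from the intt_dilithium_1234_5678_manual_ld4.s hunk earlier in this thread. It is not the refined variant from the paper, and it is an assumption that the macro follows this multiply / rounding-high-multiply / multiply-subtract pattern; the Python names are made up for this example.

```python
Q = 8380417        # modulus_addr in the diff
NINV = 16382       # ninv_addr in the diff
NINV_TW = 4197891  # ninv_tw_addr in the diff


def round_div(a, b):
    """Exact round-half-up of a / b using integer arithmetic only."""
    return (2 * a + b) // (2 * b)


def barrett_mul_const(a, b, b_tw):
    """Multiply a by the constant b modulo Q using a twisted constant b_tw.

    b_tw is assumed to be round(b * 2**31 / Q). The quotient estimate uses a
    rounding high multiply, (a * b_tw + 2**30) >> 31, which is what a
    sqrdmulh-style instruction computes for 32-bit lanes.
    """
    t = (a * b_tw + (1 << 30)) >> 31
    return a * b - t * Q


# The twisted constant in the diff matches the rounding formula above.
twisted_matches = round_div(NINV << 31, Q) == NINV_TW
```

For any 32-bit signed input, the result is congruent to `a * NINV` modulo Q and lies strictly between -Q and Q, so a separate reduction step is only needed if a canonical (non-negative) representative is required.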

hanno-becker

LGTM, @dop-amin -- thank you very much for this work.

dop-amin · 2024-04-11T07:51:51Z

> LGTM, @dop-amin -- thank you very much for this work.

Great, thanks for your feedback in the process!

hanno-becker reviewed Mar 28, 2024

examples/naive/aarch64/intt_dilithium_1234_5678_manual_ld4.s (outdated)

hanno-becker reviewed Mar 28, 2024

slothy/targets/aarch64/aarch64_neon.py (outdated)

dop-amin added 21 commits April 2, 2024 16:04

Dilithium invNTT

1b84425

manual_ld4 for Dilithium invNTT

d7c6180

Kyber invNTT

141a287

Kyber invNTT opt A72

f3adf51

Introduce more Kyber invNTT reductions

b16a319

Kyber clean invNTT

242c68f

* Fix reductions

Add invNTT to example.py

e66d9ff

Add Kyber invNTT manual_ld4

dacc5ed

Add invNTTs to example.py

18562da

Adjust models to optimize invNTTs

e21d54f

Fix invNTT macro syntax

e4d0834

Optimized Kyber invNTTs

2334d1e

Fix reduction in Dilithium 123-45678 invNTTs

7a44d1f

Optimized DIlithium invNTTs

8313232

Fix modulus use invNTT Dilithium

3787897

Fix manual_ld4 invNTT Dilithium

53117d8

Final (?) update for invNTTs

836f521

Match Kyber invNTT Barrett reduction to pqclean

ec6e6ce

New no-unfold syntax

21d429f

Add more optimized code

1c6ec62

Add simd compare class to aarch64

53ee2d3

dop-amin force-pushed the invntt branch from 42aef73 to 53ee2d3 Compare April 2, 2024 14:08

hanno-becker reviewed Apr 3, 2024

dop-amin added 2 commits April 3, 2024 12:31

Fixed Dilithium invNTTs (ld4 ordering)

88dc5e9

cmge -> ASimdCompare

7aabde4

dop-amin force-pushed the invntt branch from ed92290 to 7aabde4 Compare April 3, 2024 11:13

dop-amin marked this pull request as ready for review April 3, 2024 12:28

dop-amin mentioned this pull request Apr 5, 2024

Add Tests for inverse NTTs slothy-optimizer/pqax#5

Merged

hanno-becker reviewed Apr 5, 2024

example.py (outdated)

Copy example config to preserve --dry-run

6659a6d

hanno-becker reviewed Apr 7, 2024

example.py (outdated)

hanno-becker reviewed Apr 7, 2024

examples/naive/aarch64/intt_dilithium_1234_5678_manual_ld4.s

hanno-becker reviewed Apr 7, 2024

examples/naive/aarch64/intt_dilithium_1234_5678_manual_ld4.s (outdated)

hanno-becker reviewed Apr 7, 2024

dop-amin added 3 commits April 7, 2024 14:18

rm duplicate code

ee53b4b

allow src=dst in mulmod macro

8cb796e

Adjust mulmod{q} macro for fwd NTTs as well, add comments

c7c10a9

dop-amin force-pushed the invntt branch from d27a995 to c7c10a9 Compare April 7, 2024 13:57

dop-amin added 2 commits April 9, 2024 11:43

Fix invNTT reduction macro names

f37cffe

Add note about output range to invNTTs

d5c0d02

hanno-becker approved these changes Apr 11, 2024

hanno-becker merged commit d7b5296 into slothy-optimizer:main Apr 11, 2024
7 checks passed
