Perform 2-RTT Handshake to upgrade to PQ when possible #4526

alexw91 · 2024-04-25T23:34:59Z

Resolved issues:

N/A

Description of changes:

Updates s2n to always prefer upgrading to a PQ Hybrid KeyShare whenever possible, even if doing so would require a 2 round trip handshake. s2n's current behavior is to choose the best KeyShare option possible in a 1-RTT handshake first, and only if a 1-RTT is not possible will a KeyShare algorithm that requires a 2-RTT handshake be chosen. After this change, s2n's server-side negotiation logic will consider a Hybrid PQ KeyShare as higher priority than completing the handshake in 1-RTT. This is to ensure that Hybrid PQ KeyShares will be guaranteed to be negotiated during any transition periods between one PQ KeyShare algorithm and another PQ KeyShare algorithm (where a newer PQ algorithm isn't mutually supported yet, but the older PQ algorithm can be negotiated in 2 round trips).

There is further discussion of this change in the IETF mailing list, and in a draft RFC by David Benjamin from Google:

Call-outs:

N/A

Testing:

Unit Tests

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

WillChilds-Klein

code changes look good, just a few questions and nit-picks. perhaps consider adding two additional test cases:

an integration test asserting that a client sending a server-usupported PQ keyshare negotiates Kyber512
a unit test with the client sending multiple keyshares (perhaps the first unsupported by server, second supported by server)

tests/unit/s2n_tls13_pq_handshake_test.c

tls/extensions/s2n_server_key_share.c

tests/unit/s2n_extensions_server_key_share_select_test.c

tls/s2n_server_hello_retry.c

alexw91 · 2024-06-14T23:15:21Z

code changes look good, just a few questions and nit-picks. perhaps consider adding two additional test cases:

an integration test asserting that a client sending a server-usupported PQ keyshare negotiates Kyber512

s2n's current integration tests are fragile and hard to run and debug locally. I'd prefer to test this change using unit tests until this issue is resolved: #4342

a unit test with the client sending multiple keyshares (perhaps the first unsupported by server, second supported by server)

These unit tests already exist in this test file. Found some that appear to be testing what you are asking:

tls/extensions/s2n_client_key_share.c

tests/unit/s2n_tls13_pq_handshake_test.c

tests/unit/s2n_extensions_server_key_share_select_test.c

lrstewart · 2024-06-19T06:58:43Z

tls/extensions/s2n_server_key_share.c

I think my discomfort with this behavior change is coming from the fact that we're making an expensive choice (HRR, and with large key shares) based on assumptions about both the client and server. For the client, we're ignoring any intention indicated by key_share as unreliable, and we've never considered client preference order for options like supported_groups. For the server, we keep our PQ and classic groups in separate preference lists, so it's impossible for an application to indicate relative preferences between the two.

But I think the argument here is that this is just "secure by default". We've unilaterally made the decision for both applications and their clients that PQ is always strongly preferable to classic.

It might be worth adding a comment to the security policy struct reminding developers of that implied preference so they have it in mind while writing new security policies. Like, a reminder that if you add kem preferences to a policy, those always take priority over ecc groups, even at the cost of HRR.

This choice will likely prevent us from adding PQ to the default policies.

My understanding is that after this change we'd only incur an HRR in 3 cases:

Client supports PQ per supported_groups, but doesn't send a PQ keyshare.

result: PQ keyshare in HelloRetry

Client supports PQ, but sends malformed PQ keyshare or PQ keyshare for some algorithm combination not supported by the server

result: PQ keyshare in HelloRetry

Client doesn't support PQ, and sends a keyshare with with some ECDHE group not supported by the server

result: ECDHE-only keyshare in HelloRetry

1.) is somewhat odd behavior; the only use-case I can think of for that is "probing" server PQ support or clients readying themselves for seamless PQ transition once the PQ groups are enabled server-side. 2.) is possible for non-s2n clients, but somewhat unlikely -- Kyber/MLKEM are what the industry has settled on and the code points are well known.

I don't think 1.) or 2.) would be particularly common, and 3.) was existing behavior before this PR. What's the concern with adding PQ to default policies?

2 will happen if there's ever a migration away from ML-KEM towards some "ML-KEM-v2" or equivalent. If we deploy this change today, little to nothing will change, but it will make discussions around PQ compliance much easier to have with customers, and save a lot of headache years from now if we ever need to migrate off of ML-KEM towards a better PQ algorithm.

Suppose 5 years from now "ML-KEM-v2" is invented that is twice as fast and uses half the bandwidth of today's ML-KEM (let's call it "ML-KEM-v1" for clarity). At that point in the future, clients will want to migrate to "ML-KEM-v2" and offer it in their KeyShare extension, but keep "ML-KEM-v1" as a fallback in their SupportedGroup extension without sending it in their KeyShare extension. In this scenario, AWS doesn't know if customers prefer PQ-security over first-byte latency, but if the customer is offering PQ at all, the safest option to ensure all customers are meeting all their compliance requirements around PQ, and select PQ whenever possible. This would perform a 2-RTT handshake and negotiate "ML-KEM-v1" from the SupportedGroups extension, even if it wasn't sent in the clients KeyShare extension.

Without this change customers who enable ML-KEM-v2 in their client configs early (before matching server-side support) will counterintuitively downgrade themselves to classical ECDHE KeyShares. Depending on the customers understanding of how TLS works, they might blame AWS TLS endpoints for downgrading their previously working PQ connections to ECDHE when it was correctly offered by their clients, and all they did was add a new PQ algorithm to the top of their preference list. To avoid these types of customer misconfiguration issues where changes to client-side PQ preference ordering might accidentally cause connections to lose PQ security that was previously working, we have decided that ensuring PQ algorithms are selected when they are offered is worth the small risk of a loss in performance.

What's the concern with adding PQ to default policies?

Applications use the default policies when they just want secure defaults and have no particular requirements for what those defaults are. Those applications may not want a HRR in order to select an option that they don't particularly need or care about. But with the current security policy structure, there's no way to indicate that PQ is supported but not strongly preferred. But maybe that's still fine, and we want everyone doing PQ all the time by default.

This choice will likely prevent us from adding PQ to the default policies.

I think this should be rephrased to

This choice will likely prevent us from adding PQ algorithms which were deployed before its final specification to the default policies.

If the algorithm is final and will not change, you can add it in the default policy.

I think a comment in the PQ policy to explain the logic as @lrstewart suggests is a good idea.

And one more comment about key_share as an order indicator. This was not intender. RFC8446-bis is trying to explain that and enforce that the order is depicted from the group, not the key_share.

And +1 about not having a unified order instead of two orders which have order between them. I think @alexw91 explained this would take a lot of work. One unified order for picking groups would be better, but as it is now, we need to enforce an order between the two order and I think the way done here is the better way because of secure-by-default and that we don't want to downgrade someone that chose to enable PQ and support it.

I also think we'd want to have a careful performance discussion before adding PQ to any of the default policies. On graviton 3

3 cert RSA2048 chain, secp256 -> secp256/kyber768, 14% more cpu

3 cert ECDSA256 chain, secp256 -> secp256/kyber768, 35% more cpu

alexw91 force-pushed the 2-rtt-pq-handshake branch 4 times, most recently from e923ece to e2d4a2c Compare May 21, 2024 16:35

alexw91 marked this pull request as ready for review May 21, 2024 17:16

alexw91 requested review from lrstewart and WillChilds-Klein and removed request for lrstewart May 21, 2024 17:59

alexw91 force-pushed the 2-rtt-pq-handshake branch from e2d4a2c to bb5f254 Compare May 23, 2024 16:11

alexw91 added the type/post-quantum label May 23, 2024

alexw91 force-pushed the 2-rtt-pq-handshake branch from bb5f254 to 2013258 Compare June 3, 2024 20:21

WillChilds-Klein reviewed Jun 3, 2024

View reviewed changes

alexw91 commented Jun 3, 2024

View reviewed changes

tls/extensions/s2n_server_key_share.c Outdated Show resolved Hide resolved

lrstewart reviewed Jun 3, 2024

View reviewed changes

tests/unit/s2n_extensions_server_key_share_select_test.c Show resolved Hide resolved

tls/s2n_server_hello_retry.c Outdated Show resolved Hide resolved

alexw91 force-pushed the 2-rtt-pq-handshake branch 2 times, most recently from 34b98d0 to db306ee Compare June 12, 2024 22:13

alexw91 force-pushed the 2-rtt-pq-handshake branch from db306ee to 46ae0e0 Compare June 14, 2024 23:15

WillChilds-Klein approved these changes Jun 17, 2024

View reviewed changes

alexw91 force-pushed the 2-rtt-pq-handshake branch from 46ae0e0 to 0fa4c71 Compare June 18, 2024 23:34

lrstewart reviewed Jun 19, 2024

View reviewed changes

lrstewart requested a review from jmayclin June 19, 2024 07:00

alexw91 force-pushed the 2-rtt-pq-handshake branch 2 times, most recently from 793a895 to d2b0334 Compare June 19, 2024 20:25

lrstewart approved these changes Jun 20, 2024

View reviewed changes

jmayclin approved these changes Jun 20, 2024

View reviewed changes

alexw91 force-pushed the 2-rtt-pq-handshake branch 3 times, most recently from 1845e0d to 61cec75 Compare June 21, 2024 22:11

alexw91 added 5 commits June 24, 2024 09:14

Perform 2-RTT Handshake to upgrade to PQ when possible

f227c7d

Update tests to pass on Openssl 1.1.1

e9b0611

Cleanup

3d9250e

Address Feedback

71239fa

Address Feedback

34a97bc

alexw91 force-pushed the 2-rtt-pq-handshake branch from 61cec75 to 34a97bc Compare June 24, 2024 16:14

jmayclin merged commit 2edc3f8 into aws:main Jun 24, 2024
34 checks passed

rod-chapman pushed a commit to rod-chapman/s2n-tls that referenced this pull request Jun 25, 2024

Perform 2-RTT Handshake to upgrade to PQ when possible (aws#4526)

ce82d9e

alexw91 mentioned this pull request Jul 16, 2024

Update s2n_connection_get_kem_group_name() to work with ClientHelloRe… #4652

Merged

alexw91 mentioned this pull request Oct 15, 2024

Add new MLKEM TLS Policies #4830

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Perform 2-RTT Handshake to upgrade to PQ when possible #4526

Perform 2-RTT Handshake to upgrade to PQ when possible #4526

alexw91 commented Apr 25, 2024 •

edited

Loading

WillChilds-Klein left a comment

alexw91 commented Jun 14, 2024

lrstewart Jun 19, 2024 •

edited

Loading

WillChilds-Klein Jun 19, 2024

alexw91 Jun 19, 2024 •

edited

Loading

lrstewart Jun 20, 2024

csosto-pk Jun 20, 2024

csosto-pk Jun 20, 2024

csosto-pk Jun 20, 2024

jmayclin Jun 20, 2024 •

edited

Loading

Perform 2-RTT Handshake to upgrade to PQ when possible #4526

Perform 2-RTT Handshake to upgrade to PQ when possible #4526

Conversation

alexw91 commented Apr 25, 2024 • edited Loading

Resolved issues:

Description of changes:

Call-outs:

Testing:

WillChilds-Klein left a comment

Choose a reason for hiding this comment

alexw91 commented Jun 14, 2024

lrstewart Jun 19, 2024 • edited Loading

Choose a reason for hiding this comment

WillChilds-Klein Jun 19, 2024

Choose a reason for hiding this comment

alexw91 Jun 19, 2024 • edited Loading

Choose a reason for hiding this comment

lrstewart Jun 20, 2024

Choose a reason for hiding this comment

csosto-pk Jun 20, 2024

Choose a reason for hiding this comment

csosto-pk Jun 20, 2024

Choose a reason for hiding this comment

csosto-pk Jun 20, 2024

Choose a reason for hiding this comment

jmayclin Jun 20, 2024 • edited Loading

Choose a reason for hiding this comment

alexw91 commented Apr 25, 2024 •

edited

Loading

lrstewart Jun 19, 2024 •

edited

Loading

alexw91 Jun 19, 2024 •

edited

Loading

jmayclin Jun 20, 2024 •

edited

Loading