-
Notifications
You must be signed in to change notification settings - Fork 724
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Perform 2-RTT Handshake to upgrade to PQ when possible #4526
Conversation
e923ece
to
e2d4a2c
Compare
e2d4a2c
to
bb5f254
Compare
bb5f254
to
2013258
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
code changes look good, just a few questions and nit-picks. perhaps consider adding two additional test cases:
- an integration test asserting that a client sending a server-usupported PQ keyshare negotiates Kyber512
- a unit test with the client sending multiple keyshares (perhaps the first unsupported by server, second supported by server)
34b98d0
to
db306ee
Compare
s2n's current integration tests are fragile and hard to run and debug locally. I'd prefer to test this change using unit tests until this issue is resolved: #4342
These unit tests already exist in this test file. Found some that appear to be testing what you are asking: |
db306ee
to
46ae0e0
Compare
46ae0e0
to
0fa4c71
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think my discomfort with this behavior change is coming from the fact that we're making an expensive choice (HRR, and with large key shares) based on assumptions about both the client and server. For the client, we're ignoring any intention indicated by key_share as unreliable, and we've never considered client preference order for options like supported_groups. For the server, we keep our PQ and classic groups in separate preference lists, so it's impossible for an application to indicate relative preferences between the two.
But I think the argument here is that this is just "secure by default". We've unilaterally made the decision for both applications and their clients that PQ is always strongly preferable to classic.
It might be worth adding a comment to the security policy struct reminding developers of that implied preference so they have it in mind while writing new security policies. Like, a reminder that if you add kem preferences to a policy, those always take priority over ecc groups, even at the cost of HRR.
This choice will likely prevent us from adding PQ to the default policies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that after this change we'd only incur an HRR in 3 cases:
- Client supports PQ per
supported_groups
, but doesn't send a PQ keyshare.- result: PQ keyshare in HelloRetry
- Client supports PQ, but sends malformed PQ keyshare or PQ keyshare for some algorithm combination not supported by the server
- result: PQ keyshare in HelloRetry
- Client doesn't support PQ, and sends a keyshare with with some ECDHE group not supported by the server
- result: ECDHE-only keyshare in HelloRetry
1.) is somewhat odd behavior; the only use-case I can think of for that is "probing" server PQ support or clients readying themselves for seamless PQ transition once the PQ groups are enabled server-side. 2.) is possible for non-s2n clients, but somewhat unlikely -- Kyber/MLKEM are what the industry has settled on and the code points are well known.
I don't think 1.) or 2.) would be particularly common, and 3.) was existing behavior before this PR. What's the concern with adding PQ to default policies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 will happen if there's ever a migration away from ML-KEM towards some "ML-KEM-v2" or equivalent. If we deploy this change today, little to nothing will change, but it will make discussions around PQ compliance much easier to have with customers, and save a lot of headache years from now if we ever need to migrate off of ML-KEM towards a better PQ algorithm.
Suppose 5 years from now "ML-KEM-v2" is invented that is twice as fast and uses half the bandwidth of today's ML-KEM (let's call it "ML-KEM-v1" for clarity). At that point in the future, clients will want to migrate to "ML-KEM-v2" and offer it in their KeyShare extension, but keep "ML-KEM-v1" as a fallback in their SupportedGroup extension without sending it in their KeyShare extension. In this scenario, AWS doesn't know if customers prefer PQ-security over first-byte latency, but if the customer is offering PQ at all, the safest option to ensure all customers are meeting all their compliance requirements around PQ, and select PQ whenever possible. This would perform a 2-RTT handshake and negotiate "ML-KEM-v1" from the SupportedGroups extension, even if it wasn't sent in the clients KeyShare extension.
Without this change customers who enable ML-KEM-v2 in their client configs early (before matching server-side support) will counterintuitively downgrade themselves to classical ECDHE KeyShares. Depending on the customers understanding of how TLS works, they might blame AWS TLS endpoints for downgrading their previously working PQ connections to ECDHE when it was correctly offered by their clients, and all they did was add a new PQ algorithm to the top of their preference list. To avoid these types of customer misconfiguration issues where changes to client-side PQ preference ordering might accidentally cause connections to lose PQ security that was previously working, we have decided that ensuring PQ algorithms are selected when they are offered is worth the small risk of a loss in performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the concern with adding PQ to default policies?
Applications use the default policies when they just want secure defaults and have no particular requirements for what those defaults are. Those applications may not want a HRR in order to select an option that they don't particularly need or care about. But with the current security policy structure, there's no way to indicate that PQ is supported but not strongly preferred. But maybe that's still fine, and we want everyone doing PQ all the time by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This choice will likely prevent us from adding PQ to the default policies.
I think this should be rephrased to
This choice will likely prevent us from adding PQ algorithms which were deployed before its final specification to the default policies.
If the algorithm is final and will not change, you can add it in the default policy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a comment in the PQ policy to explain the logic as @lrstewart suggests is a good idea.
And one more comment about key_share
as an order indicator. This was not intender. RFC8446-bis is trying to explain that and enforce that the order is depicted from the group, not the key_share
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And +1 about not having a unified order instead of two orders which have order between them. I think @alexw91 explained this would take a lot of work. One unified order for picking groups would be better, but as it is now, we need to enforce an order between the two order and I think the way done here is the better way because of secure-by-default and that we don't want to downgrade someone that chose to enable PQ and support it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think we'd want to have a careful performance discussion before adding PQ to any of the default policies. On graviton 3
- 3 cert RSA2048 chain, secp256 -> secp256/kyber768, 14% more cpu
- 3 cert ECDSA256 chain, secp256 -> secp256/kyber768, 35% more cpu
793a895
to
d2b0334
Compare
1845e0d
to
61cec75
Compare
61cec75
to
34a97bc
Compare
Resolved issues:
N/A
Description of changes:
Updates s2n to always prefer upgrading to a PQ Hybrid KeyShare whenever possible, even if doing so would require a 2 round trip handshake. s2n's current behavior is to choose the best KeyShare option possible in a 1-RTT handshake first, and only if a 1-RTT is not possible will a KeyShare algorithm that requires a 2-RTT handshake be chosen. After this change, s2n's server-side negotiation logic will consider a Hybrid PQ KeyShare as higher priority than completing the handshake in 1-RTT. This is to ensure that Hybrid PQ KeyShares will be guaranteed to be negotiated during any transition periods between one PQ KeyShare algorithm and another PQ KeyShare algorithm (where a newer PQ algorithm isn't mutually supported yet, but the older PQ algorithm can be negotiated in 2 round trips).
There is further discussion of this change in the IETF mailing list, and in a draft RFC by David Benjamin from Google:
Call-outs:
N/A
Testing:
Unit Tests
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.