[fips-2022-11-02] Make SSL_select_next_proto more robust to invalid calls. #1680

skmcgrail · 2024-06-29T01:20:10Z

SSL_select_next_proto has some fairly complex preconditions:

The peer and supported list must be valid protocol lists
The supported list must not be empty. The peer list may be empty due to one of NPN's edge cases.

In the context of how this function is meant to be used, these are reasonable preconditions. The caller should not serialize its own list wrong, and it makes no sense to try to negotiate a protocol when you don't support any protocols. In particular, it complicates NPN's weird "opportunistic" protocol.

However, the preconditions are unchecked and a bit subtle. Violating them will result in memory errors. Bad syntax on the protocol lists is mostly not a concern (you should encode your own list correctly and the library checks the peer's list), but the second rule is somewhat of a mess in practice:

Despite having the same precondition in reality, OpenSSL did not document this. Their documentation implies things which are impossible without this precondition, but they forgot to actually write down the precondition. There's an added complexity that OpenSSL never updated the parameter names to match the role reversal between ALPN and NPN.

There are thus a few cases where a buggy caller may pass an empty "supported" list.

An NPN client called SSL_select_next_proto despite not actually supporting any NPN protocols.
An NPN client called SSL_select_next_proto, flipped the parameters, and the server advertised no protocols.
An ALPN server called SSL_select_next_proto, passed its own list in as the second parameter, despite not actually supporting any ALPN protocols.

In these scenarios, the "opportunistic" protocol returned in the OPENSSL_NPN_NO_OVERLAP case will be out of bounds. If the caller discards it, this does not matter. If the caller returns it through the NPN or ALPN selection callback, they have a problem. ALPN servers are expected to discard it, though some may be buggy. NPN clients may implement either behavior.

Older versions of some callers have exhibited variations on the above mistakes, so empirically folks don't always get it right. OpenSSL's wrong documentation also does not help matters. Instead, have SSL_select_next_proto just check these preconditions. That is not a performance-sensitive function and these preconditions are easy to check. While I'm here, rewrite it with CBS so it is much more straightforwardly correct.

What to return when the preconditions fail is tricky, but we need to output some protocol, so we output the empty protocol. This, per the previous test and doc fixes, is actually fine in NPN, so one of the above buggy callers is not retroactively made OK. But it is not fine in ALPN, so we still need to document that callers need to avoid this state. To that end, revamp the documentation a bit.

Thanks to Joe Birr-Pixton for reporting this!

Fixed: 735
Change-Id: I4378a082385e7334e6abaa6705e6b15d6843f6c5 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/69028
Reviewed-by: Bob Beck [email protected]
Commit-Queue: David Benjamin [email protected]
(cherry picked from commit c1d9ac02514a138129872a036e3f8a1074dcb8bd)

Issues:

Resolves #ISSUE-NUMBER1
Addresses #ISSUE-NUMBER2

Description of changes:

Describe AWS-LC’s current behavior and how your code changes that behavior. If there are no issues this pr is resolving, explain why this change is necessary.

Call-outs:

Point out areas that need special attention or support during the review process. Discuss architecture or design changes.

Testing:

How is this change tested (unit tests, fuzz tests, etc.)? Are there any testing steps to be verified by the reviewer?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.

SSL_select_next_proto has some fairly complex preconditions: - The peer and supported list must be valid protocol lists - The supported list must not be empty. The peer list may be empty due to one of NPN's edge cases. In the context of how this function is meant to be used, these are reasonable preconditions. The caller should not serialize its own list wrong, and it makes no sense to try to negotiate a protocol when you don't support any protocols. In particular, it complicates NPN's weird "opportunistic" protocol. However, the preconditions are unchecked and a bit subtle. Violating them will result in memory errors. Bad syntax on the protocol lists is mostly not a concern (you should encode your own list correctly and the library checks the peer's list), but the second rule is somewhat of a mess in practice: Despite having the same precondition in reality, OpenSSL did not document this. Their documentation implies things which are impossible without this precondition, but they forgot to actually write down the precondition. There's an added complexity that OpenSSL never updated the parameter names to match the role reversal between ALPN and NPN. There are thus a few cases where a buggy caller may pass an empty "supported" list. - An NPN client called SSL_select_next_proto despite not actually supporting any NPN protocols. - An NPN client called SSL_select_next_proto, flipped the parameters, and the server advertised no protocols. - An ALPN server called SSL_select_next_proto, passed its own list in as the second parameter, despite not actually supporting any ALPN protocols. In these scenarios, the "opportunistic" protocol returned in the OPENSSL_NPN_NO_OVERLAP case will be out of bounds. If the caller discards it, this does not matter. If the caller returns it through the NPN or ALPN selection callback, they have a problem. ALPN servers are expected to discard it, though some may be buggy. NPN clients may implement either behavior. Older versions of some callers have exhibited variations on the above mistakes, so empirically folks don't always get it right. OpenSSL's wrong documentation also does not help matters. Instead, have SSL_select_next_proto just check these preconditions. That is not a performance-sensitive function and these preconditions are easy to check. While I'm here, rewrite it with CBS so it is much more straightforwardly correct. What to return when the preconditions fail is tricky, but we need to output *some* protocol, so we output the empty protocol. This, per the previous test and doc fixes, is actually fine in NPN, so one of the above buggy callers is not retroactively made OK. But it is not fine in ALPN, so we still need to document that callers need to avoid this state. To that end, revamp the documentation a bit. Thanks to Joe Birr-Pixton for reporting this! Fixed: 735 Change-Id: I4378a082385e7334e6abaa6705e6b15d6843f6c5 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/69028 Reviewed-by: Bob Beck <[email protected]> Commit-Queue: David Benjamin <[email protected]> (cherry picked from commit c1d9ac02514a138129872a036e3f8a1074dcb8bd)

skmcgrail requested review from a team as code owners June 29, 2024 01:20

torben-hansen approved these changes Jun 29, 2024

View reviewed changes

andrewhop approved these changes Jun 29, 2024

View reviewed changes

skmcgrail merged commit d6d496b into aws:fips-2022-11-02 Jun 29, 2024
5 of 9 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[fips-2022-11-02] Make SSL_select_next_proto more robust to invalid calls. #1680

[fips-2022-11-02] Make SSL_select_next_proto more robust to invalid calls. #1680

skmcgrail commented Jun 29, 2024

[fips-2022-11-02] Make SSL_select_next_proto more robust to invalid calls. #1680

[fips-2022-11-02] Make SSL_select_next_proto more robust to invalid calls. #1680

Conversation

skmcgrail commented Jun 29, 2024

Issues:

Description of changes:

Call-outs:

Testing: