Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fips-2022-11-02] Make SSL_select_next_proto more robust to invalid calls. #1680

Merged

Conversation

skmcgrail
Copy link
Member

SSL_select_next_proto has some fairly complex preconditions:

  • The peer and supported list must be valid protocol lists
  • The supported list must not be empty. The peer list may be empty due to one of NPN's edge cases.

In the context of how this function is meant to be used, these are reasonable preconditions. The caller should not serialize its own list wrong, and it makes no sense to try to negotiate a protocol when you don't support any protocols. In particular, it complicates NPN's weird "opportunistic" protocol.

However, the preconditions are unchecked and a bit subtle. Violating them will result in memory errors. Bad syntax on the protocol lists is mostly not a concern (you should encode your own list correctly and the library checks the peer's list), but the second rule is somewhat of a mess in practice:

Despite having the same precondition in reality, OpenSSL did not document this. Their documentation implies things which are impossible without this precondition, but they forgot to actually write down the precondition. There's an added complexity that OpenSSL never updated the parameter names to match the role reversal between ALPN and NPN.

There are thus a few cases where a buggy caller may pass an empty "supported" list.

  • An NPN client called SSL_select_next_proto despite not actually supporting any NPN protocols.

  • An NPN client called SSL_select_next_proto, flipped the parameters, and the server advertised no protocols.

  • An ALPN server called SSL_select_next_proto, passed its own list in as the second parameter, despite not actually supporting any ALPN protocols.

In these scenarios, the "opportunistic" protocol returned in the OPENSSL_NPN_NO_OVERLAP case will be out of bounds. If the caller discards it, this does not matter. If the caller returns it through the NPN or ALPN selection callback, they have a problem. ALPN servers are expected to discard it, though some may be buggy. NPN clients may implement either behavior.

Older versions of some callers have exhibited variations on the above mistakes, so empirically folks don't always get it right. OpenSSL's wrong documentation also does not help matters. Instead, have SSL_select_next_proto just check these preconditions. That is not a performance-sensitive function and these preconditions are easy to check. While I'm here, rewrite it with CBS so it is much more straightforwardly correct.

What to return when the preconditions fail is tricky, but we need to output some protocol, so we output the empty protocol. This, per the previous test and doc fixes, is actually fine in NPN, so one of the above buggy callers is not retroactively made OK. But it is not fine in ALPN, so we still need to document that callers need to avoid this state. To that end, revamp the documentation a bit.

Thanks to Joe Birr-Pixton for reporting this!

Fixed: 735
Change-Id: I4378a082385e7334e6abaa6705e6b15d6843f6c5 Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/69028
Reviewed-by: Bob Beck [email protected]
Commit-Queue: David Benjamin [email protected]
(cherry picked from commit c1d9ac02514a138129872a036e3f8a1074dcb8bd)

Issues:

Resolves #ISSUE-NUMBER1
Addresses #ISSUE-NUMBER2

Description of changes:

Describe AWS-LC’s current behavior and how your code changes that behavior. If there are no issues this pr is resolving, explain why this change is necessary.

Call-outs:

Point out areas that need special attention or support during the review process. Discuss architecture or design changes.

Testing:

How is this change tested (unit tests, fuzz tests, etc.)? Are there any testing steps to be verified by the reviewer?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.

SSL_select_next_proto has some fairly complex preconditions:

- The peer and supported list must be valid protocol lists
- The supported list must not be empty. The peer list may be empty due
  to one of NPN's edge cases.

In the context of how this function is meant to be used, these are
reasonable preconditions. The caller should not serialize its own list
wrong, and it makes no sense to try to negotiate a protocol when you
don't support any protocols. In particular, it complicates NPN's weird
"opportunistic" protocol.

However, the preconditions are unchecked and a bit subtle. Violating
them will result in memory errors. Bad syntax on the protocol lists is
mostly not a concern (you should encode your own list correctly and the
library checks the peer's list), but the second rule is somewhat of a
mess in practice:

Despite having the same precondition in reality, OpenSSL did not
document this. Their documentation implies things which are impossible
without this precondition, but they forgot to actually write down the
precondition. There's an added complexity that OpenSSL never updated the
parameter names to match the role reversal between ALPN and NPN.

There are thus a few cases where a buggy caller may pass an empty
"supported" list.

- An NPN client called SSL_select_next_proto despite not actually
  supporting any NPN protocols.

- An NPN client called SSL_select_next_proto, flipped the parameters,
  and the server advertised no protocols.

- An ALPN server called SSL_select_next_proto, passed its own list in as
  the second parameter, despite not actually supporting any ALPN
  protocols.

In these scenarios, the "opportunistic" protocol returned in the
OPENSSL_NPN_NO_OVERLAP case will be out of bounds. If the caller
discards it, this does not matter. If the caller returns it through the
NPN or ALPN selection callback, they have a problem. ALPN servers are
expected to discard it, though some may be buggy. NPN clients may
implement either behavior.

Older versions of some callers have exhibited variations on the above
mistakes, so empirically folks don't always get it right. OpenSSL's
wrong documentation also does not help matters. Instead, have
SSL_select_next_proto just check these preconditions. That is not a
performance-sensitive function and these preconditions are easy to
check. While I'm here, rewrite it with CBS so it is much more
straightforwardly correct.

What to return when the preconditions fail is tricky, but we need to
output *some* protocol, so we output the empty protocol. This, per the
previous test and doc fixes, is actually fine in NPN, so one of the
above buggy callers is not retroactively made OK. But it is not fine in
ALPN, so we still need to document that callers need to avoid this
state. To that end, revamp the documentation a bit.

Thanks to Joe Birr-Pixton for reporting this!

Fixed: 735
Change-Id: I4378a082385e7334e6abaa6705e6b15d6843f6c5
Reviewed-on: https://boringssl-review.googlesource.com/c/boringssl/+/69028
Reviewed-by: Bob Beck <[email protected]>
Commit-Queue: David Benjamin <[email protected]>
(cherry picked from commit c1d9ac02514a138129872a036e3f8a1074dcb8bd)
@skmcgrail skmcgrail requested review from a team as code owners June 29, 2024 01:20
@skmcgrail skmcgrail merged commit d6d496b into aws:fips-2022-11-02 Jun 29, 2024
5 of 9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants