Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DNM] Add Rustls compile time implementation #336

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

hargut
Copy link

@hargut hargut commented Jul 24, 2024

Overview

The PR implements native Rustls usage within pingora-core. The feature can be activated using feature flag rustls. The changes involved restructuring/abstraction within the pingora-core to allow for multiple TLS implementations.

As of now there are no danger elements involved within the Rustls implementation.

The PR is intended to provide a basis for further discussions and enhancements. It is expected that further works will be needed. The current status has reached a fully working state and seems to provide good initial results.

#29

Notes

As I'm quite new to Rust I may have taken choices within the implementation that are not good, may have missed best practices or other things. The PR was created with the goal of further exploring the Rust language and to gather more knowledge on the pingora framework.

There are likely areas within the code that can and should be improved. I highly appreciate any feedback or discussions that help to improve the implementation.

Implementation

The implementation relies on the crates provided from the rustls project and utilizes tokio-rustls.

As of now the pingora-rustls crate is very slim and only contains helper functions and re-exports. tokio-rustls was directly integrated with pingora-core.

The feature flag rustls adds the pingora-rustls dependency to pingora-core and triggers compile time includes within pingora-core.

Significant API changes

  • pingora_core::connectors::tls::Connector & pingora_core::listeners::tls::Acceptor
    are now having Box<dyn Trait> members instead of concrete types.
  • server handshake related parameters & returns are now based on Box<dyn IO + Send>
  • Some structural elements have been renamed from a ssl form to a tls form (e.g. SslAcceptor to TLSAcceptor).
  • Stored certificates in non-implementation specific areas are now using DER format as [u8] which is favorable to instantiating the according certs/keys from BoringSSL/OpenSSL and Rustls.

... and multiple others as OpenSSL/BoringSSL is currently tied into pingora-core at different areas.

Tests

The existing tests pass with BoringSSL, OpenSSL and Rustls. Some tests where slightly modified to avoid errors/warnings with the different SSL/TLS feature flags. Tests in related to the certificate callback and no-verify options are disabled when using the rustls feature.

The tests for Rustls need to run with a certificate that is valid for all requests as there is currently no option implemented to disable hostname verification.

Performance

As of now there is no in-depth comparison between the different SSL/TLS implementations available. Some quick initial tests where performed to check some basic metrics. None of these performance statements are reliable indicators as they were only measured on a local machine without checking further relevant details.

During the tests the used artifacts had been compiled using a Cargo profile that inherits = "release".

Areas that where checked on the local machine:

  • TLS handshakes (v1.2) using tls-perf
  • simple GET at a rates of 1500 req/sec using ali
  • simple POST requests sending 8MB of data using ali

From these quick and basic tests it looks like that

  • the OpenSSL implementation compared to current main branch v0.3.0
    does not show significant differences.
  • the Rustls implementation is providing improvements. The gap seems to be closest on the POST result comparison.

Thank you for very much for your attention and have a nice week. 😀

Kind regards,
Harald

@eaufavor eaufavor added the enhancement New feature or request label Jul 26, 2024
@hargut
Copy link
Author

hargut commented Jul 29, 2024

Further Thoughts & Considerations

... in unsorted order

  • layout/structure to allow generic TLS provider integration

    • creation of a pingora-core-api crate to avoid cyclic dependencies
      • moving the TLS provider traits into that crate
      • updating implementation in pingora-core TLS areas to use the traits from pingora-core-api
      • adding actual implementation using compile time feature configuration
    • moving TLS provider specific code into TLS crates
      • e.g. create pingora-boringssl-openssl crate to avoid code duplication for BoringSSL & OpenSSL
      • move BoringSSL & OpenSSL related code into that create
      • re-exporting pingora-boringssl-openssl implementations via the pingora-openssl and pingora-boringssl crates and use these to select the TLS provider during compile time
      • move Rustls specific code into pingora-rustls
    • allowing potential other TLS backends to be implemented (e.g. tokio-tls-native, ...)
  • certificate/key storage and parsing

    • within the PR the storage format is currently adapted to DER [u8] which might not be the best choice in terms of performance
      • in Rustls this does as align well with the internal storage type beeing either Vec<u8> or &[u8] in DER.
    • currently parts of the pingora-core::utils::tls are BoringSSL/OpenSSL specific implementations
    • within Rustls there is no X509 parsing available, and the project mentions using the x509-parser crate
    • depending on the storage format, it might be helpful to unify the utils section and to add the x509-parser directly to pingora-core and use it unified for all TLS implementations
  • testing

    • are performance tests / benchmarks available for the TLS area to validate the impact changes?
    • benchmarking TLS implementations within core (connector, acceptor, proxy from connector to acceptor)
    • enhance test coverage for various Rustls specific implementations
  • extending the CI

    • test various TLS implementations (e.g. matrix with `tls_feature: [openssl, boringssl, rustls], executing tests & benchmarks)
    • run benchmarks and create comparing reports during PRs
  • identical behaviour to BoringSSL/OpenSSL in the area of verify/sni/alt validation

  • Rustls Post Quantum support

@johnhurt johnhurt self-assigned this Aug 2, 2024
@johnhurt
Copy link
Contributor

johnhurt commented Aug 2, 2024

This is incredible work! I (and the rest of the pingora team) are going to start taking a look at this now and start putting in feedback as it comes up. My first reaction (other than "Wow!") is this is a giant pr, so the first thing I'm going to try to do is figure out a way to break it up into smaller pieces. This makes it easier to review, but also since we use pingora as the foundation for multiple services in production, we try not to let it change too much too fast 😅

Here is the order of operations I recommend.

  1. Leave this pr open as a reference, but add [DNM] (do not merge) to the title.
  2. Wait for the public sync today (or next week at the latest) - @palant has a pr for no-op tls that should get synced to the public repo today, and that might cause some conflicts with this pr.
  3. Add separate, stacked prs with this (suggested) break down
    1. Structural refactoring - Just the minimum changes to establish the new boringssl_openssl module and add the rustls feature (minimal code changes). This will eliminate a lot of the noise from the pr.
    2. Certificate changes - Changes to make the existing boringssl_openssl implementations, examples and tests work with the modified CertKey type. To me this is the change with the biggest risk since it will affect the tls code we are using in production.
    3. Introduce pingora-rustls - Add this crate to the workspace and wire it in with the features
    4. Completed Implementation - Everything else.

I want to reiterate, that this is an enormous amount of work, and an amazing undertaking for one person 👏 . rustls integration is something we have been planning to do for a while. To that end, if the above steps seem too tedious or onerous, I will totally understand. We will gladly push this over the finish line with the work you have already done (and make sure you are credited for your contributions). Obviously we would rather keep working with you directly, but you should know that you have already helped us a lot, so thank for you for this contribution (and hopefully lots more in the future).

PS. If you are new to rust it doesn't show 😆

@johnhurt
Copy link
Contributor

johnhurt commented Aug 2, 2024

Your Further Thoughts comment has some good points on how the tls and api code can be reorganized (among other things). Now that we are introducing rustls and no-op tls to the code base, it does make sense to formalize and isolate those features in their own crate. The big thing we want to ensure is that at the pingora-sever level, we aren't adding any dynamic or enum dispatches to just to account for extra tls-provider options. As @eaufavor mentioned here, we will internally be using open/boring ssl internally for compliance reasons for (basically) ever. We also don't see a need to support multiple providers at runtime, so we want to make sure that the provider selected via compile-time features is accessible directly.

You also mentioned testing and feature matrices within the ci. Those are good callouts, and part of the changes I'm working on to go with the pr for no-op tls.

@palant
Copy link
Contributor

palant commented Aug 2, 2024

Interesting approach. I was also thinking about this, and I thought about standardizing on the Rustls API rather than introducing separate code paths for Rustls and OpenSSL/BoringSSL. The reason is that Rustls has a better API (at least for Rust consumers), and it supports different backends. A BoringSSL backend already exists, and writing an OpenSSL backend providing only the necessary functionality should be doable.

But I guess that this kind of standardization can still be done later once the Rustls codepath lands – if it’s considered desirable.

@hargut hargut changed the title Add Rustls compile time implementation [DNM] Add Rustls compile time implementation Aug 5, 2024
@hargut
Copy link
Author

hargut commented Aug 5, 2024

Hi @johnhurt,

thank you very much for your kind words and the feedback. 😃

I'd be interested in continuing working on this change, and fully support the suggested approach to split that PR into pieces. I'm currently travelling and will be back in around 2-3 weeks. In case someone wants to work meanwhile on the topic I'd be as well happy to support any efforts that help to bring this feature forward.

I've renamed the PR to have the [DNM] prefix.

It is not fully clear to me which path is desired for the suggested stacked PRs:

  • Should the current PR be split and be continue to be part of pingora-core?
  • Should the current PR be further adapted to the separate crates approach (introduce the pingora-core-api, have the implementations within e.g. pingora-boringssl-openssl, pingora-rustls?

For splitting up the PR into the stacked ones it would be as well helpful to have an initial PR review on the code side to allow the first set of feedback to be directly incorporated with these PRs. I know, this request is unfortunate as one of the goals of stacking the PRs is to simplify reviewing.

From a performance, dependency, distribution point of view I think it is desirable to stay with compile time TLS implementation selection. Fully avoiding dynamic dispatching could eventually work out, but the current approach does contain trait objects as struct members.

I'm thinking it might be worthwhile to start with a benchmarking PR for the acceptor/connector components to have meaningful way for comparisons. The tooling here could be valgrind based (e.g. callgrind/cachegrind/dhat). A time based benchmark test for the full acceptor/connector will likely not produce stable figures due to the complexity and async runtime.
What do you think of trying iai_callgrind for that purpose?

Enjoy the summer and have a great time. 🌞

Kind regards,
Harald

@eaufavor eaufavor added the WIP We are working on this feature internally label Aug 9, 2024
@johnhurt johnhurt added the Accepted This change is accepted by us and merged to our internal repo label Aug 17, 2024
@johnhurt
Copy link
Contributor

Awesome, @hargut. I marked this as accepted. I'll split it up into smaller PRs internally and you will see them come out slowly in our weekly syncs. I am going to lean towards keeping everything in the pingora-core as opposed to adding new crates. I think we might go that way in the future, but for development, this is a little easier ... plus crates.io already complains enough about the number of crates we publish.

I want to thank you again for this giant piece of work and effort. I'll make sure you are the author on the commits I make that are derived from your code. If you are willing to keep contributing then, a benchmarking framework would be a great place to start. That is something we don't have for pingora, and there are times (like now) when comparative benchmarks would be helpful. I am partial to criterion, but I have never used it outside of simple tests. Thanks,

Kevin

@hargut
Copy link
Author

hargut commented Aug 22, 2024

Hi @johnhurt ,

thank you for looking into splitting up the PRs in to smaller ones and taking care of the process.

As one of my goals with this PR is learning Rust and related details, I'd highly appreciate to have a chance to receive the technical feedback to allow learning from mistakes having been made. Can you please somehow (also via mail if easier) provide the feedback received?

I've created a separate ticket for the benchmark topic and will be looking into it.

Kind regards,
Harald

@johnhurt
Copy link
Contributor

Totally, @hargut. I'll update this ticket whenever we review/release part of this original cr to update you on the progress and what we find. Internally we are reviewing the changes to the existing openssl-boringssl implementations. We are going to spend the most time with this because this is only code path we are going to use internally, and we want to make sure it isn't affected by including a rustls implementation.

The only thing I have right now in my notes from reading through to deciding where to split is that I that the listeners::tls::SSLAcceptorBuilder which in this pr is a Box<dyn Trait> could probably be replaced with a #[cfg(feature)] to allow it to work with all tls implementations, but still static dispatch.

I like the new issue you put in, and I appreciate all your contribution. Thanks!

use generics instead as used within the original version
remove trait implementations for Box<dyn IO + Send> as no longer required
@hargut
Copy link
Author

hargut commented Aug 27, 2024

Hi @johnhurt,

FYI: I've pushed a simplification commit within this branch removing several Boxes from method signatures.

Kind regards,
Harald

@johnhurt
Copy link
Contributor

Awesome. Thank. I'm the one traveling this time, so I'll take a look at it when I get home

@hargut
Copy link
Author

hargut commented Sep 6, 2024

Hi @johnhurt,

I'm currently having a closer look at the differences in the BoringSSL/OpenSSL implementations comparing pingora/0.3.0 with this branch (#336 ) using #367.

The first performance related enhancement was removing the indirection through the trait in protocols::tls::TlsStream.tls related calls. The indirection was present in the call graph, and is gone after applying the change. This change also had positive effects when comparing the result stats.

Have a great weekend. 😃

Kind regards,
Harald

 replace direct field access with getter method that
 returns &Arc<impl TlsConnectorContext>,
 store concrete type in now private struct field

using impl Trait in return value keeps method signature
stable across features; allows for static dispatch

 adjust visibility for TlsConnectorCtx structs from pub(crate) to pub(super)
@johnhurt
Copy link
Contributor

Hey, @hargut. I am almost done with an internal recreation of your pr. I wanted to give you an update and save some of your time because it looks like (based on your most recent prs) we came to the same conclusions.

My main goal is to get rustls support added in without making any (meaningful) changes to the existing ssl implementations.
What I did was take all the trait-level abstractions you (used to) have, and replace them with compile-time configuration switching. The result is our ssl implementations are unchanged (other than some type aliasing), and the rustls implementation loses the need for any indirection other than what it takes to make its api look like what we are already using to talk to openssl.

I'll let you know when this is about to be synced so you can get prepared for the outpouring of appreciation from all the people waiting for specifically this feature. I think you can consider this pr complete. Thanks!

@hargut
Copy link
Author

hargut commented Sep 10, 2024

Hi @johnhurt,

thank you for the feedback and your help in landing this feature! 👍

The past commits from my end, had been focused on getting the benchmark stats as close as possible to the original implementation. As of now, I feel that the PR could have been less complicated in regards to layout and abstractions. As you already mentioned using a more direct approach with compile time configs is favourable in regards to performance.

Likely I'll add another commit to "finish" the intended work from my end. But I solely consider this efforts as educational efforts for myself and to get familiar with the tooling and (more) performance details.

Kind regards,
Harald 😃

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Accepted This change is accepted by us and merged to our internal repo enhancement New feature or request WIP We are working on this feature internally
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants