Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ssl_ctx test #196

Open
wants to merge 6 commits into
base: master
Choose a base branch
from
Open

Add ssl_ctx test #196

wants to merge 6 commits into from

Conversation

nhorman
Copy link
Contributor

@nhorman nhorman commented Jun 3, 2024

we have the handshake test, but it would be nice to have a perf test that measures just the amount of time taken to execute a SSL_CTX_new call. Add that test here

@nhorman nhorman linked an issue Jun 3, 2024 that may be closed by this pull request
perf/ssl_ctx.c Outdated Show resolved Hide resolved
perf/ssl_ctx.c Show resolved Hide resolved
@nhorman nhorman requested a review from mattcaswell June 4, 2024 16:04
@vdukhovni
Copy link

One of the tasks here should ideally be to isolate the cost of loading the certificate store in the SSL_CTX, by sharing the store across multiple SSL_CTXs. The mechanism for this is described in the manpage of SSL_CTX_set1_cert_store(3). Basically get a handle on the store of the first fully initialised CTX, and then set as the store for all the others.

In other words, compare the cost of creating a fully fleshed SSL_CTX with a private store vs. creating a similar SSL_CTX using a pre-existing store, amortising CAfile loading over multiple SSL_CTX objects. However, there may then be multiple threads accessing that shared store, and that could impact performance of store operations. So a trade-off.

Thus we should also look for any impact on handshake performance when multiple SSL_CTX objects are used, but their underlying cert store is shared.

@nhorman
Copy link
Contributor Author

nhorman commented Jun 6, 2024

Is that really what we want here though? I'm honestly askinig, as I'm not sure.

While I understand that what you're describing is an optimization to creating multiple SSL_CTX objects, this is a performance test. Do we want to test the optimized path, or do we want to test the worst case path?

I could see an argument for both, but if we're working under the assumption that the optimized application commonly only creates a single (or very few) SSL_CTX objects that then get shared among multiple SSL objects, I would make an argument that we want to test and optimize the slow path, that rebuilds the store every time, so that the single creation case gets addressed.

Opinions welcome of course.

@vdukhovni
Copy link

Is that really what we want here though? I'm honestly askinig, as I'm not sure.

While I understand that what you're describing is an optimization to creating multiple SSL_CTX objects, this is a performance test. Do we want to test the optimized path, or do we want to test the worst case path?

I could see an argument for both, but if we're working under the assumption that the optimized application commonly only creates a single (or very few) SSL_CTX objects that then get shared among multiple SSL objects, I would make an argument that we want to test and optimize the slow path, that rebuilds the store every time, so that the single creation case gets addressed.

Opinions welcome of course.

The point isn't to test the optimisation, rather it is to determine whether, in fact as surmised, the majority of the cost of SSL_CTX creation is loading the CAfile, and therefore any effort to reduce that cost should be spent on decoders, and the store, or whether there are other costs that also matter.

So certainly comparing shared store vs. per-CTX store was/is part of what we want to learn. Whether this then impacts runtime perf is I guess then interesting, but optional, in that we're now exploring whether to recommend this optimisation, or whether it has a non-trivial downside. Good to learn, but perhaps a separate question? I'd like to know the answer, but it is a separate question.

@nhorman
Copy link
Contributor Author

nhorman commented Jun 7, 2024

But we can't (or well, we can, but arguably shouldn't) create fan out for every combination of operation in one test (that would become unruly very quickly I think). If you want to know where you spend most of your time in a given set of call paths, use them all and run a tool like perf or gprof to determiine where your time is most frequently spent.

@vdukhovni
Copy link

But we can't (or well, we can, but arguably shouldn't) create fan out for every combination of operation in one test (that would become unruly very quickly I think). If you want to know where you spend most of your time in a given set of call paths, use them all and run a tool like perf or gprof to determiine where your time is most frequently spent.

Sure, which is why we can probably skip measuring handshake impact, but if we truly want to understand CTX creation cost, it is I think useful to try to isolate the X509 trust store setup, which is naïvely expected to be the bulk of the cost. Otherwise, what are we really trying to measure here?

@nhorman
Copy link
Contributor Author

nhorman commented Jun 7, 2024

My expectation (based on prior comments), was that we were trying to measure the 'cold' creation time of an SSL_CTX. That is to say, the creation and population of an SSL_CTX with data that would make it usable for handling connections. The identification of highly latent paths within that context is the (IMHO) the responsibility of adjunct tools like gprof or perf, should we find a performance degradation between compared builds.

@vdukhovni
Copy link

That's different from the original discussion during the backlog review. I think that isolating the store creation is an obvious high-level knob that is much simpler to turn than gprof, but I'm probably repeating myself. What was the motivation for doing this work? I thought it was reports of regressions in performance of this call, possibly as a result of higher costs in the decoder side, primarily in the CAfile parsing, which splitting it out would expose. Your call...

@nhorman
Copy link
Contributor Author

nhorman commented Jun 7, 2024

There were no explicit issues raised regarding the latency of this call (that I am aware of). Rather the handshake test creates a shared ssl_ctx for all created connections, and the discussion I recall was that:
a) we should have a test which creates an ssl_ctx per connection, rather than a shared one, as doing so is a not uncommon practice in some environments (which we can now toggle with the -s flag in the handshake test)

b) It would be nice to have an ssl_ctx standalone test to measure ssl_ctx_new performance (based on institutional knoweldge that ssl_ctx is generally an expensive operaition). Thats what this test does. The comments here suggested that the calling of SSL_CTX_new by itself is in fact not expensive, but the bulk of the latency occurs when adding a keystore/ca/key/etc), and so this test was modified to add those operations.

Its my view that any test should really attempt to test the worst case scenario, from which we can use adjunct tools to identify the slow paths. Creating efficiencies in our test cases really just puts us in the position of observing that in the nominal/efficient case, performance is ok (unless someone opens an issue that says the nominal/efficient case is somehow bad, in which case, yes, that certainly drives the argument that we should create a test case for that scenario).

Put another way: Right now we're working on the assumption that SSL_CTX_new with a per-ctx cert store/cey/cafile/etc is our slow path, and thats what we need to improve. If there are reports of other scenarios, yes, we should add those test cases as well, but baring that, Adding efficiencies just makes things run faster, and lets us ignore the slow path.

Not saying that we shouldn't have extra tests for various situations, but from my understanding of past discussions adn comments, this is what we want to address right now.

@vdukhovni
Copy link

Thanks for the clarification, makes sense. Just to make sure we're not miscommunication, I was not suggesting only testing the code path where the trust store is shared. Rather, I was anticipating that the test would cover the worst-case scenario you describe, and also, separately (suitable command-line flag) would test with a shared store for comparison to validate the intuition that just he store setup accounts for almost all the costs. Perhaps I already made that clear, if so please pardon the repetition.

If you feel that having both metrics is not worth the effort, so be it. Thanks for bearing with me so far.

@nhorman nhorman requested a review from vdukhovni June 10, 2024 13:07
Copy link
Contributor

@Sashan Sashan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good in general, have just two minor suggestions.

perf/ssl_ctx.c Outdated Show resolved Hide resolved
perf/ssl_ctx.c Show resolved Hide resolved
perf/ssl_ctx.c Outdated Show resolved Hide resolved
perf/ssl_ctx.c Outdated Show resolved Hide resolved
we have the handshake test, but it would be nice to have a perf test
that measures just the amount of time taken to execute a SSL_CTX_new
call.  Add that test here
Add command line switches to do ssl_ctx setup for a client or server,
and add in the assigning of cert/key for server and certstore for
clients
perf/ssl_ctx.c Outdated Show resolved Hide resolved
perf/ssl_ctx.c Outdated Show resolved Hide resolved
perf/ssl_ctx.c Outdated Show resolved Hide resolved
@nhorman
Copy link
Contributor Author

nhorman commented Jul 8, 2024

@mattcaswell given that we're moving these tools to the perf repository, I'm actually going to close this and open a new pr there, with your issues addressed

@nhorman
Copy link
Contributor Author

nhorman commented Jul 9, 2024

actually scratch that last comment, i'll fix them up here, it will be much easier for me to port it over once its all fixed up properly

@nhorman nhorman requested a review from mattcaswell July 9, 2024 12:41
perf/ssl_ctx.c Show resolved Hide resolved
@nhorman nhorman requested a review from mattcaswell July 9, 2024 14:00
Copy link
Contributor

@Sashan Sashan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all looks good, thanks.

@t8m
Copy link
Member

t8m commented Jul 10, 2024

Hmm.. this PR needs to be transfered over to openssl/perftools repo.

@nhorman
Copy link
Contributor Author

nhorman commented Jul 10, 2024

yeah, see my comment here. My plan is to get these merged, then do a sync pr to perftools, while at the same time doing a PR to remove the code from this one

@t8m
Copy link
Member

t8m commented Aug 16, 2024

IMO there is no point to merge this here. The PR is approved so if you apply identical changes to the perftools repo, you can just merge it there.

@vdukhovni
Copy link

Is this still pertinent? I can review if that still makes sense.

@Sashan
Copy link
Contributor

Sashan commented Oct 10, 2024

I think this PR needs to opened against https://github.com/openssl/perftools if we want to move it forward.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Write test to measure SSL_CTX creation times
5 participants