Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: memory leak during STEK rotation #5146

Merged
merged 3 commits into from
Feb 28, 2025
Merged

Conversation

jmayclin
Copy link
Contributor

@jmayclin jmayclin commented Feb 26, 2025

Release Summary:

A small memory leak related to session resumption was resolved. Long lived applications with session resumption enabled will see a reduction in the memory footprint of s2n_config.

Description of changes:

For each STEK that was added to a config, we hashed it and stored the hash in config->ticket_key_hashes. This was intended to have a maximum size of 500, but a typo in the length check resulted in us checking the length of ticket_keys instead of ticket_key_hashes. Therefore ticket_key_hashes would continue to grow as new STEKs were added to the config.

This PR totally removes the ticket key hashes set. Instead we enforce secret uniqueness through a linear scan of the existing keys. Performance wise, this is acceptable because S2N_MAX_TICKET_KEYS is 48.

external behavior changes

  • previous: STEK secret material must be unique across all STEKs that have ever been set on the config
  • proposed: STEK secret material must be unique across all unexpired STEKs currently on the config

internal behavior changes

  • previous: STEKs in config->ticket_keys are sorted based on the intro_time parameter
  • proposed: STEKs in config->tickets_keys are not sorted. They happen to be in "most recently added" order.

I was suspicious that this internal behavior change wouldn't break things in odd ways, so here is the little audit I did

removing steks: s2n_config_wipe_expired_ticket_crypto_keys does not short circuit, but iterates through all keys selecting the ones that are no longer valid. So this behavior doesn't break.
selecting stek for encryption: s2n_get_ticket_encrypt_decrypt_key does not short circuit when gathering the indices of the steks available for encryption. So this behavior doesn't break.
selecting stek for decryption: This is done by key name, so it makes sense that we aren't relying on the order of the ticket keys.

Call-outs:

This PR will remove the only usages of s2n_set in the codebase, so we can rip out s2n_set after this is merged in.

Testing:

I had to modify some unit tests that were asserting on old behaviors.

Technically this should all be covered by existing unit tests, but I couldn't find a unit test enforcing that adding more than S2N_MAX_TICKET_KEYS fails, so I added one.

I also had added a test confirming the memory leak, which showed that more than 500 ticket hashes can be found on a config. That is not included in the PR since we no longer store ticket hashes at all 🙂

Benchmarks

This PR actually makes it less expensive to add STEKs to config despite the new linear scan. Hypothesis: the simple comparison is relatively inexpensive relative to key derivation/hashing.

Stek Addition/config add 1 key to 0 keys
                        time:   [1.4064 µs 1.4077 µs 1.4091 µs]
                        change: [-9.5961% -9.1370% -8.6922%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking Stek Addition/config add 1 key to 47 keys: Collecting 100 samples in es
Stek Addition/config add 1 key to 47 keys
                        time:   [2.0392 µs 2.0409 µs 2.0425 µs]
                        change: [-8.4058% -8.1100% -7.7137%] (p = 0.00 < 0.05)
                        Performance has improved.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@github-actions github-actions bot added the s2n-core team label Feb 26, 2025
* update grep simple mistakes after memcmp removal
* fix shadow variable warning
* clang format header define
@jmayclin jmayclin requested a review from dougch as a code owner February 26, 2025 03:18
Comment on lines -136 to +135
struct s2n_set *ticket_keys;
struct s2n_set *ticket_key_hashes;
struct s2n_array *ticket_keys;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's several places in the code that already use s2n_set_get to iterate over the full set. Should this PR just remove ticket_key_hashes, and not yet swap the type of ticket_keys?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think as long as it's not too hard to review, I like the shift to array because it more accurately describes how we are using the data structure now.

But the diff would definitely be smaller if we broke it out into two PRs. 🤷 . Lmk what you think!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I lean towards breaking it up into two PRs, since you're already going to have a follow-up PR where you remove the set implementation. That seems like a more logical division to me: this PR only fixes the memory leak, that PR removes set (and part of removing set is removing uses of set).

But it's not a blocker for me.

@jmayclin jmayclin requested a review from lrstewart February 26, 2025 21:32
* check name and secret uniqueness in same place
@jmayclin jmayclin requested a review from goatgoose February 27, 2025 19:55
@jmayclin jmayclin added this pull request to the merge queue Feb 28, 2025
Merged via the queue into aws:main with commit 161deb1 Feb 28, 2025
46 checks passed
@jmayclin jmayclin deleted the fix-mem-leak-pr branch February 28, 2025 19:10
dougch pushed a commit to dougch/s2n-tls that referenced this pull request Mar 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants