fix: memory leak during STEK rotation #5146

jmayclin · 2025-02-26T03:09:03Z

Release Summary:

A small memory leak related to session resumption was resolved. Long lived applications with session resumption enabled will see a reduction in the memory footprint of s2n_config.

Description of changes:

For each STEK that was added to a config, we hashed it and stored the hash in config->ticket_key_hashes. This was intended to have a maximum size of 500, but a typo in the length check resulted in us checking the length of ticket_keys instead of ticket_key_hashes. Therefore ticket_key_hashes would continue to grow as new STEKs were added to the config.

This PR totally removes the ticket key hashes set. Instead we enforce secret uniqueness through a linear scan of the existing keys. Performance wise, this is acceptable because S2N_MAX_TICKET_KEYS is 48.

external behavior changes

previous: STEK secret material must be unique across all STEKs that have ever been set on the config
proposed: STEK secret material must be unique across all unexpired STEKs currently on the config

internal behavior changes

previous: STEKs in config->ticket_keys are sorted based on the intro_time parameter
proposed: STEKs in config->tickets_keys are not sorted. They happen to be in "most recently added" order.

I was suspicious that this internal behavior change wouldn't break things in odd ways, so here is the little audit I did

removing steks: s2n_config_wipe_expired_ticket_crypto_keys does not short circuit, but iterates through all keys selecting the ones that are no longer valid. So this behavior doesn't break.
selecting stek for encryption: s2n_get_ticket_encrypt_decrypt_key does not short circuit when gathering the indices of the steks available for encryption. So this behavior doesn't break.
selecting stek for decryption: This is done by key name, so it makes sense that we aren't relying on the order of the ticket keys.

Call-outs:

This PR will remove the only usages of s2n_set in the codebase, so we can rip out s2n_set after this is merged in.

Testing:

I had to modify some unit tests that were asserting on old behaviors.

Technically this should all be covered by existing unit tests, but I couldn't find a unit test enforcing that adding more than S2N_MAX_TICKET_KEYS fails, so I added one.

I also had added a test confirming the memory leak, which showed that more than 500 ticket hashes can be found on a config. That is not included in the PR since we no longer store ticket hashes at all 🙂

Benchmarks

This PR actually makes it less expensive to add STEKs to config despite the new linear scan. Hypothesis: the simple comparison is relatively inexpensive relative to key derivation/hashing.

Stek Addition/config add 1 key to 0 keys
                        time:   [1.4064 µs 1.4077 µs 1.4091 µs]
                        change: [-9.5961% -9.1370% -8.6922%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking Stek Addition/config add 1 key to 47 keys: Collecting 100 samples in es
Stek Addition/config add 1 key to 47 keys
                        time:   [2.0392 µs 2.0409 µs 2.0425 µs]
                        change: [-8.4058% -8.1100% -7.7137%] (p = 0.00 < 0.05)
                        Performance has improved.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

* update grep simple mistakes after memcmp removal * fix shadow variable warning * clang format header define

tls/s2n_resume.c

lrstewart · 2025-02-26T18:04:52Z

tls/s2n_config.h

-    struct s2n_set *ticket_keys;
-    struct s2n_set *ticket_key_hashes;
+    struct s2n_array *ticket_keys;


There's several places in the code that already use s2n_set_get to iterate over the full set. Should this PR just remove ticket_key_hashes, and not yet swap the type of ticket_keys?

I think as long as it's not too hard to review, I like the shift to array because it more accurately describes how we are using the data structure now.

But the diff would definitely be smaller if we broke it out into two PRs. 🤷 . Lmk what you think!

I lean towards breaking it up into two PRs, since you're already going to have a follow-up PR where you remove the set implementation. That seems like a more logical division to me: this PR only fixes the memory leak, that PR removes set (and part of removing set is removing uses of set).

But it's not a blocker for me.

tls/s2n_resume.c

tests/unit/s2n_session_ticket_test.c

* check name and secret uniqueness in same place

fix: memory leak during STEK rotation

43d0aed

jmayclin requested review from goatgoose and lrstewart February 26, 2025 03:09

github-actions bot added the s2n-core team label Feb 26, 2025

address ci failures

35f3265

* update grep simple mistakes after memcmp removal * fix shadow variable warning * clang format header define

jmayclin requested a review from dougch as a code owner February 26, 2025 03:18

lrstewart reviewed Feb 26, 2025

View reviewed changes

jmayclin requested a review from lrstewart February 26, 2025 21:32

lrstewart approved these changes Feb 26, 2025

View reviewed changes

tls/s2n_resume.c Outdated Show resolved Hide resolved

goatgoose reviewed Feb 27, 2025

View reviewed changes

tests/unit/s2n_session_ticket_test.c Show resolved Hide resolved

address pr feedback

28fdf02

* check name and secret uniqueness in same place

jmayclin requested a review from goatgoose February 27, 2025 19:55

goatgoose approved these changes Feb 27, 2025

View reviewed changes

dougch approved these changes Feb 27, 2025

View reviewed changes

jmayclin added this pull request to the merge queue Feb 28, 2025

Merged via the queue into aws:main with commit 161deb1 Feb 28, 2025
46 checks passed

jmayclin deleted the fix-mem-leak-pr branch February 28, 2025 19:10

jmayclin mentioned this pull request Feb 28, 2025

chore: deprecate s2n_set #5155

Open

dougch pushed a commit to dougch/s2n-tls that referenced this pull request Mar 3, 2025

fix: memory leak during STEK rotation (aws#5146)

d7489dd

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: memory leak during STEK rotation #5146

fix: memory leak during STEK rotation #5146

jmayclin commented Feb 26, 2025 •

edited

Loading

lrstewart Feb 26, 2025

jmayclin Feb 26, 2025

lrstewart Feb 26, 2025

fix: memory leak during STEK rotation #5146

fix: memory leak during STEK rotation #5146

Conversation

jmayclin commented Feb 26, 2025 • edited Loading

Release Summary:

Description of changes:

external behavior changes

internal behavior changes

Call-outs:

Testing:

Benchmarks

lrstewart Feb 26, 2025

Choose a reason for hiding this comment

jmayclin Feb 26, 2025

Choose a reason for hiding this comment

lrstewart Feb 26, 2025

Choose a reason for hiding this comment

jmayclin commented Feb 26, 2025 •

edited

Loading