ZTC-1648: Avoid heap profiling crash by eagerly starting long-lived profiling thread #54

OmegaJak · 2024-07-17T22:57:50Z

Closes #46

Rather than fully removing seccomp setup in the memory profiler, which would require adding new syscalls to the whole application's configuration, the memory profiler now spawns a long-lived thread during startup, before global seccomp is initialized. This thread no longer sets up seccomp for itself.
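For readers skimming the thread, here is a minimal sketch of the pattern described above, using only the standard library. It is not the code from this PR: `collect_profile`, `spawn_profiling_thread`, and `request_profile` are hypothetical names, and std channels stand in for whatever async channels the crate actually uses. The idea it illustrates is that a thread spawned before a filter is installed on the main thread is not subject to that filter, so later requests are handed to it over a channel instead of spawning a fresh thread.

```rust
use std::sync::mpsc;
use std::thread;

/// A request carries the channel on which the worker sends back the result.
type ProfileRequest = mpsc::Sender<Result<String, String>>;

/// Hypothetical stand-in for the real heap-profile collection.
fn collect_profile() -> Result<String, String> {
    Ok("heap profile data".to_string())
}

/// Spawns the long-lived worker and returns the handle used to reach it.
/// Intended to be called during startup, before any sandboxing is applied.
fn spawn_profiling_thread() -> mpsc::Sender<ProfileRequest> {
    let (request_tx, request_rx) = mpsc::channel::<ProfileRequest>();

    thread::spawn(move || {
        // The loop ends once every request sender has been dropped.
        for reply_tx in request_rx {
            // Ignore send errors: the requester may have stopped waiting.
            let _ = reply_tx.send(collect_profile());
        }
    });

    request_tx
}

/// Asks the worker for a profile and blocks until it answers.
fn request_profile(worker: &mpsc::Sender<ProfileRequest>) -> Result<String, String> {
    let (reply_tx, reply_rx) = mpsc::channel();
    worker
        .send(reply_tx)
        .map_err(|_| "profiling thread is gone".to_string())?;
    reply_rx
        .recv()
        .map_err(|_| "profiling thread dropped the request".to_string())?
}

fn main() {
    let worker = spawn_profiling_thread();
    // ... sandboxing of the main thread would be set up here ...
    println!("{:?}", request_profile(&worker));
}
```

Note that in this sketch requests sent while the worker is busy simply wait in the channel until the worker gets to them.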

inikulin · 2024-07-18T07:00:45Z

foundations/src/telemetry/memory_profiler.rs

@@ -256,6 +302,30 @@ mod tests {
        .is_err());
    }

+    #[tokio::test]
+    async fn profile_heap_with_profiling_sandboxed_after_previous_seccomp_init() {
+        let profiler = MemoryProfiler::get_or_init_with(&MemoryProfilerSettings {


As far as I can tell, this will still fail: in production, MemoryProfiler is usually initialised on the first heap profile request to the telemetry server, and by then the whole app is up and seccomp has already been initialised on the main thread.

It seems the way to solve this is to start the profiling thread in telemetry::init, which is recommended to be called before seccomp init on the main thread (that isn't reflected in the docs, only shown in the example, so we should probably update the docs here).

The task sender for the thread will be stored in a global variable (like the profiler itself), so once the profiler is initialized, it can tell the thread to collect a profile via the global sender. Additionally, we can put the sender under a mutex; this way we can remove PROFILING_IN_PROGRESS_LOCK.

With my changes, the MemoryProfiler is initialized when telemetry::server::init is called, which is called by telemetry::init. I'll make sure the recommendation to call telemetry::init before seccomp setup is in the docs.

Regarding the locking -- if the profiler itself is stored in a global variable, and the only way to get the profiler is through the global variable, I'm not sure I see the point of pulling the sender into its own global variable rather than letting it continue living inside the profiler.

As far as the PROFILING_IN_PROGRESS_LOCK goes, looking at it now, I actually don't think we need that any more. If multiple requests come in at the same time, they'll simply queue up and each be processed in turn by the profiling thread's loop. I'll look into removing that altogether.

I've added docs and removed the PROFILING_IN_PROGRESS_LOCK. As a sanity check, I used apache bench to hit a local version of a service using this branch, making 10k heap profile requests with a concurrency level of 100. High concurrency meant high response times of course, but there were no failures.

Apache Bench Output

$ ab -n 10000 -c 100 -l localhost:7800/pprof/heap
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient)
Completed 1000 requests
Completed 2000 requests
Completed 3000 requests
Completed 4000 requests
Completed 5000 requests
Completed 6000 requests
Completed 7000 requests
Completed 8000 requests
Completed 9000 requests
Completed 10000 requests
Finished 10000 requests

Server Software:
Server Hostname:        localhost
Server Port:            7800

Document Path:          /pprof/heap
Document Length:        Variable

Concurrency Level:      100
Time taken for tests:   1.236 seconds
Complete requests:      10000
Failed requests:        0
Total transferred:      88918063 bytes
HTML transferred:       87658063 bytes
Requests per second:    8091.14 [#/sec] (mean)
Time per request:       12.359 [ms] (mean)
Time per request:       0.124 [ms] (mean, across all concurrent requests)
Transfer rate:          70258.63 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       1
Processing:     1   12   3.1     11      20
Waiting:        1   12   3.1     11      20
Total:          2   12   3.1     11      20

Percentage of the requests served within a certain time (ms)
  50%     11
  66%     13
  75%     14
  80%     15
  90%     17
  95%     19
  98%     19
  99%     19
 100%     20 (longest request)

Even though running multiple profiles can be fine in artificial benchmarks, we don't want to run multiple profiles at a time on a heavily loaded server because: 1) it can affect the performance of the server; 2) because of 1, it can produce skewed results for other profiles, e.g. timing discrepancies can cause pacing issues and introduce additional allocations; 3) because of how our profiling pipeline is built (i.e. profiles are available globally once they are collected), we want to avoid doing the same job multiple times and instead let others grab a profile that was already collected by someone else during the period when they requested theirs.

With my changes, the MemoryProfiler is initialized when telemetry::server::init is called,

I see. Let's also add docs for those who might use the profiler programmatically outside of the telemetry server: if they have seccomp enabled, it's recommended to init the profiler before that.

we don't want to run multiple profiles at a time on a heavily loaded server

We don't run multiple profiles at a time. Requests enter into a queue which the profiling thread processes one at a time.

because of how our profiling pipeline is built (i.e. profiles are available globally once they are collected), we want to avoid doing the same job multiple times and instead let others grab a profile that was already collected by someone else during the period when they requested theirs

Sure, we could do something like "if we receive profiling request A, and then request B comes in while we're still gathering A's, then return to B what we returned to A". But, that seems like a premature optimization with unneeded complexity to me.

If you're opposed to the queuing functionality and want to match how it worked before my changes, we could avoid it by reintroducing a lock/mutex around the sender and returning an error if we can't immediately acquire the lock. While that does avoid possible server overload due to profiling requests with a sort of rate limiting of "only one at a time", it feels like an unnecessary limiting of the server's functionality. If we do that, and user B requests a profile while user A's is still running, user B will just get an error and have to try again, instead of (as it currently is) getting their profile result a few milliseconds later when A's is done.
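To make the "lock around the sender" alternative concrete, here is a rough standard-library sketch. It is only an illustration under assumptions: `Profiler`, `heap_profile`, and the injected `collect` closure are hypothetical, and the real crate uses different channel and error types. The requester holds the mutex for the whole round trip, so a second caller fails `try_lock` and gets an error immediately instead of being queued.

```rust
use std::sync::mpsc;
use std::sync::Mutex;
use std::thread;

/// Channel on which the worker sends the finished profile back.
type Reply = mpsc::Sender<String>;

struct Profiler {
    // Holding this lock means "a profile is currently being collected".
    request_heap_profile: Mutex<mpsc::Sender<Reply>>,
}

impl Profiler {
    /// `collect` is a hypothetical stand-in for the real profile collection.
    fn new(mut collect: impl FnMut() -> String + Send + 'static) -> Self {
        let (request_tx, request_rx) = mpsc::channel::<Reply>();

        thread::spawn(move || {
            // Runs until the profiler (and with it the sender) is dropped.
            for reply_tx in request_rx {
                let _ = reply_tx.send(collect());
            }
        });

        Self {
            request_heap_profile: Mutex::new(request_tx),
        }
    }

    fn heap_profile(&self) -> Result<String, String> {
        // Fail fast instead of queuing behind a profile that is in progress.
        let sender = match self.request_heap_profile.try_lock() {
            Ok(guard) => guard,
            Err(_) => return Err("profiling is already in progress".to_string()),
        };

        let (reply_tx, reply_rx) = mpsc::channel();
        sender
            .send(reply_tx)
            .map_err(|_| "profiling thread is gone".to_string())?;

        // The guard stays alive until this function returns, so the lock is
        // held for the full duration of the collection.
        reply_rx
            .recv()
            .map_err(|_| "profiling thread dropped the request".to_string())
    }
}

fn main() {
    let profiler = Profiler::new(|| "heap profile data".to_string());
    println!("{:?}", profiler.heap_profile());
}
```

The trade-off is the one discussed in this thread: a caller that arrives mid-collection has to retry, but nothing accumulates in a queue that can only be cleared by a restart.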

We don't run multiple profiles at a time. Requests enter into a queue which the profiling thread processes one at a time.

I don't really want to allow queuing profiles: we either need a sophisticated way to control and monitor the queue (e.g. metrics for the queue, duration, a way to cancel the queue, etc.) or we just keep things as simple as they were.

Some of the applications are extremely loaded and hypersensitive to performance changes, and at the same time really anyone can request a profile. I can imagine people erroneously queuing profiles multiple times, and the only way to abort such a queue is a service restart.

Also, if people request profiles, they are probably running a certain experiment: making requests/connections, etc. So timing is important; a profile without a guarantee of when it will run is not very useful. To make matters worse, we don't even give the requester any feedback on whether the profile will run immediately or be queued.

So, tl;dr, yes, let's re-introduce the lock.

Lock re-introduced.

inikulin · 2024-07-22T11:57:02Z

foundations/src/telemetry/memory_profiler.rs

 pub struct MemoryProfiler {
    _seal: Seal,

-    #[cfg(feature = "security")]
-    sandbox_profiling_syscalls: bool,
+    request_heap_profile: mpsc::Sender<oneshot::Sender<anyhow::Result<String>>>,


We can now remove the Seal above: MemoryProfiler now always has one private field regardless of the feature set and thus can't be constructed by external code.
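As a small illustration of why a private field makes a separate seal unnecessary (a sketch only; the module, field, and method names here are made up and are not the crate's):

```rust
mod profiler {
    pub struct MemoryProfiler {
        // Private field: code outside this module cannot name it, so it
        // cannot build the struct with a literal and must call `get()`.
        request_count: u32,
    }

    impl MemoryProfiler {
        pub fn get() -> Self {
            Self { request_count: 0 }
        }

        pub fn requests(&self) -> u32 {
            self.request_count
        }
    }
}

fn main() {
    let p = profiler::MemoryProfiler::get();
    // The next line would not compile here, because the field is private:
    // let p = profiler::MemoryProfiler { request_count: 0 };
    println!("{}", p.requests());
}
```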

Good point, thank you! Removed.

inikulin · 2024-07-22T11:59:25Z

foundations/src/telemetry/memory_profiler.rs

 pub struct MemoryProfiler {
    _seal: Seal,

-    #[cfg(feature = "security")]
-    sandbox_profiling_syscalls: bool,
+    request_heap_profile: mpsc::Sender<oneshot::Sender<anyhow::Result<String>>>,


Use crate::Result instead. anyhow::Result (aliased as BootstrapResult) is used only for errors that can eventually terminate the process; those errors are quite heavy on CPU and memory because they also carry stack traces. For operational errors like this one, use crate::Result.

inikulin · 2024-07-22T11:59:48Z

foundations/src/telemetry/memory_profiler.rs

+
+    #[cfg(feature = "security")]
+    let sandbox_profiling_syscalls = settings.sandbox_profiling_syscalls;
+    std::thread::spawn(move || {


nit: blank line before the spawn call

inikulin · 2024-07-24T07:04:36Z

foundations/src/telemetry/memory_profiler.rs

+            common_syscall_allow_lists::{ASYNC, SERVICE_BASICS},
+            enable_syscall_sandboxing, ViolationAction,
+        },
+        telemetry::settings::MemoryProfilerSettings,


Nit: don't use nested imports. Instead, introduce a separate use for each non-leaf path, e.g.:

use crate::security::{allow_list, enable_syscall_sandboxing, ViolationAction};
use crate::security::common_syscall_allow_lists::{ASYNC, SERVICE_BASICS};
...

Ah the joys of auto importing. Too bad rustfmt's imports_granularity = "Module" config isn't stabilized yet

OmegaJak · 2024-07-24T19:56:45Z

CI failure should be fixed with #57

This addresses two accidental breaking changes:

1. Introduced in #14: Config deserialization previously expected log levels to be all caps (e.g., "TRACE", matching [slog](https://docs.rs/slog/latest/src/slog/lib.rs.html#2079)), but they are now expected to be lowercase (since foundations adds the serde attribute `rename_all="snake_case"` to enums), so deserialization of existing configs would break.
2. Introduced in #54: If an app has the "memory-profiling" feature enabled but the settings disable memory profiling, server init would fail because the returned profiler would be None.

This also adds a step to CI that does a "dry run" of the example to validate the example config, to help catch breaking config changes going forward.

ZTC-1648: Avoid heap profiling crash by starting profiling thread before seccomp is configured

3977538

OmegaJak commented Jul 17, 2024

Remove recv timeout

634ff4d

OmegaJak changed the title ~~ZTC-1648: Avoid heap profiling crash by starting profiling thread before seccomp is configured~~ ZTC-1648: Avoid heap profiling crash by eagerly starting long-lived profiling thread Jul 17, 2024

inikulin requested changes Jul 18, 2024

OmegaJak added 2 commits July 18, 2024 11:13

Cleanup and add docs about initialization order

4e38e56

Remove now-unnecessary profiling lock

e5a25ed

nmldiegues requested review from nmldiegues and norwoodj July 19, 2024 08:26

nmldiegues reviewed Jul 19, 2024

Code review improvements, fix builds

a94d8f4

OmegaJak requested a review from inikulin July 19, 2024 17:58

nmldiegues approved these changes Jul 20, 2024

inikulin requested changes Jul 22, 2024

OmegaJak added 4 commits July 22, 2024 14:40

Remove profiler sandboxing functionality

1757ced

More code review improvements

3abd233

Clippy

19c3933

Add blank line

e446b46

OmegaJak requested a review from inikulin July 22, 2024 20:43

inikulin requested changes Jul 24, 2024

Lock around the sender

e852a05

OmegaJak requested a review from inikulin July 24, 2024 19:54

OmegaJak mentioned this pull request Aug 6, 2024

OXY-1404: Avoid crashes resulting from double seccomp initialization #60

Merged

Merge branch 'main'

8891073

inikulin approved these changes Aug 13, 2024

inikulin merged commit d319361 into cloudflare:main Aug 13, 2024
17 checks passed

OmegaJak mentioned this pull request Oct 1, 2024

Fix accidental 4.0 breaking changes and run example in CI #68

Merged
