From 97b79d855b86cdf4c8ce80f24897fd5aab07dd9f Mon Sep 17 00:00:00 2001 From: Alex Crichton Date: Wed, 15 Jan 2025 12:33:56 -0800 Subject: [PATCH] Disable sigaltstack overriding in asan builds This commit is an attempt to fix a number of flaky crashes that we've been seeing on OSS-Fuzz for some time now. These crashes only reproduce under ASAN and even then have been spotty to reproduce. The current thinking is that a test with threads (e.g. only `wast_tests` using some of the threads spec tests) is required to run some wasm which will register a `sigaltstack`. Destruction of this `sigaltstack` happens with TLS destructors which seems to have a bad interaction with ASAN state additionally being destroyed around that time. This whole interaction means that no one test case is enough to reproduce the corruption. Many crashes on OSS-Fuzz are likely due to "some historical test case spawned a thread" which corrupted something to crash later. The test case that I can reproduce with locally requires rerunning it in the same process a few thousand times to get a reproduction. The purpose of the `sigaltstack` is to ensure that we have a big enough stack, primarily in debug mode, for testing if a trap is wasm. The hope is that this extra size of the Rust-standard-library-default's stack size is not necessary in release mode with ASAN. In the end time will tell with OSS-Fuzz to see if we can keep this or if we need to both install a bigger sigaltstack in addition to managing them differently in ASAN builds. --- .../src/runtime/vm/sys/unix/signals.rs | 53 +++++++++++++++++++ 1 file changed, 53 insertions(+) diff --git a/crates/wasmtime/src/runtime/vm/sys/unix/signals.rs b/crates/wasmtime/src/runtime/vm/sys/unix/signals.rs index 21d33c4d2ad4..8082aab4f5c6 100644 --- a/crates/wasmtime/src/runtime/vm/sys/unix/signals.rs +++ b/crates/wasmtime/src/runtime/vm/sys/unix/signals.rs @@ -393,8 +393,61 @@ unsafe fn set_pc(cx: *mut libc::c_void, pc: usize, arg1: usize) { /// always large enough for our signal handling code. Override it by creating /// and registering our own alternate stack that is large enough and has a guard /// page. +/// +/// Note that one might reasonably ask why do this at all? Why not remove +/// `SA_ONSTACK` from our signal handlers entirely? The basic reason for that is +/// because we want to print a message on stack overflow. The Rust standard +/// library will print this message by default and by us overriding the +/// `SIGSEGV` handler above we're now sharing responsibility for that as well. +/// We must have `SA_ONSTACK` to even attempt to being able to printing this +/// message, and so we leave it turned on. Wasmtime will determine a stack +/// overflow fault isn't caused by wasm and then forward to libstd's signal +/// handler which will actually print-and-abort. +/// +/// Another reasonable question might be why we need to increase the size of the +/// sigaltstack at all? This is something which we may want to reconsider in the +/// future. For now it helps keep debug builds working which consume more stack +/// when handling normal wasm out-of-bounds and faults. Perhaps in the future we +/// could optimize this more or maybe even do something clever like lazily +/// allocate the sigaltstack on the fault itself. (e.g. trampoline from a tiny +/// stack to the "big stack" during a wasm fault or something like that) #[cold] pub fn lazy_per_thread_init() { + // This is a load-bearing requirement to keep address-sanitizer working and + // prevent crashes during fuzzing. The general idea here is that we skip the + // sigaltstack setup below entirely on asan builds, aka fuzzing. The exact + // reason for this is not entirely known, but the closest guess we have at + // this time is something like: + // + // * ASAN builds intercept mmap/munmap to keep track of what's going on. + // * The sigaltstack below registers a TLS destructor for when the current + // thread exits to deallocate the stack. + // * ASAN looks to also have TLS destructors for its own internal state. + // * The current assumption is that the order of these TLS destructors can + // cause corruption in ASAN state where if we run after asan's destructor + // it may intercept munmap and then asan doesn't know it's been + // de-initialized yet. + // + // The reproduction of this involved a standalone project built with + // `-Zsanitizer=address` where internally it would spawn two threads. Each + // thread would build a "hello world" module and then one of the threads + // would execute a noop exported function. If this was run thousands of + // times in a loop in the same process it would eventually crash under asan. + // + // It's notably not quite so simple as frobbing TLS destructors. There's + // clearly something else going on with ASAN state internally which we don't + // fully understand at this time. An attempt to make a standalone C++ + // reproduction, for example, was not successful. In lieu of that the best + // we have for now is to disable our custom and larger sigaltstack in asan + // builds. + // + // The exact source was + // https://gist.github.com/alexcrichton/6815a5d57a3c5ca94a8d816a9fcc91af for + // future reference if necessary. + if cfg!(asan) { + return; + } + // This thread local is purely used to register a `Stack` to get deallocated // when the thread exists. Otherwise this function is only ever called at // most once per-thread.