Skip to content

Commit

Permalink
Disable sigaltstack overriding in asan builds
Browse files Browse the repository at this point in the history
This commit is an attempt to fix a number of flaky crashes that we've
been seeing on OSS-Fuzz for some time now. These crashes only reproduce
under ASAN and even then have been spotty to reproduce. The current
thinking is that a test with threads (e.g. only `wast_tests` using some
of the threads spec tests) is required to run some wasm which will
register a `sigaltstack`. Destruction of this `sigaltstack` happens
with TLS destructors which seems to have a bad interaction with ASAN
state additionally being destroyed around that time.

This whole interaction means that no one test case is enough to
reproduce the corruption. Many crashes on OSS-Fuzz are likely due to
"some historical test case spawned a thread" which corrupted something
to crash later. The test case that I can reproduce with locally requires
rerunning it in the same process a few thousand times to get a
reproduction.

The purpose of the `sigaltstack` is to ensure that we have a big enough
stack, primarily in debug mode, for testing if a trap is wasm. The hope
is that this extra size of the Rust-standard-library-default's stack
size is not necessary in release mode with ASAN. In the end time will
tell with OSS-Fuzz to see if we can keep this or if we need to both
install a bigger sigaltstack in addition to managing them differently in
ASAN builds.
  • Loading branch information
alexcrichton committed Jan 15, 2025
1 parent 48f4621 commit 97b79d8
Showing 1 changed file with 53 additions and 0 deletions.
53 changes: 53 additions & 0 deletions crates/wasmtime/src/runtime/vm/sys/unix/signals.rs
Original file line number Diff line number Diff line change
Expand Up @@ -393,8 +393,61 @@ unsafe fn set_pc(cx: *mut libc::c_void, pc: usize, arg1: usize) {
/// always large enough for our signal handling code. Override it by creating
/// and registering our own alternate stack that is large enough and has a guard
/// page.
///
/// Note that one might reasonably ask why do this at all? Why not remove
/// `SA_ONSTACK` from our signal handlers entirely? The basic reason for that is
/// because we want to print a message on stack overflow. The Rust standard
/// library will print this message by default and by us overriding the
/// `SIGSEGV` handler above we're now sharing responsibility for that as well.
/// We must have `SA_ONSTACK` to even attempt to being able to printing this
/// message, and so we leave it turned on. Wasmtime will determine a stack
/// overflow fault isn't caused by wasm and then forward to libstd's signal
/// handler which will actually print-and-abort.
///
/// Another reasonable question might be why we need to increase the size of the
/// sigaltstack at all? This is something which we may want to reconsider in the
/// future. For now it helps keep debug builds working which consume more stack
/// when handling normal wasm out-of-bounds and faults. Perhaps in the future we
/// could optimize this more or maybe even do something clever like lazily
/// allocate the sigaltstack on the fault itself. (e.g. trampoline from a tiny
/// stack to the "big stack" during a wasm fault or something like that)
#[cold]
pub fn lazy_per_thread_init() {
// This is a load-bearing requirement to keep address-sanitizer working and
// prevent crashes during fuzzing. The general idea here is that we skip the
// sigaltstack setup below entirely on asan builds, aka fuzzing. The exact
// reason for this is not entirely known, but the closest guess we have at
// this time is something like:
//
// * ASAN builds intercept mmap/munmap to keep track of what's going on.
// * The sigaltstack below registers a TLS destructor for when the current
// thread exits to deallocate the stack.
// * ASAN looks to also have TLS destructors for its own internal state.
// * The current assumption is that the order of these TLS destructors can
// cause corruption in ASAN state where if we run after asan's destructor
// it may intercept munmap and then asan doesn't know it's been
// de-initialized yet.
//
// The reproduction of this involved a standalone project built with
// `-Zsanitizer=address` where internally it would spawn two threads. Each
// thread would build a "hello world" module and then one of the threads
// would execute a noop exported function. If this was run thousands of
// times in a loop in the same process it would eventually crash under asan.
//
// It's notably not quite so simple as frobbing TLS destructors. There's
// clearly something else going on with ASAN state internally which we don't
// fully understand at this time. An attempt to make a standalone C++
// reproduction, for example, was not successful. In lieu of that the best
// we have for now is to disable our custom and larger sigaltstack in asan
// builds.
//
// The exact source was
// https://gist.github.com/alexcrichton/6815a5d57a3c5ca94a8d816a9fcc91af for
// future reference if necessary.
if cfg!(asan) {
return;
}

// This thread local is purely used to register a `Stack` to get deallocated
// when the thread exists. Otherwise this function is only ever called at
// most once per-thread.
Expand Down

0 comments on commit 97b79d8

Please sign in to comment.