Skip to content

Commit f11f749

Browse files
committed
materialized: attempt to produce nice backtraces on stack overflow
Rust produces bad error messages on stack overflow, like "thread 'foo' has overflowed its stack" which provides very little insight into where the recursion that caused the stack to overflow occurred. See rust-lang/rust#51405 for details. This commit adds a SIGSEGV handler that attempts to print a backtrace, following the approach in the backtrace-on-stack-overflow crate. I copied the code from that crate into Materialize and tweaked it because it's a very small amount of code that we'll likely need to modify, and I wanted to improve its error handling. In my manual testing this produces a nice backtrace when Materialize overflows its stack.
1 parent 6a2c77a commit f11f749

File tree

4 files changed

+89
-2
lines changed

4 files changed

+89
-2
lines changed

Cargo.lock

Lines changed: 4 additions & 2 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

src/materialized/Cargo.toml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -48,8 +48,10 @@ include_dir = "0.6.0"
4848
itertools = "0.9.0"
4949
krb5-src = { version = "0.2.3", features = ["binaries"] }
5050
lazy_static = "1.4.0"
51+
libc = "0.2.84"
5152
log = "0.4.13"
5253
mz-process-collector = { path = "../mz-process-collector" }
54+
nix = "0.19.1"
5355
num_cpus = "1.0.0"
5456
openssl = { version = "0.10.32", features = ["vendored"] }
5557
openssl-sys = { version = "0.9.59", features = ["vendored"] }

src/materialized/src/bin/materialized/main.rs

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -213,6 +213,7 @@ fn main() {
213213

214214
fn run(args: Args) -> Result<(), anyhow::Error> {
215215
panic::set_hook(Box::new(handle_panic));
216+
sys::enable_sigsegv_backtraces()?;
216217

217218
if args.version > 0 {
218219
println!("materialized {}", materialized::BUILD_INFO.human_version());

src/materialized/src/bin/materialized/sys.rs

Lines changed: 82 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,13 @@
99

1010
//! System support functions.
1111
12+
use std::alloc::{self, Layout};
13+
use std::ptr;
14+
15+
use anyhow::{bail, Context};
1216
use log::{trace, warn};
17+
use nix::errno;
18+
use nix::sys::signal;
1319

1420
#[cfg(not(any(target_os = "macos", target_os = "linux", target_os = "ios")))]
1521
pub fn adjust_rlimits() {
@@ -90,3 +96,79 @@ pub fn adjust_rlimits() {
9096
)
9197
}
9298
}
99+
100+
/// Attempts to enable backtraces when SIGSEGV occurs.
101+
///
102+
/// In particular, this means producing backtraces on stack overflow, as
103+
/// stack overflow raises SIGSEGV. The approach here involves making system
104+
/// calls to handle SIGSEGV on an alternate signal stack, which seems to work
105+
/// well in practice but may technically be undefined behavior.
106+
///
107+
/// Rust may someday do this by default.
108+
/// Follow: https://github.com/rust-lang/rust/issues/51405.
109+
pub fn enable_sigsegv_backtraces() -> Result<(), anyhow::Error> {
110+
// This code is derived from the code in the backtrace-on-stack-overflow
111+
// crate, which is freely available under the terms of the Apache 2.0
112+
// license. The modifications here provide better error messages if any of
113+
// the various system calls fail.
114+
//
115+
// See: https://github.com/matklad/backtrace-on-stack-overflow
116+
117+
// NOTE(benesch): The stack size was chosen to match the default Rust thread
118+
// stack size of 2MiB. Probably overkill, but we'd much rather have
119+
// backtraces on stack overflow than squabble over a few megabytes. Using
120+
// libc::SIGSTKSZ is tempting, but its default of 8KiB on my system makes me
121+
// nervous. Rust code isn't used to running with a stack that small.
122+
const STACK_SIZE: usize = 2 << 20;
123+
124+
// x86_64 and aarch64 require 16-byte alignment. Its hard to imagine other
125+
// platforms that would have more stringent requirements.
126+
const STACK_ALIGN: usize = 16;
127+
128+
// Allocate a stack.
129+
let buf_layout =
130+
Layout::from_size_align(STACK_SIZE, STACK_ALIGN).expect("layout known to be valid");
131+
// SAFETY: layout has non-zero size and the uninitialized memory that is
132+
// returned is never read (at least, not by Rust).
133+
let buf = unsafe { alloc::alloc(buf_layout) };
134+
135+
// Request that signals be delivered to this alternate stack.
136+
let stack = libc::stack_t {
137+
ss_sp: buf as *mut libc::c_void,
138+
ss_flags: 0,
139+
ss_size: STACK_SIZE,
140+
};
141+
// SAFETY: `stack` is a valid pointer to a `stack_t` object and the second
142+
// parameter, `old_ss`, is permitted to be `NULL` according to POSIX.
143+
let ret = unsafe { libc::sigaltstack(&stack, ptr::null_mut()) };
144+
if ret == -1 {
145+
let errno = errno::from_i32(errno::errno());
146+
bail!("failed to configure alternate signal stack: {}", errno);
147+
}
148+
149+
// Install a handler for SIGSEGV.
150+
let action = signal::SigAction::new(
151+
signal::SigHandler::Handler(handle_sigsegv),
152+
signal::SaFlags::SA_NODEFER | signal::SaFlags::SA_ONSTACK,
153+
signal::SigSet::empty(),
154+
);
155+
// SAFETY: see `handle_sigsegv`.
156+
unsafe { signal::sigaction(signal::SIGSEGV, &action) }
157+
.context("failed to install SIGSEGV handler")?;
158+
159+
Ok(())
160+
}
161+
162+
extern "C" fn handle_sigsegv(_: i32) {
163+
// SAFETY: this is is a signal handler function and technically must be
164+
// "async-signal safe" [0]. That typically means no memory allocation, which
165+
// means no panics or backtraces... but if we're here, we're already doomed
166+
// by a segfault. So there is little harm to ignoring the rules and
167+
// panicking. If we're successful, as we often are, the panic will be caught
168+
// by our panic handler and displayed nicely with a backtrace that traces
169+
// *through* the signal handler and includes the frames that led to the
170+
// SIGSEGV.
171+
//
172+
// [0]: https://man7.org/linux/man-pages/man7/signal-safety.7.html
173+
panic!("stack overflow");
174+
}

0 commit comments

Comments
 (0)