Segfault in release mode when reading an enum with a smartstring in one of the variants #13
Comments
I've looked a bit closer at the source for smartstring, and my guess is that you simply cannot have a datatype that requires reading from a MaybeUninit, ever. I'm pretty sure this is a case of UB where simply having an instruction that loads from uninitialised memory in a code path that's meant to be unused causes errors, because the compiler has neglected to skip over it having entitled itself to assume it will do nothing. Rustc sees this as a place where anything could happen, and anything does happen. I probably should have read the code before trying to use it, but there are alternatives, like a String-like wrapper around a |
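For reference, a value held in a MaybeUninit can be read soundly once it has been fully written; the undefined behaviour comes from reading before any write. A minimal sketch of the sound pattern follows. The function name `write_then_read` is hypothetical, and this is not a description of how smartstring uses MaybeUninit internally.

```rust
use std::mem::MaybeUninit;

/// Writes `value` into an uninitialised slot, then reads it back.
/// Hypothetical helper for illustration only.
fn write_then_read(value: u64) -> u64 {
    // At this point the slot holds no valid u64; reading it now would be UB.
    let mut slot = MaybeUninit::<u64>::uninit();

    // `write` initialises the slot without reading or dropping the old contents.
    slot.write(value);

    // SAFETY: `slot` was fully initialised by the `write` call above.
    unsafe { slot.assume_init() }
}
```

Calling `write_then_read(7)` returns 7. Dropping the `write` line would make the `assume_init` call undefined behaviour, even if the result were never used.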
Heh, funny, I used |
Yeah, the other thing that's funny is I reviewed all my code to rule out my own errors and discovered a couple of potentially unsound Eq implementations along the way, so it was a good thing I gave it a go! I would like to know more about what exactly rustc is sending to LLVM to get this behaviour. Maybe I should look at the IR output... I wasn't aware there was such a thing as memory that's so explicitly uninitialised that LLVM will read from it even if no such branch in the program is meant to be taken. My understanding was that every bit of memory has either been written into or not, and LLVM shouldn't be taking branches based on reads that the code doesn't tell it to do. Like, a simple
let bigsmall = black_box(Enum::Small);
let string = match bigsmall {
    Small => String::new(),
    Big(s) => s.to_owned(),
}
I think there's more to the story. But I'm also inexplicably confident the problem will go away when you take out the |
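The match in the comment above can be turned into something that compiles roughly as follows. This is only a sketch: the enum definition and the function name `to_string` are assumptions, since the comment does not show them, and `std::hint::black_box` stands in for whichever `black_box` was meant.

```rust
use std::hint::black_box;

// Assumed shape of the enum; the original comment does not define it.
#[allow(dead_code)]
enum Enum {
    Small,
    Big(String),
}

/// Hypothetical wrapper around the match from the comment.
fn to_string(value: Enum) -> String {
    // black_box discourages the optimiser from constant-folding the variant away.
    let bigsmall = black_box(value);
    match bigsmall {
        // Only the discriminant should be read on this path.
        Enum::Small => String::new(),
        // The payload is touched only when the variant really is Big.
        Enum::Big(s) => s,
    }
}
```

With a sound enum, `to_string(Enum::Small)` yields an empty string and never touches the `Big` payload.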
Ah... closing, because I managed to make it segfault with String as well. It's all on my end after all. Sorry for plastering this all over your nice crate! |
Oh, that's oddly disappointing, I was almost looking forward to a good UB bug hunt. |
Then I’ll be sure to bring my A game next time. |
@bodil It turned out to be rustc, in the end. rust-lang/rust#77359 |
I'm in the process of refactoring a medium codebase to use smartstring. I'm getting a segfault when reading an enum containing a smartstring as a variant, only in release mode. The invalid address I've seen is 0x0, which is discovered in my libc's _platform_memmove when cloning the enum (or trying to fmt::Debug it). Weirdly enough, I have observed writing other enum variants, and then reading the one with a smartstring in it instead.
Backtrace:
Here's frame 9, showing that cast() gave us Boxed:
It is called from the derived clone impl of this enum:
I would note, out of ~900 tests that my code is running, all of which exercise this code, only two of them segfault, and they segfault every time in the same way. These tests are over here; a piece of EdgeData is created for a single element like <label variable="locator"> appearing in one of those files. I thought it was something to do with the content (both having a lot of Greek unicode), but I condensed the test case into a bare minimum repro that didn't produce any SmartStrings at all! The code segfaults when the only EdgeData instance in the entire execution is an EdgeData::LocatorLabel, which I have verified with logging in debug mode. And this program pretty directly creates these edgedata, and then immediately reads them; there is no opportunity for them to be corrupted in between.
So I don't lose it, the minimal repro, down from bugreports_GreekStyleProblems.txt, is:
I added some logging just after calling the function where EdgeData::LocatorLabel is created, and had to change it to write!(a_logfile, "{:?}", refir) because the debug statement caused a segfault. (Progress!) Here's the output:
That carriage return symbol is my terminal saying the file ends without a newline, because it segfaulted in the same way while formatting the EdgeData::LocatorLabel. I modified my minimal case to try the other variants instead, and couldn't get them to segfault, but I also found LocatorLabel wouldn't segfault if it appeared in the test case in a different context... it's all pretty fragile.
Here's the code where the edgedata is returned as part of a larger type:
The Upshot
The upshot is, sometimes a SmartString in an enum will somehow overlap with other discriminants, such that an enum written with one variant will be read as the SmartString variant. Then, you will be reading uninitialised memory and interpreting it as a SmartString (and hence trying to dereference an uninitialised pointer, or, I can imagine, reading uninitialised inline content). I have viewed the contents of the smartstring, and the data is pretty garbagey, but I need to look closer at it. I think this happens when Rust's release mode optimisation figures out a way to avoid copying the enum around by finding a slot for it, but I'm having a hard time coming up with a minimal code reproduction. I'm going to leave this report as is for now, but I'll try to get a commit + build instructions to repro tomorrow.
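One way to check which variant is actually stored, without formatting or cloning the payload, is to compare discriminants. The sketch below uses a hypothetical `Edge` enum as a stand-in (the real EdgeData definition is not shown here) and `std::mem::discriminant` from the standard library.

```rust
use std::mem::discriminant;

// Hypothetical stand-in: one string-carrying variant plus unit variants.
#[allow(dead_code)]
enum Edge {
    Output(String),
    Locator,
    LocatorLabel,
}

/// Returns true when both values hold the same variant.
/// Only the discriminant is compared; payloads are never inspected.
fn same_variant(a: &Edge, b: &Edge) -> bool {
    discriminant(a) == discriminant(b)
}
```

Logging `same_variant(&value, &Edge::LocatorLabel)` right after construction and again right before the failing read would show whether the stored variant itself changes, or whether only the read path is miscompiled.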
The big puzzles are how:
- This does not happen with std::string::String but does with smartstring. No smartstring code gets run until it is erroneously read.
- size_of::<EdgeData>() is 32, as one would expect. How can the computer mess up accessing a discriminant field? Isn't this what they're best at? Perhaps rustc/llvm is trying to cheat and getting mixed up because smartstring hasn't quite described itself properly to the compiler... if anything, one would think that the MaybeUninit would rule out anything like that.
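On the size question: rustc is allowed to store an enum's discriminant in otherwise-invalid bit patterns of a payload (a "niche"), which is how an enum can be no bigger than its largest variant. A small sketch of how one might inspect this, using a hypothetical `Sample` enum rather than the real EdgeData; the exact size of `Sample` is not guaranteed, so the code only reports it.

```rust
use std::mem::size_of;

// Hypothetical enum: one String payload and two unit variants.
#[allow(dead_code)]
enum Sample {
    Text(String),
    A,
    B,
}

/// Size of the whole enum, discriminant included.
fn sample_size() -> usize {
    size_of::<Sample>()
}

/// Size of the largest payload on its own.
fn payload_size() -> usize {
    size_of::<String>()
}

/// The documented niche case: Option<Box<T>> uses the null pointer for None,
/// so it is the same size as Box<T>.
fn option_box_is_pointer_sized() -> bool {
    size_of::<Option<Box<u8>>>() == size_of::<Box<u8>>()
}
```

If `sample_size()` equals `payload_size()`, the compiler has found a niche inside the String for the unit variants, and reading the discriminant means reading part of the payload's bytes.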