Skip to content

Segfault in release mode when reading an enum with a smartstring in one of the variants #13

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
cormacrelf opened this issue Sep 24, 2020 · 7 comments

Comments

@cormacrelf
Copy link

cormacrelf commented Sep 24, 2020

  • rustc 1.48.0-nightly (a1947b3f9 2020-09-10) on macOS 10.14.6 64-bit
  • smartstring 0.2.5`

I'm in the process of refactoring a medium codebase to use smartstring. I'm getting a segfault when reading an enum containing a smarstring as a variant, only in release mode. The invalid address I've seen is is 0x0, which is discovered in my libc's _platform_memmove when cloning the enum (or trying to fmt::Debug it). Weirdly enough, I have observed writing other enum variants, and then reading the one with a smartstring in it instead.

Backtrace:

* thread #2, name = 'csl_test_suite::bugreports_GreekStyleProblems.txt', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x00007fff6db1d650 libsystem_platform.dylib`_platform_memmove$VARIANT$Nehalem + 112
    frame #1: 0x0000000100014d3f suite-c6ec4dbd521a69d6`_$LT$alloc..string..String$u20$as$u20$core..clone..Clone$GT$::clone::h5d1f698cccbb1a5a [inlined] core::intrinsics::copy_nonoverlapping::h6dcf6051f7b35d2e at intrinsics.rs:1858:14 [opt]
    frame #2: 0x0000000100014d31 suite-c6ec4dbd521a69d6`_$LT$alloc..string..String$u20$as$u20$core..clone..Clone$GT$::clone::h5d1f698cccbb1a5a [inlined] alloc::vec::Vec$LT$T$GT$::append_elements::h9acd358ad7c793de at vec.rs:1268 [opt]
    frame #3: 0x0000000100014d1f suite-c6ec4dbd521a69d6`_$LT$alloc..string..String$u20$as$u20$core..clone..Clone$GT$::clone::h5d1f698cccbb1a5a [inlined] _$LT$alloc..vec..Vec$LT$T$GT$$u20$as$u20$alloc..vec..SpecExtend$LT$$RF$T$C$core..slice..Iter$LT$T$GT$$GT$$GT$::spec_extend::h0c2714dd53677232 at vec.rs:2410 [opt]
    frame #4: 0x0000000100014d1f suite-c6ec4dbd521a69d6`_$LT$alloc..string..String$u20$as$u20$core..clone..Clone$GT$::clone::h5d1f698cccbb1a5a [inlined] alloc::vec::Vec$LT$T$GT$::extend_from_slice::hc9db780e179c9492 at vec.rs:1592 [opt]
    frame #5: 0x0000000100014d1f suite-c6ec4dbd521a69d6`_$LT$alloc..string..String$u20$as$u20$core..clone..Clone$GT$::clone::h5d1f698cccbb1a5a [inlined] alloc::slice::hack::to_vec::h4416f72647053294 at slice.rs:156 [opt]
    frame #6: 0x0000000100014cb8 suite-c6ec4dbd521a69d6`_$LT$alloc..string..String$u20$as$u20$core..clone..Clone$GT$::clone::h5d1f698cccbb1a5a [inlined] alloc::slice::_$LT$impl$u20$$u5b$T$u5d$$GT$::to_vec::h20b2a9edbcf66ddc at slice.rs:391 [opt]
    frame #7: 0x0000000100014cb8 suite-c6ec4dbd521a69d6`_$LT$alloc..string..String$u20$as$u20$core..clone..Clone$GT$::clone::h5d1f698cccbb1a5a [inlined] _$LT$alloc..vec..Vec$LT$T$GT$$u20$as$u20$core..clone..Clone$GT$::clone::hb8bb0739bf3e72a0 at vec.rs:1938 [opt]
    frame #8: 0x0000000100014cb1 suite-c6ec4dbd521a69d6`_$LT$alloc..string..String$u20$as$u20$core..clone..Clone$GT$::clone::h5d1f698cccbb1a5a at string.rs:1679 [opt]
    frame #9: 0x0000000100234f5e suite-c6ec4dbd521a69d6`citeproc_proc::disamb::add_to_graph::h2bf47a2062ed62b4 + 2126
<snip, user code>

Here's frame 9, showing that cast() gave us Boxed:

frame #9: 0x000000010023684e suite-e27366cf4d0ebb7f`citeproc_proc::disamb::add_to_graph::hdd46c801cda0b1e8 at lib.rs:235:59 [opt]
   232      /// [Copy]: https://doc.rust-lang.org/std/marker/trait.Copy.html
   233      fn clone(&self) -> Self {
   234          match self.cast() {
-> 235              StringCast::Boxed(string) => Self::from_boxed(string.string().clone().into()),
   236              StringCast::Inline(string) => Self::from_inline(*string),
   237          }
   238      }

It is called from the derived clone impl of this enum:

#[derive(Clone, Debug, PartialEq, Eq)]
pub enum EdgeData {
    Output(SmartString<LazyCompact>),
    Locator,
    LocatorLabel,
    YearSuffix,
    YearSuffixExplicit,
    YearSuffixPlain,
    CitationNumber,
    CitationNumberLabel,
    Frnn,
    FrnnLabel,
    Accessed,
}

I would note, out of ~900 tests that my code is running, all of which exercise this code, only two of them segfault, and they segfault every time in the same way. These tests are over here; a piece of EdgeData is created for a single element like <label variable="locator"> appearing in one of those files. I thought it was something to do with the content (both having a lot of Greek unicode), but I condensed the test case into a bare minimum repro that didn't produce any SmartStrings at all! The code segfaults when the only EdgeData instance in the entire execution an EdgeData::LocatorLabel, which I have verified with logging in debug mode. And this program pretty directly creates these edgedata, and then immediately reads them, there is no opportunity for them to be corrupted in between.

So I don't lose it, the minimal repro, down from bugreports_GreekStyleProblems.txt, is:

<style class="note" version="1.0">
  <macro name="a"> <label variable="locator"/> </macro>
  <citation><layout> <text macro="a"/> </layout></citation>
</style>

I added some logging just after calling the function where EdgeData::LocatorLabel is created, and had to change it to write!(a_logfile, "{:?}", refir) because the debug statement caused a segfault. (Progress!) Here's the output:

Edge(None)
Edge(None)
Edge(Some(⏎ 

That carriage return symbol is my terminal saying the file ends without a newline, because it segfaulted in the same way while formatting the EdgeData::LocatorLabel. I modified my minimal case to try the other variants instead, and couldn't get them to segfault, but I also found LocatorLabel wouldn't segfault if it appeared in the test case in a different context... it's all pretty fragile.

Here's the code where the edgedata is returned as part of a larger type:

    let mut contents = Vec::with_capacity(els.len());
    let mut overall_gv = GroupVars::new();

    for el in els {
        // the EdgeData is constructed in ref_ir(...)
        let (ir, gv) =
            Disambiguation::<Markup>::ref_ir(el, db, ctx, state, formatting.unwrap_or_default());
        // <snip> manual logging code that segfaults goes here
        match ir {
            RefIR::Edge(None) => {
                overall_gv = overall_gv.neighbour(gv);
            }
            _ => {
                contents.push(ir);
                overall_gv = overall_gv.neighbour(gv)
            }
        }
    }

The Upshot

The upshot is, sometimes a SmartString in an enum will somehow overlap with other discriminants, such that an enum written with one variant will be read as a the SmartString variant. Then, you will be reading uninitialised memory and interpreting it as a SmartString (and hence trying to dereference an uninitialised pointer, or, I can imagine it reading uninitialised inline content). I have viewed the contents of the smartstring, and the data is pretty garbagey, but I need to look closer at it. I think this happens when Rust's release mode optimisation figures out a way to avoid copying the enum around by finding a slot for it, but I'm having a hard time coming up with a minimal code reproduction. I'm going to leave this report as is for now, but I'll try to get a commit + build instructions to repro tomorrow.

The big puzzles are how:

  • This program does not segfault with std::string::String but does with smartstring. No smartstring code gets run until it is erroneously read.
  • An enum discriminant is being written or read wrong (is this perhaps a rustc bug?) I checked and size_of::<EdgeData>() is 32, as one would expect. How can the computer mess up accessing a discriminant field? Isn't this what they're best at? Perhaps rustc/llvm is trying to cheat and getting mixed up because smartstring hasn't quite described itself properly to the compiler... if anything, one would think that the MaybeUninit would rule out anything like that.
@cormacrelf
Copy link
Author

I've looked a bit closer at the source for smartstring, and my guess is that you simply cannot have a datatype that requires reading from a MaybeUninit, ever. I'm pretty sure this is a case of UB where simply having an instruction that loads from uninitialised memory in a code path that's meant to be unused causes errors, because the compiler has neglected to skip over it having entitled itself to assume it will do nothing. Rustc sees this as a place where anything could happen, and anything does happen. I probably should have read the code before trying to use it, but there are alternatives, like a String-like wrapper around a SmallVec<[u8; some constant]>.

@bodil
Copy link
Owner

bodil commented Sep 25, 2020

Heh, funny, I used MaybeUninit there originally with the expectation that it would reduce the incidence of UB. It's easy enough to replace it, though, especially with the body of test cases you've detailed above. This is another excellent reminder that we should all really be running CI in release mode, too.

@cormacrelf
Copy link
Author

Yeah, the other thing that's funny is I reviewed all my code to rule out my own errors and discovered a couple of potentially unsound Eq implementations along the way, so it was a good thing I gave it a go!

I would like to know more about what exactly rustc is sending to LLVM to get this behaviour. Maybe I should look at the IR output... I wasn't aware there was such a thing as memory that's so explicitly uninitialised that LLVM will read from it even if no such branch in the program is meant to be taken. My understanding was that every bit of memory has either been written into or not, and LLVM shouldn't be taking branches based on reads that the code doesn't tell it to do. Like, a simple enum Enum<'a> { Big(&'a str), Small } has uninitialised memory when you create an Enum::Small, right? But you don't see LLVM reading data from the array and branching on that basis. Like the following code doesn't segfault, obviously, but what exactly is different about having a MaybeUninit in there instead of a reference? The branch with UB in it should never be taken!

let bigsmall = black_box(Enum::Small);
let string = match bigsmall {
    Small => String::new(),
    Big(s) => s.to_owned(),
}

I think there's more to the story. But I'm also inexplicably confident the problem will go away when you take out the MaybeUninit. I'm tempted to ask one of the big shot memory safety people to give us a hint. After I've had a closer look and maybe run Miri, perhaps.

@cormacrelf
Copy link
Author

Ah... closing, because I managed to make it segfault with String as well. It's all on my end after all. Sorry for plastering this all over your nice crate!

@bodil
Copy link
Owner

bodil commented Sep 26, 2020

Oh, that's oddly disappointing, I was almost looking forward to a good UB bug hunt.

@cormacrelf
Copy link
Author

Then I’ll be sure to bring my A game next time.

@cormacrelf
Copy link
Author

@bodil It turned out to be rustc, in the end. rust-lang/rust#77359

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants