Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lexing a global variable will read beyond the end of the source given to the parser, causing an assertion failure #3345

Closed
lopopolo opened this issue Dec 25, 2024 · 5 comments · Fixed by #3365
Labels
bug Something isn't working

Comments

@lopopolo
Copy link

I've hooked up prism to a rudimentary libFuzzer harness using cargo-fuzz and got a crash:

$ cargo +nightly fuzz run prism -- -rss_limit_mb=512
    Finished `release` profile [optimized + debuginfo] target(s) in 0.00s
    Finished `release` profile [optimized + debuginfo] target(s) in 0.00s
     Running `fuzz/target/aarch64-apple-darwin/release/prism -artifact_prefix=/Users/lopopolo/dev/artichoke/prism-fuzz/fuzz/artifacts/prism/ -rss_limit_mb=512 /Users/lopopolo/dev/artichoke/prism-fuzz/fuzz/corpus/prism`
prism(38610,0x1ebdbf840) malloc: nano zone abandoned due to inability to reserve vm space.
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 1494678647
INFO: Loaded 1 modules   (1256 inline 8-bit counters): 1256 [0x104916ab0, 0x104916f98),
INFO: Loaded 1 PC tables (1256 PCs): 1256 [0x104916f98,0x10491be18),
INFO:        2 files found in /Users/lopopolo/dev/artichoke/prism-fuzz/fuzz/corpus/prism
INFO: -max_len is not provided; libFuzzer will not generate inputs larger than 4096 bytes
INFO: seed corpus: files: 2 min: 1b max: 1b total: 2b rss: 43Mb
#3      INITED cov: 23 ft: 23 corp: 2/2b exec/s: 0 rss: 44Mb
#1048576        pulse  cov: 23 ft: 23 corp: 2/2b lim: 4096 exec/s: 524288 rss: 464Mb
#2097152        pulse  cov: 23 ft: 23 corp: 2/2b lim: 4096 exec/s: 524288 rss: 468Mb
Assertion failed: (parser->current.end <= parser->end), function parser_lex, file prism.c, line 10614.
==38610== ERROR: libFuzzer: deadly signal
    #0 0x000104e340f0 in __sanitizer_print_stack_trace+0x28 (librustc-nightly_rt.asan.dylib:arm64+0x5c0f0)
    #1 0x0001047e6060 in fuzzer::PrintStackTrace()+0x30 (prism:arm64+0x100062060)
    #2 0x0001047d92c4 in fuzzer::Fuzzer::CrashCallback()+0x54 (prism:arm64+0x1000552c4)
    #3 0x000186ce0180 in _sigtramp+0x34 (libsystem_platform.dylib:arm64+0x4180)
    #4 0x000186caaf6c in pthread_kill+0x11c (libsystem_pthread.dylib:arm64+0x6f6c)
    #5 0x000186bb7904 in abort+0x7c (libsystem_c.dylib:arm64+0x79904)
    #6 0x000186bb6c18 in __assert_rtn+0x118 (libsystem_c.dylib:arm64+0x78c18)
    #7 0x000104899558 in parser_lex.cold.1+0x24 (prism:arm64+0x100115558)
    #8 0x00010479f150 in parser_lex+0x3938 (prism:arm64+0x10001b150)
    #9 0x0001047a7390 in parse_expression_prefix+0x1784 (prism:arm64+0x100023390)
    #10 0x0001047a571c in parse_expression+0x3c (prism:arm64+0x10002171c)
    #11 0x0001047abcfc in parse_expression_infix+0x174 (prism:arm64+0x100027cfc)
    #12 0x0001047a5840 in parse_expression+0x160 (prism:arm64+0x100021840)
    #13 0x0001047a204c in parse_statements+0x100 (prism:arm64+0x10001e04c)
    #14 0x00010479a0d8 in pm_parse+0x84 (prism:arm64+0x1000160d8)
    #15 0x00010479ae7c in pm_serialize_parse+0x5c (prism:arm64+0x100016e7c)
    #16 0x00010478e880 in prism::_::__libfuzzer_sys_run::hbc5539b82432c9aa prism.rs:19
    #17 0x00010478e140 in rust_fuzzer_test_input lib.rs:220
    #18 0x0001047d1e28 in std::panicking::try::do_call::hf152a0b9fc16cc89+0xc4 (prism:arm64+0x10004de28)
    #19 0x0001047d8524 in __rust_try+0x18 (prism:arm64+0x100054524)
    #20 0x0001047d786c in LLVMFuzzerTestOneInput+0x16c (prism:arm64+0x10005386c)
    #21 0x0001047dabbc in fuzzer::Fuzzer::ExecuteCallback(unsigned char const*, unsigned long)+0x150 (prism:arm64+0x100056bbc)
    #22 0x0001047da250 in fuzzer::Fuzzer::RunOne(unsigned char const*, unsigned long, bool, fuzzer::InputInfo*, bool, bool*)+0x48 (prism:arm64+0x100056250)
    #23 0x0001047dbcb8 in fuzzer::Fuzzer::MutateAndTestOne()+0x234 (prism:arm64+0x100057cb8)
    #24 0x0001047dcab8 in fuzzer::Fuzzer::Loop(std::__1::vector<fuzzer::SizedFile, std::__1::allocator<fuzzer::SizedFile>>&)+0x358 (prism:arm64+0x100058ab8)
    #25 0x0001047fd2b8 in fuzzer::FuzzerDriver(int*, char***, int (*)(unsigned char const*, unsigned long))+0x1c30 (prism:arm64+0x1000792b8)
    #26 0x00010480ac70 in main+0x24 (prism:arm64+0x100086c70)
    #27 0x000186928270  (<unknown module>)

NOTE: libFuzzer has rudimentary signal handlers.
      Combine libFuzzer with AddressSanitizer or similar for better crash reports.
SUMMARY: libFuzzer: deadly signal
MS: 5 CMP-ShuffleBytes-ShuffleBytes-ChangeBit-InsertRepeatedBytes- DE: "$-"-; base unit: 3a52ce780950d4d969792a2559cd519d7ee8c727
0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x6c,0x26,0x24,0x2d,
lllllllllllllllllllllllllllllllllllllllllllllllllllllllll&$-
artifact_prefix='/Users/lopopolo/dev/artichoke/prism-fuzz/fuzz/artifacts/prism/'; Test unit written to /Users/lopopolo/dev/artichoke/prism-fuzz/fuzz/artifacts/prism/crash-d3eefb88ac05a746d6b71783bcac0a19313f8ae8
Base64: bGxsbGxsbGxsbGxsbGxsbGxsbGxsbGxsbGxsbGxsbGxsbGxsbGxsbGxsbGxsbGxsbGxsbGxsbGxsJiQt

────────────────────────────────────────────────────────────────────────────────

Failing input:

        fuzz/artifacts/prism/crash-d3eefb88ac05a746d6b71783bcac0a19313f8ae8

Output of `std::fmt::Debug`:

        [108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 108, 38, 36, 45]

Reproduce with:

        cargo fuzz run prism fuzz/artifacts/prism/crash-d3eefb88ac05a746d6b71783bcac0a19313f8ae8

Minimize test case with:

        cargo fuzz tmin prism fuzz/artifacts/prism/crash-d3eefb88ac05a746d6b71783bcac0a19313f8ae8

────────────────────────────────────────────────────────────────────────────────

Error: Fuzz target exited with exit status: 77

This is the harness:

#![no_main]

use std::mem::MaybeUninit;
use std::ptr;
use std::str;

use libfuzzer_sys::fuzz_target;

use prism_fuzz::{pm_buffer_free, pm_buffer_init, pm_buffer_t, pm_serialize_parse};

fuzz_target!(|data: &[u8]| {
    if str::from_utf8(data).is_err() {
        return;
    }
    unsafe {
        let mut buffer = MaybeUninit::<pm_buffer_t>::uninit();
        pm_buffer_init(buffer.as_mut_ptr());
        let mut buffer = buffer.assume_init();
        pm_serialize_parse(&raw mut buffer, data.as_ptr(), data.len(), ptr::null());

        pm_buffer_free(&raw mut buffer);
    }
});

I also got a similar crash with this input:

[3.3.6] > debug_print [8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 36, 45]
=> "\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b\b$-"
@lopopolo
Copy link
Author

I found #463 which looks related but @kddnewton said this was fixed on master in 2023. I am testing with 3c4851c.

@lopopolo
Copy link
Author

lopopolo commented Dec 25, 2024

Doing a bit more debugging, this happens if the given memory is of the form:

$-hf\0...
^^| --- garbage --- |
| <------- source range passed to the parser

The lexer then treats $-hf as a global variable rather than only treating $- (up to the parser end). This means lexing the global variable does an out of bounds read.

This occurs in the default branch of the switch in lex_global_variable because this block of code does not consult parser->end.

From a cursory look, it appears the 0 branch of the switch might also have a similar bug. upon closer inspection this branch correctly checks parser->end before continuing.

This also explains why cargo-fuzz is not able to minimize the crasher for me (it says the crasher doesn't crash): this assertion failure depends on the contents of memory surrounding the data given to the parser.

At least one my machine currently, this test repros:

#[cfg(test)]
mod tests {
    #[test]
    fn test_prism_crasher_2() {
        unsafe {
            let mut buffer = std::mem::MaybeUninit::<pm_buffer_t>::uninit();
            pm_buffer_init(buffer.as_mut_ptr());
            let mut buffer = buffer.assume_init();
            let mut data = *b"G((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((($-";
            pm_serialize_parse(&mut buffer, data.as_mut_ptr(), data.len(), std::ptr::null());
            pm_buffer_free(&mut buffer);
        }
    }
}

@lopopolo lopopolo changed the title Assertion failure in lexer with input that does not have a trailing NUL byte Lexing a global variable will read beyond the end of the source given to the parser, causing an assertion failure Dec 25, 2024
@lopopolo
Copy link
Author

Here is a reliable reproducer which does not rely on memory contents surrounding the Ruby source:

#[cfg(test)]
mod tests {
    #[test]
    fn test_prism_crasher_3() {
        unsafe {
            let mut buffer = std::mem::MaybeUninit::<pm_buffer_t>::uninit();
            pm_buffer_init(buffer.as_mut_ptr());
            let mut buffer = buffer.assume_init();
            let mut data = *b"$-abc";
            pm_serialize_parse(&mut buffer, data.as_mut_ptr(), 2, std::ptr::null());
            pm_buffer_free(&mut buffer);
        }
    }
}

@lopopolo
Copy link
Author

it appears that the do-while construction with the check for being at parser end after advancing the pointer at least once is the culprit

@kddnewton kddnewton added the bug Something isn't working label Dec 27, 2024
@kddnewton
Copy link
Collaborator

Thanks @lopopolo — this pointed out a bunch of stuff I had making incorrect assumptions. I've got a PR attached that should fix this. If you wouldn't mind, after it's merged could you verify that it fixes what you're seeing?

@eregon eregon linked a pull request Jan 6, 2025 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants