
Support for passing const Miri pointers to C #2498

Closed
wants to merge 26 commits
c5e5754
C FFI support for functions with int args and returns (rebasing from …
Jul 18, 2022
41fd802
updates to foreign_items
Jul 18, 2022
e7a6466
PR edits requested by @RalfJung: main changes are removing the use of…
Jul 20, 2022
48c179d
cfg for linux, and remove test SO file
Jul 20, 2022
7f85148
dont need to track `fn_sig`: nice simplification, thanks @oli-obk!
Jul 20, 2022
0b58790
update README; moving libloading load and hack into its own function
Jul 20, 2022
7c97b73
fix outdated doc
Jul 21, 2022
62926b7
upstream master changes; and no libc on windows
Jul 21, 2022
0b65e6f
upstream
Jul 21, 2022
f825988
upstream
Jul 21, 2022
7d0ad06
upstream
Jul 21, 2022
450ef91
broken, but at least finding pointer to internal bytes in allocation.…
Jul 26, 2022
1018b62
store the pointer in the allocextra -- now just need to make it the a…
Jul 26, 2022
47c9546
Merge branch 'master' into pointer-support
Jul 26, 2022
33ef561
DO NOT MERGE - scratchwork
maurer Aug 3, 2022
b8a595d
one level up from full holepunch to alloc bytes
Aug 4, 2022
a87fdb3
no more holepunch to alloc.bytes
Aug 4, 2022
94dacfd
moving the intptrcast int_to_ptr_map from a vector to a btreemap
Aug 5, 2022
89688bb
no more yolo pointer typing
Aug 5, 2022
7d73854
commenting out alignment check fixes the alignment failing tests -- n…
Aug 6, 2022
bca7fcc
avoid infinite recursion into pointers in intptrcast `alloc_base_addr…
Aug 9, 2022
1c0168a
get rid of box leak hack
Aug 19, 2022
73c9783
some cleanup
Aug 19, 2022
e4519e2
putting new allocation address stuff behind a flag, so it only runs i…
Aug 20, 2022
3ced463
formatting
Aug 20, 2022
8ed36ed
adding some comments
Aug 20, 2022
2 changes: 2 additions & 0 deletions .gitignore
Expand Up @@ -2,10 +2,12 @@ target
/doc
tex/*/out
*.dot
*.out
*.rs.bk
.vscode
*.mm_profdata
perf.data
perf.data.old
flamegraph.svg
tests/extern-so/libtestlib.so
.auto-*
38 changes: 38 additions & 0 deletions Cargo.lock


2 changes: 2 additions & 0 deletions Cargo.toml
Expand Up @@ -20,6 +20,8 @@ doctest = false # and no doc tests
[dependencies]
getrandom = { version = "0.2", features = ["std"] }
env_logger = "0.9"
libffi = "3.0.0"
libloading = "0.7"
log = "0.4"
shell-escape = "0.1.4"
rand = "0.8"
Expand Down
8 changes: 8 additions & 0 deletions README.md
Expand Up @@ -285,6 +285,14 @@ environment variable. We first document the most relevant and most commonly used
`TERM` environment variable is excluded by default to [speed up the test
harness](https://github.com/rust-lang/miri/issues/1702). This has no effect unless
`-Zmiri-disable-isolation` is also set.
* `-Zmiri-extern-so-file=<path to a shared object file>` is an experimental flag for providing support
for FFI calls.
**WARNING**: If an invalid/incorrect `.so` file is specified, this can cause undefined behaviour in Miri itself!
And of course, Miri cannot do any checks on the actions taken by the external code.
This is **work in progress**; currently, only integer arguments and return values are
supported (and no, pointer/integer casts to work around this limitation will not work;
they will fail horribly).
Follow [the discussion on supporting other types](https://github.com/rust-lang/miri/issues/2365).
* `-Zmiri-env-forward=<var>` forwards the `var` environment variable to the interpreted program. Can
be used multiple times to forward several variables. This takes precedence over
`-Zmiri-env-exclude`: if a variable is both forwarded and excluded, it *will* get forwarded. This
Expand Down
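The flag documented above can be exercised end-to-end roughly as follows. This is a hypothetical workflow, not taken from the PR: the file names are illustrative, and passing Miri flags via the `MIRIFLAGS` environment variable is the usual route for `cargo miri`.

```shell
# Build the native library from C source that exposes int-only functions
# (illustrative file names):
cc -shared -fPIC -o libtest.so test.c

# Run the interpreted program with the experimental flag; Miri flags are
# typically passed through the MIRIFLAGS environment variable:
MIRIFLAGS="-Zmiri-extern-so-file=./libtest.so" cargo miri run
```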
13 changes: 13 additions & 0 deletions src/bin/miri.rs
Expand Up @@ -510,6 +510,19 @@ fn main() {
"full" => BacktraceStyle::Full,
_ => panic!("-Zmiri-backtrace may only be 0, 1, or full"),
};
} else if let Some(param) = arg.strip_prefix("-Zmiri-extern-so-file=") {
let filename = param.to_string();
if std::path::Path::new(&filename).exists() {
if let Some(other_filename) = miri_config.external_so_file {
panic!(
"-Zmiri-extern-so-file external SO file is already set to {}",
other_filename.display()
);
}
miri_config.external_so_file = Some(filename.into());
} else {
panic!("-Zmiri-extern-so-file path {} does not exist", filename);
}
} else {
// Forward to rustc.
rustc_args.push(arg);
Expand Down
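The argument handling added above follows a common `strip_prefix` pattern. A standalone sketch (simplified: the duplicate-flag check is kept, but the filesystem existence check is dropped, and errors are returned instead of panicking):

```rust
use std::path::PathBuf;

/// Parse `-Zmiri-extern-so-file=<path>` out of a single CLI argument,
/// rejecting a second occurrence of the flag (as the real parser does).
fn parse_extern_so_flag(arg: &str, current: &mut Option<PathBuf>) -> Result<bool, String> {
    if let Some(param) = arg.strip_prefix("-Zmiri-extern-so-file=") {
        if let Some(prev) = current {
            return Err(format!("external SO file is already set to {}", prev.display()));
        }
        *current = Some(PathBuf::from(param));
        Ok(true) // flag consumed
    } else {
        Ok(false) // not ours; would be forwarded to rustc
    }
}

fn main() {
    let mut so_file = None;
    assert_eq!(parse_extern_so_flag("-Zmiri-extern-so-file=/tmp/libtest.so", &mut so_file), Ok(true));
    assert_eq!(so_file, Some(PathBuf::from("/tmp/libtest.so")));
    // Setting the flag twice is an error.
    assert!(parse_extern_so_flag("-Zmiri-extern-so-file=/tmp/other.so", &mut so_file).is_err());
    // Unrelated arguments are left for rustc.
    assert_eq!(parse_extern_so_flag("--edition=2021", &mut so_file), Ok(false));
    println!("ok");
}
```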
5 changes: 5 additions & 0 deletions src/eval.rs
Expand Up @@ -4,6 +4,7 @@ use std::collections::HashSet;
use std::ffi::OsStr;
use std::iter;
use std::panic::{self, AssertUnwindSafe};
use std::path::PathBuf;
use std::thread;

use log::info;
Expand Down Expand Up @@ -125,6 +126,9 @@ pub struct MiriConfig {
pub report_progress: Option<u32>,
/// Whether Stacked Borrows retagging should recurse into fields of datatypes.
pub retag_fields: bool,
/// The location of a shared object file to load when calling external functions
/// TODO! consider allowing users to specify paths to multiple SO files, or to a directory
pub external_so_file: Option<PathBuf>,
}

impl Default for MiriConfig {
Expand Down Expand Up @@ -155,6 +159,7 @@ impl Default for MiriConfig {
preemption_rate: 0.01, // 1%
report_progress: None,
retag_fields: false,
external_so_file: None,
}
}
}
Expand Down
151 changes: 110 additions & 41 deletions src/intptrcast.rs
@@ -1,13 +1,13 @@
use std::cell::RefCell;
use std::cmp::max;
use std::collections::hash_map::Entry;
use std::collections::{hash_map::Entry, BTreeMap};

use log::trace;
use rand::Rng;

use rustc_data_structures::fx::{FxHashMap, FxHashSet};
use rustc_span::Span;
use rustc_target::abi::{HasDataLayout, Size};
use rustc_target::abi::{Align, HasDataLayout, Size};

use crate::*;

Expand All @@ -26,9 +26,9 @@ pub type GlobalState = RefCell<GlobalStateInner>;

#[derive(Clone, Debug)]
pub struct GlobalStateInner {
/// This is used as a map between the address of each allocation and its `AllocId`.
/// It is always sorted
int_to_ptr_map: Vec<(u64, AllocId)>,
/// This is a map between the address of each allocation and its `AllocId`.
/// Since it's a `BTreeMap`, it is always sorted, and provides efficient insertion.
int_to_ptr_map: BTreeMap<u64, AllocId>,
/// The base address for each allocation. We cannot put that into
/// `AllocExtra` because function pointers also have a base address, and
/// they do not have an `AllocExtra`.
Expand All @@ -46,11 +46,17 @@ pub struct GlobalStateInner {

impl GlobalStateInner {
pub fn new(config: &MiriConfig) -> Self {
// If we're in FFI mode, then the `next_base_addr` is only used to assign fake addresses
// to allocations that don't have associated arrays of bytes.
// CURRENT HACK:
// We start at 1 to avoid overlap with existing/future real memory the program has
// pointers to.
let next_base_addr = if config.external_so_file.is_some() { 1 } else { STACK_ADDR };
GlobalStateInner {
int_to_ptr_map: Vec::default(),
int_to_ptr_map: BTreeMap::default(),
base_addr: FxHashMap::default(),
exposed: FxHashSet::default(),
next_base_addr: STACK_ADDR,
next_base_addr,
provenance_mode: config.provenance_mode,
}
}
Expand All @@ -63,22 +69,26 @@ impl<'mir, 'tcx> GlobalStateInner {
let global_state = ecx.machine.intptrcast.borrow();
assert!(global_state.provenance_mode != ProvenanceMode::Strict);

let pos = global_state.int_to_ptr_map.binary_search_by_key(&addr, |(addr, _)| *addr);

// Determine the in-bounds provenance for this pointer.
// (This is only called on an actual access, so in-bounds is the only possible kind of provenance.)
let alloc_id = match pos {
Ok(pos) => Some(global_state.int_to_ptr_map[pos].1),
Err(0) => None,
Err(pos) => {
// This is the largest of the adresses smaller than `int`,
// i.e. the greatest lower bound (glb)
let (glb, alloc_id) = global_state.int_to_ptr_map[pos - 1];
// This never overflows because `addr >= glb`
let offset = addr - glb;
// If the offset exceeds the size of the allocation, don't use this `alloc_id`.
let size = ecx.get_alloc_info(alloc_id).0;
if offset <= size.bytes() { Some(alloc_id) } else { None }
let alloc_id = match global_state.int_to_ptr_map.get(&addr) {
Some(&id) => Some(id),
None => {
// If the address is not in the map, we check the position it should be inserted at.
// This returns the max key in the map less than `addr`.
match global_state.int_to_ptr_map.range(..addr).next_back() {
// Should be inserted at the beginning.
None => None,
// This is the largest of the addresses smaller than `int`,
// i.e. the greatest lower bound (glb).
Some((glb, &alloc_id)) => {
// This never overflows because `addr >= glb`
let offset = addr - glb;
// If the offset exceeds the size of the allocation, don't use this `alloc_id`.
let size = ecx.get_alloc_info(alloc_id).0;
if offset <= size.bytes() { Some(alloc_id) } else { None }
}
}
}
}?;

Expand Down Expand Up @@ -156,8 +166,72 @@ impl<'mir, 'tcx> GlobalStateInner {
Ok(Pointer::new(Some(Provenance::Wildcard), Size::from_bytes(addr)))
}

// Create a fake address for a new allocation, of a particular size and alignment.
// Ensure this address doesn't overlap with existing or future-assigned memory.
fn get_next_fake_addr(
ecx: &MiriEvalContext<'mir, 'tcx>,
align: Align,
size: Size,
next_base_addr: u64,
) -> (u64, u64) {
// This allocation does not have a base address yet, pick one.
// Leave some space to the previous allocation, to give it some chance to be less aligned.
// It also doesn't correspond to a real array of bytes.
// HACK: we're not going to actually have pointers in the program that correspond to
// the really low addresses, so let's use these as placeholders for these allocations.
// This makes sure we won't overlap with any existing (real) addresses.
// An alternate hack, which we had before, was to create and leak a Box:
// `let new_addr = Box::leak(Box::new(0u128)) as *const u128 as u64;`
let slack = {
let mut rng = ecx.machine.rng.borrow_mut();
// This means that `(global_state.next_base_addr + slack) % 16` is uniformly distributed.
rng.gen_range(0..16)
};
// From next_base_addr + slack, round up to adjust for alignment.
let base_addr = next_base_addr.checked_add(slack).unwrap();
let base_addr = Self::align_addr(base_addr, align.bytes());

// Remember next base address. If this allocation is zero-sized, leave a gap
// of at least 1 to avoid two allocations having the same base address.
// (The logic in `alloc_id_from_addr` assumes unique addresses, and different
// function/vtable pointers need to be distinguishable!)
let next_base_addr = base_addr.checked_add(max(size.bytes(), 1)).unwrap();
(base_addr, next_base_addr)
}
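The slack-then-align step in `get_next_fake_addr` can be sketched in isolation. The rounding below is an assumption about what `Self::align_addr` computes for power-of-two alignments, not code from the PR:

```rust
/// Round `addr` up to the next multiple of `align` (a power of two).
fn align_addr(addr: u64, align: u64) -> u64 {
    debug_assert!(align.is_power_of_two());
    addr.checked_add(align - 1).unwrap() & !(align - 1)
}

fn main() {
    // A random slack of 0..16 is added first, then the sum is aligned up,
    // so the previous allocation gets a chance to end at a less-aligned address.
    let next_base_addr: u64 = 0x1001;
    let slack = 7; // stand-in for rng.gen_range(0..16)
    let base_addr = align_addr(next_base_addr + slack, 16);
    assert_eq!(base_addr, 0x1010);
    // Already-aligned addresses are unchanged.
    assert_eq!(align_addr(0x2000, 16), 0x2000);
    println!("ok");
}
```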

fn alloc_base_addr(ecx: &MiriEvalContext<'mir, 'tcx>, alloc_id: AllocId) -> u64 {
let mut global_state = ecx.machine.intptrcast.borrow_mut();
let in_ffi_mode = ecx.machine.external_so_lib.is_some();
// With our hack, base_addr should always be fully aligned
let mut global_state = match ecx.machine.intptrcast.try_borrow_mut() {
Ok(gstate) => gstate,
Err(_) => {
if in_ffi_mode {
// We're recursing!
let (size, align, _kind) = ecx.get_alloc_info(alloc_id);
let new_addr = unsafe {
// Can't `borrow_mut` to get the global state, so just refer to it
// via pointer instead.
// This is unsafe.
let next_base_addr = (*ecx.machine.intptrcast.as_ptr()).next_base_addr;
let (new_addr, next_base_addr) =
Self::get_next_fake_addr(ecx, align, size, next_base_addr);
(*ecx.machine.intptrcast.as_ptr()).base_addr.insert(alloc_id, new_addr);
(*ecx.machine.intptrcast.as_ptr())
.int_to_ptr_map
.insert(new_addr, alloc_id);
(*ecx.machine.intptrcast.as_ptr()).next_base_addr = next_base_addr;
new_addr
};
trace!(
"Recursive case: Assigning base address {:#x} to allocation {:?}",
new_addr,
alloc_id,
);
return new_addr;
}
panic!("Can't mutably borrow the `intptrcast` global state!");
}
};
let global_state = &mut *global_state;

match global_state.base_addr.entry(alloc_id) {
Expand All @@ -167,34 +241,29 @@ impl<'mir, 'tcx> GlobalStateInner {
// it became dangling. Hence we allow dead allocations.
let (size, align, _kind) = ecx.get_alloc_info(alloc_id);

// This allocation does not have a base address yet, pick one.
// Leave some space to the previous allocation, to give it some chance to be less aligned.
let slack = {
let mut rng = ecx.machine.rng.borrow_mut();
// This means that `(global_state.next_base_addr + slack) % 16` is uniformly distributed.
rng.gen_range(0..16)
// Short circuit -- only call `ecx.get_alloc_base_addr` if we're `in_ffi_mode`.
let base_addr = if in_ffi_mode && let Ok(addr) = ecx.get_alloc_base_addr(alloc_id) {
assert!(addr.bytes() % 16 == 0);
addr.bytes()
} else {
let (new_addr, next_base_addr) = Self::get_next_fake_addr(ecx, align, size, global_state.next_base_addr); //Box::leak(Box::new(0u128)) as *const u128 as u64;
global_state.next_base_addr = next_base_addr;
new_addr
};
// From next_base_addr + slack, round up to adjust for alignment.
let base_addr = global_state.next_base_addr.checked_add(slack).unwrap();
let base_addr = Self::align_addr(base_addr, align.bytes());

// This allocation does not have a base address yet, assign its bytes base.
entry.insert(base_addr);
trace!(
"Assigning base address {:#x} to allocation {:?} (size: {}, align: {}, slack: {})",
"Assigning base address {:#x} to allocation {:?} (size: {}, align: {})",
base_addr,
alloc_id,
size.bytes(),
align.bytes(),
slack,
);

// Remember next base address. If this allocation is zero-sized, leave a gap
// of at least 1 to avoid two allocations having the same base address.
// (The logic in `alloc_id_from_addr` assumes unique addresses, and different
// function/vtable pointers need to be distinguishable!)
global_state.next_base_addr = base_addr.checked_add(max(size.bytes(), 1)).unwrap();
// Given that `next_base_addr` increases in each allocation, pushing the
// corresponding tuple keeps `int_to_ptr_map` sorted
global_state.int_to_ptr_map.push((base_addr, alloc_id));
// Map has no duplicates so no need to remove copies.
// Map is always sorted.
global_state.int_to_ptr_map.insert(base_addr, alloc_id);

base_addr
}
Expand Down
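The `range(..addr).next_back()` lookup above is the `BTreeMap` replacement for the old binary search over a sorted `Vec`. A minimal sketch with simplified types (`u64` standing in for `AllocId`):

```rust
use std::collections::BTreeMap;

/// Find the allocation whose base address is the greatest one not exceeding
/// `addr`: either an exact hit, or the greatest lower bound via `range(..addr)`.
fn glb_lookup(map: &BTreeMap<u64, u64>, addr: u64) -> Option<(u64, u64)> {
    if let Some(&id) = map.get(&addr) {
        return Some((addr, id));
    }
    // Greatest key strictly below `addr`; the real code still checks that
    // `addr - base` falls within the allocation's size before using the id.
    map.range(..addr).next_back().map(|(&base, &id)| (base, id))
}

fn main() {
    let map = BTreeMap::from([(0x100, 1), (0x200, 2)]);
    assert_eq!(glb_lookup(&map, 0x180), Some((0x100, 1))); // inside the first allocation
    assert_eq!(glb_lookup(&map, 0x200), Some((0x200, 2))); // exact base address
    assert_eq!(glb_lookup(&map, 0x50), None); // below every allocation
    println!("ok");
}
```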
15 changes: 15 additions & 0 deletions src/machine.rs
Expand Up @@ -356,6 +356,9 @@ pub struct Evaluator<'mir, 'tcx> {
pub(crate) report_progress: Option<u32>,
/// The number of blocks that passed since the last progress report.
pub(crate) since_progress_report: u32,

/// Handle of the optional C shared object file
pub external_so_lib: Option<(libloading::Library, std::path::PathBuf)>,
}

impl<'mir, 'tcx> Evaluator<'mir, 'tcx> {
Expand Down Expand Up @@ -410,6 +413,18 @@ impl<'mir, 'tcx> Evaluator<'mir, 'tcx> {
preemption_rate: config.preemption_rate,
report_progress: config.report_progress,
since_progress_report: 0,
external_so_lib: config.external_so_file.as_ref().map(|lib_file_path| {
// Note: it is the user's responsibility to provide a correct SO file.
// WATCH OUT: If an invalid/incorrect SO file is specified, this can cause
// undefined behaviour in Miri itself!
(
unsafe {
libloading::Library::new(lib_file_path)
.expect("Failed to read specified shared object file")
},
lib_file_path.clone(),
)
}),
}
}

Expand Down
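Loading the library here is what makes `alloc_base_addr` (in `src/intptrcast.rs` above) potentially re-entrant, hence its `try_borrow_mut` guard. The detection idea, stripped of the unsafe raw-pointer fallback, can be sketched like this:

```rust
use std::cell::RefCell;

/// Reports what happened: a normal mutable borrow, or a detected re-entrant
/// call made while an outer borrow was still live.
fn bump(state: &RefCell<u64>) -> &'static str {
    match state.try_borrow_mut() {
        Ok(mut s) => {
            *s += 1;
            "borrowed"
        }
        // `try_borrow_mut` fails exactly when a borrow is already active,
        // which is how the recursive case is recognized without panicking.
        Err(_) => "recursing",
    }
}

fn main() {
    let state = RefCell::new(0);
    assert_eq!(bump(&state), "borrowed");
    let _outer = state.borrow_mut(); // simulate the outer call's live borrow
    assert_eq!(bump(&state), "recursing");
    println!("ok");
}
```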