-
Notifications
You must be signed in to change notification settings - Fork 783
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improving ergonomics of getting slices from a PyBuffer #2824
Comments
Taking a step back: accepting read-only buffers seems like a reasonably common use case for people who want memory-efficient APIs, so it'd be nice to have good ergonomics, especially for people who haven't heard about transmute and/or don't feel confident reasoning about whether they can use it. |
Writing C code is not required, at least when NumPy is involved: If I change your example into fn validate_cbor(py: Python<'_>, cbor: &PyAny, cb: &PyAny) -> PyResult<()> {
let buffer = PyBuffer::<u8>::get(cbor)?;
if !buffer.readonly() {
return Err(PyValueError::new_err("Must be read-only byte buffer"));
}
let slice = buffer
.as_slice(py)
.ok_or_else(|| PyTypeError::new_err("Must be a contiguous sequence of bytes"))?;
let before = slice.iter().map(|cell| cell.get()).collect::<Vec<_>>();
cb.call0()?;
let after = slice.iter().map(|cell| cell.get()).collect::<Vec<_>>();
assert_eq!(before, after);
Ok(())
} to include a callback def test_validate_cbor():
x = np.arange(0, 10, dtype=np.uint8)
y = x.view()
y.flags.writeable = False
def cb():
x[3] = 42
validate_cbor(y, cb) yielding
which would not be sound if Of course, the view
but it can still be manipulated using Python code via For this reason, rust-numpy dynamically borrow checks all memory accesses done by safe Rust code so that safe Rust code cannot be the originator of memory unsafety, but considers both other native extension modules as well as Python code unsafe, meaning trusted to manually uphold aliasing discipline, c.f. https://docs.rs/numpy/latest/numpy/borrow/index.html |
Hm, yeah, NumPy views are a compelling argument that you can't trust the So perhaps a more modest approach would be just keeping people from having to do transmutes themselves? use std::mem::size_of;
impl<T> PyBuffer<T> {
/// Safety: ... document constraints here ...
unsafe fn as_regular_slice(&self) -> Option<&[T]> {
if !self.readonly() {
return None;
}
let slice = self.as_slice()?;
const _: () = assert!(
size_of::<ReadOnlyCell<T>>() == size_of::<T>(),
"ReadOnlyCell<T> is different size than T"
);
Some(unsafe { std::mem::transmute::<&[ReadOnlyCell<T>], &[T]>(slice) })
}
} |
As hinted at above, for rust-numpy we took the opposite approach. We do trust Python code as we trust other "unsafe" code. (We consider the read-only flag, but only to prevent safe Rust code writing to read-only buffers, not to assume immutability.)
Yeah, I think such methods make sense, but they should come with extensive documentation on the issue of shared mutability in Python code. |
FWIW I had to do the same thing in kylebarron/geo-index#57 and I think let buffer = PyBuffer::<u8>::get_bound(obj)?;
if !buffer.readonly() {
return Err(PyValueError::new_err("Must be read-only byte buffer."));
}
if buffer.dimensions() != 1 {
return Err(PyValueError::new_err("Expected 1-dimensional array."));
}
// Note: this is probably superfluous for 1D array
if !buffer.is_c_contiguous() {
return Err(PyValueError::new_err("Expected c-contiguous array."));
}
if buffer.len_bytes() == 0 {
return Err(PyValueError::new_err("Buffer has no data."));
}
let len = buffer.item_count();
let data = buffer.buf_ptr() as *const u8;
unsafe { std::slice::from_raw_parts(data, len) } |
I worry that slicing into Python-mutable memory like this risks UB due to optimizations that rustc might do due to the assumption of immutability. I think in practice this UB would most likely show up as data races on reading the buffer when there is also a writer, but I have no idea if the compiler might choose to do more destructive optimizations. I wonder if a safer option than copying directly from the Python buffer and violating Rust's aliasing rules is potentially to allocate a temporary stack-based buffer (say, |
Hmm. Is this worry specifically around not holding the GIL while accessing the slices? I later read the code more and saw that Lines 370 to 381 in ea6a0aa
I was thinking that directly calling |
I also have a use-case for this (trying to expose a Rust API that uses IIUC, the problem with the buffer is basically the one capture din https://alexgaynor.net/2022/oct/23/buffers-on-the-edge/: That said, from a user's POV I do like |
I'm writing a function that accepts anything that can create a read-only buffer:
It's not clear to me why
PyBytes::as_bytes()
shouldn't return&[ReadOnlyCell<u8>]
too. After all, there is nothing preventing a particularly malicious user from overwriting the data in a Python bytes object, just need to write a little bit of C. We need to trust callers to some extent in a world where they can write C.So my feeling is that
as_slice()
can just returnSome(&[T])
, unless there's some reason to expect buffer API implementations to lie about theirreadonly
flag.As a follow-up, if that makes sense then maybe the pattern above could be simplified and built-in to PyO3.
The text was updated successfully, but these errors were encountered: