The `ptr::Pointee` design must incorporate layout info #123353

dead-claudia · 2024-04-02T05:06:59Z

#81513 introduces a ptr::Pointee type. This interface lacks a couple key bits of info, two key bits critical for allocators and stuff like ThinBox.

Every pointee in practice has to have some sort of defined layout, and as pointers exist in a finite space, layout is also guaranteed to be finite. rust-lang/rfcs#3536 offers a partial solution for sizes, but doesn't address the two cases where alignment isn't statically known: the dynamic alignment of dyn Trait and the undefined alignment of opaque types.

So, ptr::Pointee needs to expose a way to get this information, as it was initially intended to. There's a few constraints:

It must be type-safe as much as possible. And what can't be checked at the type level must be easily checked in full by Miri.
Where things can be inferred or generated, it must be inferred or generated. Otherwise, it'd disrupt a lot of code.
It of course obviously needs to work within the confines of Tracking Issue for pointer metadata APIs #81513. This goes without saying.

The simplest way I can think of is unfortunately still somewhat involved, and I'm not sure I can reduce it any further:

Add a required fn layout(&self) -> alloc::Layout method to ptr::Pointee.

Using Layout and &self simplifies the interface a lot.
Add a marker::Aligned trait similar to marker::Sized, representing that the type has a statically defined alignment. Almost everything auto-implements this, but dyn Trait and opaque types will not.

Sized will be modified to subtype this. The only way to declare an unknown alignment is to use an extern type, a type of DST, and so it's not possible to both make the size statically determinable and leave the alignment unknown.

Like its counterpart Sized, Aligned must be implemented to read something by value, and generic parameters implicitly add an Aligned constraint unless you explicitly add ?Aligned. Unlike it, Aligned may be manually implemented for extern types via a #[repr(align(N))] attribute.

Why an attribute? Makes things way easier for the compiler.
Make Pointee an unsafe trait. The lack of safety is because it's defining the memory safety boundaries.

The layout method is safe to call, but the method itself must ensure that:
- The returned layout's size holds a full T.
- The returned layout's align returns the actual runtime alignment of the &self pointer.
Auto-implement Pointee for every type that either implements Aligned or has a final field that implements Pointee.

If the final field does not implement Aligned but does implement Pointee, the auto-generated implementation returns essentially static_prefix.extend(final_field.layout()).unwrap().0.

Pointee is not auto-implemented extern types. Their layout is undefined, and only the programmer will know how it should be laid out. It also makes for a nice escape hatch.
Generics will assume Pointee to be implemented by default. It'll be like Sized rather than Unpin in that regard.

This avoids breaking things like struct Foo<T>(Arc<T>).

Here's how it'd look for the bulk of this at the type level:

use core::alloc::Layout;
use core::ptr::metadata;

// `core::marker::Aligned`
#[lang = "aligned"]
pub trait Aligned {
    // Empty
}

// `core::marker::Sized`
#[lang = "sized"]
pub trait Sized: Aligned {
    // Empty
}

// `core::ptr::Pointee`
#[lang = "pointee"]
pub unsafe trait Pointee {
    type Metadata: Copy + Send + Sync + Ord + core::hash::Hash + Unpin;
    fn layout(&self) -> Layout;
}

unsafe impl<T: Sized + Aligned> Pointee for T {
    type Metadata = ();

    const fn layout(&self) -> Layout {
        Layout::new::<T>()
    }
}

// Generated for every `trait Trait`
unsafe impl Pointee for dyn Trait {
    type Metadata = DynMetadata<Self>;

    fn layout(&self) -> Layout {
        metadata(self).layout()
    }
}

unsafe impl<T: Sized + Aligned> Pointee for [T] {
    type Metadata = usize;

    const fn layout(&self) -> Layout {
        let size = core::mem::size_of::<T>();
        let align = core::mem::align_of::<T>();
        let len = self.len();
        unsafe {
            Layout::from_size_align_unchecked(len.unchecked_mul(size), align)
        }
    }
}

unsafe impl Pointee for str {
    type Metadata = usize;

    const fn layout(&self) -> Layout {
        let len = self.len();
        unsafe {
            Layout::from_size_align_unchecked(len, 1)
        }
    }
}

// `core::ffi::CStr`
extern "C" {
    #[repr(align(1))]
    pub type CStr;
}

unsafe impl Pointee for CStr {
    type Metadata = ();

    fn layout(&self) -> Layout {
        let len = self.len();
        unsafe {
            Layout::from_size_align_unchecked(len.unchecked_add(1), 1)
        }
    }
}

// `Layout::for_value` should just delegate to `Pointee::layout`
impl Layout {
    fn for_value<T: ?Sized>(value: &T) -> Layout {
        <T as core::ptr::Pointee>::layout(value)
    }
}

// `mem::size_of_val` and `mem::align_of_val` no longer need intrinsics
pub fn size_of_val<T: ?Sized + ?Aligned>(value: &T) -> usize {
    <T as core::ptr::Pointee>::layout(value).size()
}

pub fn align_of_val<T: ?Sized + ?Aligned>(value: &T) -> usize {
    <T as core::ptr::Pointee>::layout(value).align()
}

There's a couple other benefits this provides:

It removes the need for two compiler intrinsics: size_of_val and align_of_val. Their implementations are of course in the above code block.
Miri can check custom layouts against the struct's backing memory on construct and on every pointer cast to the type, ensuring the referenced memory region is always valid and that the pointer in question is correctly aligned.

The text was updated successfully, but these errors were encountered:

RalfJung · 2024-04-02T06:42:14Z

I don't think I get it. If I have an &T, which I need to call the new method on Pointee, I can easily call Layout::for_value to get the corresponding layout. No need to involve the Pointee trait I think?

Your proposal lacks a motivation statement for what users of the standard library should be able to do that they currently can't do. You are suggesting to re-implement the way size_of etc work internally, but users shouldn't have to care about that. That motivation needs to come first, before you give any of the technical details for how to achieve this. (See the usual format of our RFCs -- what you're doing here is basically writing an RFC.)

unsafe impl Pointee for CStr {

I think you are incorporating some kind of hypothetical future version of CStr here? For the current CStr this makes little sense. You referenced this from #81513 but this part makes it sound like it is unrelated to the metadata APIs for the current types and only relevant for hypothetical future custom DST?

Again, all that is something that should be explained in the first paragraph of your proposal. Don't make us guess. :)

Miri can check custom layouts against the struct's backing memory on construct and on every pointer cast to the type, ensuring the referenced memory region is always valid and that the pointer in question is correctly aligned.

I don't know what you mean by that. Please give an example of a bug Miri currently cannot find, but would be able to find with your proposal.

CAD97 · 2024-04-02T06:50:05Z

You're essentially co-opting Pointee to be DynSized, which, sure, okay, that makes some sense, but you don't need to be able to know the size of something to point at it. You do to directly own it, but opaque/incomplete types that can't be owned/moved due to unknown size are a useful API design tool.

If the final field does not implement Aligned but does implement Pointee, the auto-generated implementation returns essentially static_prefix.extend(final_field.layout()).unwrap().0.

This definition has the usual issue for Mutex<CStr>. How exactly is UnsafeCell<CStr> going to get &T to call Pointee::layout(&T)?

It's also impossible to access the field without first knowing its alignment, as that impacts its offset in the structure. Along the same lines, Arc<T> requires to be able to get size from a pointer to dropped T. It's FCP decided that this is a thing which is possible to do for any dynamically sized type.

The operation really just wants to be possible from just <T as Pointee>::Metadata.

RalfJung · 2024-04-02T07:25:22Z

The operation really just wants to be possible from just ::Metadata.

Right, what would make a lot of sense is

fn size_of_meta<T: Pointee>(meta: T::Metadata) -> usize

That corresponds pretty well to the underlying primitive in the compiler.

Do we have an issue tracking something like that?

dead-claudia · 2024-04-02T08:17:59Z

@RalfJung

I glossed over a lot here. My apologies.

[...] what you're doing here is basically writing an RFC.

It's kinda like an RFC for an RFC. Because of this, I wasn't sure whether to post this here, in https://internals.rust-lang.org, or as a proper RFC/MCP.

If you know of a better way to raise such a major question around the pointer metadata RFC/feature, please let me know and I'll move this accordingly.

I'll be the first to admit I was somewhat lost on where to post this. And my being lost is admittedly part of how this turned out to be a bit of a stream of consciousness.

I don't think I get it. If I have an &T, which I need to call the new method on Pointee, I can easily call Layout::for_value to get the corresponding layout. No need to involve the Pointee trait I think?

That's true from the average user's perspective. However, there's use in making it pluggable.

Your proposal lacks a motivation statement for what users of the standard library should be able to do that they currently can't do. [...]

I'll defer to rust-lang/rfcs#3536 on rationale for why size_of_val should be pluggable.

Some degree of alignment pluggability is also needed so FFI types like CStr can feasibly switch to raw pointers.

I went with a layout method returning a Layout because it was easy to stub out. You could also just use alignment attributes on opaque types and do the rest statically.

I think you are incorporating some kind of hypothetical future version of CStr here?

Yeah, I should've clarified the hypothetical nature of it. I almost added a note about that dozens of times and simply forgot. All those implementations in that code block are hypothetical.

Miri can check custom layouts against the struct's backing memory on construct and on every pointer cast to the type, ensuring the referenced memory region is always valid and that the pointer in question is correctly aligned.

I don't know what you mean by that. Please give an example of a bug Miri currently cannot find, but would be able to find with your proposal.

The idea is to check that stuff like Gc<dyn Trait> and Gc<WideCStr> align their allocated pointer correctly and expose the right chunk of memory from Deref.

dead-claudia · 2024-04-02T08:30:49Z

@CAD97 I'll bring those up in rust-lang/rfcs#3536 and try to come to a solution there. This was admittedly very premature, despite me spending most of a day on it.

rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Apr 2, 2024

This was referenced Apr 2, 2024

RFC: Trait for !Sized thin pointers rust-lang/rfcs#3536

Closed

Tracking Issue for pointer metadata APIs #81513

Open

dead-claudia closed this as not planned Won't fix, can't repro, duplicate, stale Apr 2, 2024

dead-claudia mentioned this issue Jun 18, 2024

Why don't slices use dedicated metadata structs? #125517

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

The `ptr::Pointee` design must incorporate layout info #123353

The `ptr::Pointee` design must incorporate layout info #123353

dead-claudia commented Apr 2, 2024 •

edited

Loading

RalfJung commented Apr 2, 2024 •

edited

Loading

CAD97 commented Apr 2, 2024

RalfJung commented Apr 2, 2024

dead-claudia commented Apr 2, 2024

dead-claudia commented Apr 2, 2024

The ptr::Pointee design must incorporate layout info #123353

The ptr::Pointee design must incorporate layout info #123353

Comments

dead-claudia commented Apr 2, 2024 • edited Loading

RalfJung commented Apr 2, 2024 • edited Loading

CAD97 commented Apr 2, 2024

RalfJung commented Apr 2, 2024

dead-claudia commented Apr 2, 2024

dead-claudia commented Apr 2, 2024

The `ptr::Pointee` design must incorporate layout info #123353

The `ptr::Pointee` design must incorporate layout info #123353

dead-claudia commented Apr 2, 2024 •

edited

Loading

RalfJung commented Apr 2, 2024 •

edited

Loading