Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The ptr::Pointee design must incorporate layout info #123353

Closed
dead-claudia opened this issue Apr 2, 2024 · 5 comments
Closed

The ptr::Pointee design must incorporate layout info #123353

dead-claudia opened this issue Apr 2, 2024 · 5 comments
Labels
C-discussion Category: Discussion or questions that doesn't represent real issues. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@dead-claudia
Copy link

dead-claudia commented Apr 2, 2024

#81513 introduces a ptr::Pointee type. This interface lacks a couple key bits of info, two key bits critical for allocators and stuff like ThinBox.

Every pointee in practice has to have some sort of defined layout, and as pointers exist in a finite space, layout is also guaranteed to be finite. rust-lang/rfcs#3536 offers a partial solution for sizes, but doesn't address the two cases where alignment isn't statically known: the dynamic alignment of dyn Trait and the undefined alignment of opaque types.

So, ptr::Pointee needs to expose a way to get this information, as it was initially intended to. There's a few constraints:

  1. It must be type-safe as much as possible. And what can't be checked at the type level must be easily checked in full by Miri.
  2. Where things can be inferred or generated, it must be inferred or generated. Otherwise, it'd disrupt a lot of code.
  3. It of course obviously needs to work within the confines of Tracking Issue for pointer metadata APIs #81513. This goes without saying.

The simplest way I can think of is unfortunately still somewhat involved, and I'm not sure I can reduce it any further:

  • Add a required fn layout(&self) -> alloc::Layout method to ptr::Pointee.

    Using Layout and &self simplifies the interface a lot.

  • Add a marker::Aligned trait similar to marker::Sized, representing that the type has a statically defined alignment. Almost everything auto-implements this, but dyn Trait and opaque types will not.

    Sized will be modified to subtype this. The only way to declare an unknown alignment is to use an extern type, a type of DST, and so it's not possible to both make the size statically determinable and leave the alignment unknown.

    Like its counterpart Sized, Aligned must be implemented to read something by value, and generic parameters implicitly add an Aligned constraint unless you explicitly add ?Aligned. Unlike it, Aligned may be manually implemented for extern types via a #[repr(align(N))] attribute.

    Why an attribute? Makes things way easier for the compiler.

  • Make Pointee an unsafe trait. The lack of safety is because it's defining the memory safety boundaries.

    The layout method is safe to call, but the method itself must ensure that:

    • The returned layout's size holds a full T.
    • The returned layout's align returns the actual runtime alignment of the &self pointer.
  • Auto-implement Pointee for every type that either implements Aligned or has a final field that implements Pointee.

    If the final field does not implement Aligned but does implement Pointee, the auto-generated implementation returns essentially static_prefix.extend(final_field.layout()).unwrap().0.

    Pointee is not auto-implemented extern types. Their layout is undefined, and only the programmer will know how it should be laid out. It also makes for a nice escape hatch.

  • Generics will assume Pointee to be implemented by default. It'll be like Sized rather than Unpin in that regard.

    This avoids breaking things like struct Foo<T>(Arc<T>).

Here's how it'd look for the bulk of this at the type level:

use core::alloc::Layout;
use core::ptr::metadata;

// `core::marker::Aligned`
#[lang = "aligned"]
pub trait Aligned {
    // Empty
}

// `core::marker::Sized`
#[lang = "sized"]
pub trait Sized: Aligned {
    // Empty
}

// `core::ptr::Pointee`
#[lang = "pointee"]
pub unsafe trait Pointee {
    type Metadata: Copy + Send + Sync + Ord + core::hash::Hash + Unpin;
    fn layout(&self) -> Layout;
}

unsafe impl<T: Sized + Aligned> Pointee for T {
    type Metadata = ();

    const fn layout(&self) -> Layout {
        Layout::new::<T>()
    }
}

// Generated for every `trait Trait`
unsafe impl Pointee for dyn Trait {
    type Metadata = DynMetadata<Self>;

    fn layout(&self) -> Layout {
        metadata(self).layout()
    }
}

unsafe impl<T: Sized + Aligned> Pointee for [T] {
    type Metadata = usize;

    const fn layout(&self) -> Layout {
        let size = core::mem::size_of::<T>();
        let align = core::mem::align_of::<T>();
        let len = self.len();
        unsafe {
            Layout::from_size_align_unchecked(len.unchecked_mul(size), align)
        }
    }
}

unsafe impl Pointee for str {
    type Metadata = usize;

    const fn layout(&self) -> Layout {
        let len = self.len();
        unsafe {
            Layout::from_size_align_unchecked(len, 1)
        }
    }
}

// `core::ffi::CStr`
extern "C" {
    #[repr(align(1))]
    pub type CStr;
}

unsafe impl Pointee for CStr {
    type Metadata = ();

    fn layout(&self) -> Layout {
        let len = self.len();
        unsafe {
            Layout::from_size_align_unchecked(len.unchecked_add(1), 1)
        }
    }
}

// `Layout::for_value` should just delegate to `Pointee::layout`
impl Layout {
    fn for_value<T: ?Sized>(value: &T) -> Layout {
        <T as core::ptr::Pointee>::layout(value)
    }
}

// `mem::size_of_val` and `mem::align_of_val` no longer need intrinsics
pub fn size_of_val<T: ?Sized + ?Aligned>(value: &T) -> usize {
    <T as core::ptr::Pointee>::layout(value).size()
}

pub fn align_of_val<T: ?Sized + ?Aligned>(value: &T) -> usize {
    <T as core::ptr::Pointee>::layout(value).align()
}

There's a couple other benefits this provides:

  1. It removes the need for two compiler intrinsics: size_of_val and align_of_val. Their implementations are of course in the above code block.
  2. Miri can check custom layouts against the struct's backing memory on construct and on every pointer cast to the type, ensuring the referenced memory region is always valid and that the pointer in question is correctly aligned.
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Apr 2, 2024
@Noratrieb Noratrieb added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. C-discussion Category: Discussion or questions that doesn't represent real issues. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Apr 2, 2024
@RalfJung
Copy link
Member

RalfJung commented Apr 2, 2024

I don't think I get it. If I have an &T, which I need to call the new method on Pointee, I can easily call Layout::for_value to get the corresponding layout. No need to involve the Pointee trait I think?

Your proposal lacks a motivation statement for what users of the standard library should be able to do that they currently can't do. You are suggesting to re-implement the way size_of etc work internally, but users shouldn't have to care about that. That motivation needs to come first, before you give any of the technical details for how to achieve this. (See the usual format of our RFCs -- what you're doing here is basically writing an RFC.)

unsafe impl Pointee for CStr {

I think you are incorporating some kind of hypothetical future version of CStr here? For the current CStr this makes little sense. You referenced this from #81513 but this part makes it sound like it is unrelated to the metadata APIs for the current types and only relevant for hypothetical future custom DST?

Again, all that is something that should be explained in the first paragraph of your proposal. Don't make us guess. :)

Miri can check custom layouts against the struct's backing memory on construct and on every pointer cast to the type, ensuring the referenced memory region is always valid and that the pointer in question is correctly aligned.

I don't know what you mean by that. Please give an example of a bug Miri currently cannot find, but would be able to find with your proposal.

@CAD97
Copy link
Contributor

CAD97 commented Apr 2, 2024

You're essentially co-opting Pointee to be DynSized, which, sure, okay, that makes some sense, but you don't need to be able to know the size of something to point at it. You do to directly own it, but opaque/incomplete types that can't be owned/moved due to unknown size are a useful API design tool.

If the final field does not implement Aligned but does implement Pointee, the auto-generated implementation returns essentially static_prefix.extend(final_field.layout()).unwrap().0.

This definition has the usual issue for Mutex<CStr>. How exactly is UnsafeCell<CStr> going to get &T to call Pointee::layout(&T)?

It's also impossible to access the field without first knowing its alignment, as that impacts its offset in the structure. Along the same lines, Arc<T> requires to be able to get size from a pointer to dropped T. It's FCP decided that this is a thing which is possible to do for any dynamically sized type.

The operation really just wants to be possible from just <T as Pointee>::Metadata.

@RalfJung
Copy link
Member

RalfJung commented Apr 2, 2024

The operation really just wants to be possible from just ::Metadata.

Right, what would make a lot of sense is

fn size_of_meta<T: Pointee>(meta: T::Metadata) -> usize

That corresponds pretty well to the underlying primitive in the compiler.

Do we have an issue tracking something like that?

@dead-claudia
Copy link
Author

@RalfJung

I glossed over a lot here. My apologies.

[...] what you're doing here is basically writing an RFC.

It's kinda like an RFC for an RFC. Because of this, I wasn't sure whether to post this here, in https://internals.rust-lang.org, or as a proper RFC/MCP.

If you know of a better way to raise such a major question around the pointer metadata RFC/feature, please let me know and I'll move this accordingly.

I'll be the first to admit I was somewhat lost on where to post this. And my being lost is admittedly part of how this turned out to be a bit of a stream of consciousness.

I don't think I get it. If I have an &T, which I need to call the new method on Pointee, I can easily call Layout::for_value to get the corresponding layout. No need to involve the Pointee trait I think?

That's true from the average user's perspective. However, there's use in making it pluggable.

Your proposal lacks a motivation statement for what users of the standard library should be able to do that they currently can't do. [...]

I'll defer to rust-lang/rfcs#3536 on rationale for why size_of_val should be pluggable.

Some degree of alignment pluggability is also needed so FFI types like CStr can feasibly switch to raw pointers.

I went with a layout method returning a Layout because it was easy to stub out. You could also just use alignment attributes on opaque types and do the rest statically.

I think you are incorporating some kind of hypothetical future version of CStr here?

Yeah, I should've clarified the hypothetical nature of it. I almost added a note about that dozens of times and simply forgot. All those implementations in that code block are hypothetical.

Miri can check custom layouts against the struct's backing memory on construct and on every pointer cast to the type, ensuring the referenced memory region is always valid and that the pointer in question is correctly aligned.

I don't know what you mean by that. Please give an example of a bug Miri currently cannot find, but would be able to find with your proposal.

The idea is to check that stuff like Gc<dyn Trait> and Gc<WideCStr> align their allocated pointer correctly and expose the right chunk of memory from Deref.

@dead-claudia dead-claudia closed this as not planned Won't fix, can't repro, duplicate, stale Apr 2, 2024
@dead-claudia
Copy link
Author

@CAD97 I'll bring those up in rust-lang/rfcs#3536 and try to come to a solution there. This was admittedly very premature, despite me spending most of a day on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-discussion Category: Discussion or questions that doesn't represent real issues. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants