| 1 | +- Feature Name: `unsized_thin_pointers` |
| 2 | +- Start Date: 2023-11-29 |
| 3 | +- RFC PR: [rust-lang/rfcs#3536](https://github.com/rust-lang/rfcs/pull/3536) |
| 4 | +- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) |
| 5 | + |
| 6 | +# Summary |
| 7 | +[summary]: #summary |
| 8 | + |
| 9 | +Enable user code to define dynamically-sized thin pointers. Such types are |
| 10 | +`!Sized`, but references to them are pointer-sized (i.e. not "fat pointers"). |
| 11 | +The implementation of [`core::mem::size_of_val()`][size_of_val] delegates to |
| 12 | +a new `core::mem::DynSized` trait at runtime. |
| 13 | + |
| 14 | +[size_of_val]: https://doc.rust-lang.org/core/mem/fn.size_of_val.html |
| 15 | + |
| 16 | +# Motivation |
| 17 | +[motivation]: #motivation |
| 18 | + |
| 19 | +Enable ergonomic and efficient references to dynamically-sized values that |
| 20 | +are capable of computing their own size. |
| 21 | + |
| 22 | +It should be possible to declare a Rust type that is `!Sized`, but has |
| 23 | +references that are pointer-sized and therefore only require a single register |
| 24 | +on most architectures. |
| 25 | + |
| 26 | +In particular this RFC aims to support a common pattern in other low-level |
| 27 | +languages, such as C, where a value may consist of a fixed-layout header |
| 28 | +followed by dynamically-sized data: |
| 29 | + |
| 30 | +```c |
| 31 | +struct __attribute__((aligned(8))) request { |
| 32 | + uint32_t size; |
| 33 | + uint16_t id; |
| 34 | + uint16_t flags; |
| 35 | + /* uint8_t request_data[]; */ |
| 36 | +}; |
| 37 | + |
| 38 | +void handle_request(struct request *req) { /* ... */ } |
| 39 | +``` |
| 40 | +
| 41 | +This pattern is used frequently in zero-copy APIs that transmit structured data |
| 42 | +between trust boundaries. |
| 43 | +
| 44 | +# Background |
| 45 | +[background]: #background
| 46 | +
| 47 | +There are currently two approved RFCs that cover similar functionality: |
| 48 | +* [RFC 1861] adds `extern type` for declaring types that are opaque to Rust's |
| 49 | + type system. One of the capabilities available to extern types is that they |
| 50 | + can be embedded into a `struct` as the last field, and that `struct` will |
| 51 | + become an unsized type with thin references. |
| 52 | +
| 53 | + Stabilizing `extern type` is currently blocked on questions of how to handle |
| 54 | + Rust layout intrinsics such as [`core::mem::size_of_val()`][size_of_val] and |
| 55 | + [`core::mem::align_of_val()`][align_of_val] for fully opaque types. |
| 56 | +
| 57 | +* [RFC 2580] adds traits and intrinsics for custom DSTs either with or without |
| 58 | + associated "fat pointer" metadata. A custom DST with thin references can be |
| 59 | + represented as `Pointee<Metadata = ()>`. |
| 60 | +
| 61 | + Stabilizing custom DSTs is currently blocked on multiple questions involving |
| 62 | + the content and representation of complex metadata, such as `&dyn` vtables. |
| 63 | +
| 64 | +In both of these cases the ability to declare custom DSTs with thin references |
| 65 | +is a minor footnote to the overall feature, and stabilization is blocked by |
| 66 | +issues unrelated to thin-pointer DSTs. |
| 67 | +
| 68 | +The objective of this RFC is to extract custom thin-pointer DSTs into their own
| 69 | +feature, which would hopefully be free of known issues and could be stabilized |
| 70 | +without significant changes to the compiler or ecosystem. |
| 71 | +
| 72 | +[RFC 1861]: https://rust-lang.github.io/rfcs/1861-extern-types.html |
| 73 | +[RFC 2580]: https://rust-lang.github.io/rfcs/2580-ptr-meta.html |
| 74 | +
| 75 | +[align_of_val]: https://doc.rust-lang.org/core/mem/fn.align_of_val.html |
| 76 | +
| 77 | +# Guide-level explanation |
| 78 | +[guide-level-explanation]: #guide-level-explanation |
| 79 | +
| 80 | +The unsafe trait `core::mem::DynSized` may be implemented for a `!Sized` type |
| 81 | +to configure how the size of a value is computed from a reference. References |
| 82 | +to a type that implements `DynSized` are not required to store the value size |
| 83 | +as pointer metadata. |
| 84 | +
| 85 | +If a type that implements `DynSized` has no other associated pointer metadata |
| 86 | +(such as a vtable), then references to that type will have the same size and |
| 87 | +layout as a normal pointer. |
| 88 | +
| 89 | +```rust |
| 90 | +#[repr(C, align(8))] |
| 91 | +struct Request { |
| 92 | + size: u32, |
| 93 | + id: u16, |
| 94 | + flags: u16, |
| 95 | + data: [u8], |
| 96 | +} |
| 97 | +
| 98 | +unsafe impl core::mem::DynSized for Request { |
| 99 | + fn size_of_val(&self) -> usize { |
| 100 | + usize::try_from(self.size).unwrap_or(usize::MAX) |
| 101 | + } |
| 102 | +} |
| 103 | +
| 104 | +// size_of::<&Request>() == size_of::<*const ()>() |
| 105 | +``` |
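For comparison, the sketch below compiles the same field layout with today's Rust, where no `DynSized` impl exists. It only illustrates the status quo that the comment above is contrasted with: a struct with a `[u8]` tail gets a fat reference that carries the slice length. The helper name `reference_sizes` is made up for this illustration.

```rust
use core::mem::size_of;

// Same field layout as the `Request` example, but without a `DynSized`
// impl, since that trait is only proposed by this RFC.
#[allow(dead_code)]
#[repr(C, align(8))]
struct Request {
    size: u32,
    id: u16,
    flags: u16,
    data: [u8],
}

/// Returns (size of `&Request`, size of a thin pointer), both in bytes.
/// `reference_sizes` is a hypothetical helper, not part of any API.
fn reference_sizes() -> (usize, usize) {
    (size_of::<&Request>(), size_of::<*const ()>())
}
```

In current Rust the first value is twice the second, because the reference stores the length of `data` next to the address.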
| 106 | + |
| 107 | +The `DynSized` trait has a single required method, `size_of_val()`, which |
| 108 | +has the same semantics as `core::mem::size_of_val()`. |
| 109 | + |
| 110 | +```rust |
| 111 | +// core::mem |
| 112 | +pub unsafe trait DynSized { |
| 113 | + // Returns the size of the pointed-to value in bytes. |
| 114 | + fn size_of_val(&self) -> usize; |
| 115 | +} |
| 116 | +``` |
| 117 | + |
| 118 | +It is an error to `impl DynSized` for a type that is `Sized`. In other words, |
| 119 | +the following code is invalid: |
| 120 | + |
| 121 | +```rust |
| 122 | +#[repr(C, align(8))] |
| 123 | +struct SizedRequest { |
| 124 | + size: u32, |
| 125 | + id: u16, |
| 126 | + flags: u16, |
| 127 | + data: [u8; 1024], |
| 128 | +} |
| 129 | + |
| 130 | +// Compiler error: `impl DynSized` on a type that isn't `!Sized`. |
| 131 | +unsafe impl core::mem::DynSized for SizedRequest { |
| 132 | + fn size_of_val(&self) -> usize { |
| 133 | + usize::try_from(self.size).unwrap_or(usize::MAX) |
| 134 | + } |
| 135 | +} |
| 136 | +``` |
| 137 | + |
| 138 | +# Reference-level explanation |
| 139 | +[reference-level-explanation]: #reference-level-explanation |
| 140 | + |
| 141 | +The `core::mem::DynSized` trait acts as a signal to the compiler that the |
| 142 | +size of a value can be computed dynamically by the user-provided trait |
| 143 | +implementation. If references to that type would otherwise be of the layout |
| 144 | +`(ptr, usize)` due to being `!Sized`, then they can be reduced to `ptr`. |
| 145 | + |
| 146 | +The `DynSized` trait does not _guarantee_ that a type will have thin pointers; |
| 147 | +it merely enables them. This definition is intended to be compatible with RFC |
| 148 | +2580, in that types with complex pointer metadata would continue to have fat |
| 149 | +pointers. Such types may choose to implement `DynSized` by extracting their |
| 150 | +custom pointer metadata from `&self`. |
| 151 | + |
| 152 | +Implementing `DynSized` does not affect alignment, so the questions of how to |
| 153 | +handle unknown alignments of RFC 1861 `extern type` DSTs do not apply. |
| 154 | + |
| 155 | +In current Rust, a DST used as a `struct` field must be the final field of the |
| 156 | +`struct`. This restriction remains unchanged, as the offsets of any fields after |
| 157 | +a DST would be impossible to compute statically. |
| 158 | +- This also implies that any given `struct` may have at most one field that |
| 159 | + implements `DynSized`. |
| 160 | + |
| 161 | +A `struct` with a field that implements `DynSized` will also implicitly |
| 162 | +implement `DynSized`. The implicit implementation of `DynSized` computes the |
| 163 | +size of the struct up until the `DynSized` field, and then adds the result of |
| 164 | +calling `DynSized::size_of_val()` on the final field. |
| 165 | +- This implies it's not permitted to manually `impl DynSized` for a type that |
| 166 | + contains a field that implements `DynSized`. |
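The implicit implementation described above can be approximated by hand in today's Rust. This is a rough sketch under stated assumptions, not the compiler's actual lowering: `DynSizedSketch` is a local stand-in for the proposed trait, and `Tail` and `Envelope` are hypothetical types. Because thin DSTs cannot be declared yet, `Tail` is an ordinary sized header whose `size` field is assumed to count the header plus the bytes that would follow it.

```rust
/// Hypothetical stand-in for the proposed `core::mem::DynSized`.
unsafe trait DynSizedSketch {
    fn size_of_val(&self) -> usize;
}

/// Stand-in for a thin DST: `size` is assumed to count this header plus
/// the payload bytes that would follow it in memory.
#[repr(C)]
struct Tail {
    size: u32,
}

unsafe impl DynSizedSketch for Tail {
    fn size_of_val(&self) -> usize {
        // Lossless on 32-bit and 64-bit targets.
        self.size as usize
    }
}

/// A struct whose final field is dynamically sized (in spirit).
#[allow(dead_code)]
#[repr(C)]
struct Envelope {
    tag: u64,
    tail: Tail,
}

// Hand-written version of what an implicit impl might compute.
unsafe impl DynSizedSketch for Envelope {
    fn size_of_val(&self) -> usize {
        let base = self as *const Envelope as usize;
        let tail = &self.tail as *const Tail as usize;
        // Offset of the final field, plus the final field's dynamic size.
        (tail - base).saturating_add(self.tail.size_of_val())
    }
}
```

With `tag` occupying the first 8 bytes, an `Envelope` whose tail reports 24 bytes reports 32 bytes in total.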
| 167 | + |
| 168 | +# Drawbacks |
| 169 | +[drawbacks]: #drawbacks |
| 170 | + |
| 171 | +## Mutability of value sizes |
| 172 | + |
| 173 | +If the size of a value is stored in the value itself, then that implies it can |
| 174 | +change at runtime. |
| 175 | + |
| 176 | +```rust |
| 177 | +struct MutableSize { size: usize } |
| 178 | +unsafe impl core::mem::DynSized for MutableSize { |
| 179 | + fn size_of_val(&self) -> usize { self.size } |
| 180 | +} |
| 181 | + |
| 182 | +let mut v = MutableSize { size: 8 }; |
| 183 | +println!("{:?}", core::mem::size_of_val(&v)); // prints "8" |
| 184 | +v.size = 16; |
| 185 | +println!("{:?}", core::mem::size_of_val(&v)); // prints "16" |
| 186 | +``` |
| 187 | + |
| 188 | +There may be existing code that assumes `size_of_val()` is constant for a given |
| 189 | +value, which is true in today's Rust due to the nature of fat pointers, but |
| 190 | +would no longer be true if `size_of_val()` is truly dynamic. |
| 191 | + |
| 192 | +Alternatively, the API contract for `DynSized` implementations could require |
| 193 | +that the result of `size_of_val()` not change for the lifetime of the allocated |
| 194 | +object. This would likely be true for nearly all interesting use cases, and |
| 195 | +would let `DynSized` values be stored in a `Box`. |
| 196 | + |
| 197 | +## Compatibility with existing fat-pointer DSTs |
| 198 | + |
| 199 | +It may be desirable for certain existing stabilized DSTs to implement |
| 200 | +`DynSized` -- for example, it is a natural fit for the planned redefinition of |
| 201 | +[`&core::ffi::CStr`][cstr] as a thin pointer. |
| 202 | + |
| 203 | +[cstr]: https://doc.rust-lang.org/core/ffi/struct.CStr.html |
| 204 | + |
| 205 | +Such a change to existing types might be backwards-incompatible for code that |
| 206 | +embeds those types as a `struct` field, because it would change the reference |
| 207 | +layout. For example, the following code compiles in stable Rust v1.73 but would |
| 208 | +be a compilation error if `&CStr` does not have the same layout as `&[u8]`. |
| 209 | + |
| 210 | +```rust |
| 211 | +struct ContainsCStr { |
| 212 | + cstr: core::ffi::CStr, |
| 213 | +} |
| 214 | +impl ContainsCStr { |
| 215 | + fn as_bytes(&self) -> &[u8] { |
| 216 | + unsafe { core::mem::transmute(self) } |
| 217 | + } |
| 218 | +} |
| 219 | +``` |
| 220 | + |
| 221 | +The above incompatibility of a redefined `&CStr` exists regardless of this RFC, |
| 222 | +but it's worth noting that implementing `DynSized` would be a backwards |
| 223 | +incompatible change for existing DSTs. |
| 224 | + |
| 225 | +# Rationale and alternatives |
| 226 | +[rationale-and-alternatives]: #rationale-and-alternatives |
| 227 | + |
| 228 | +This design is less generic than some of the alternatives (including custom DSTs |
| 229 | +and extern types), but has the advantage of being much more tightly scoped and |
| 230 | +therefore is expected to have no major blockers. It directly addresses one of |
| 231 | +the pain points for use of Rust in a low-level performance-sensitive codebase, |
| 232 | +while avoiding large-scale language changes to the extent possible. |
| 233 | + |
| 234 | +Without this change, people will continue to either use fat-pointer DSTs |
| 235 | +(reducing performance relative to C), or write Rust types that claim to be |
| 236 | +`Sized` but actually aren't (the infamous `_data: [u8; 0]` hack). |
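A minimal sketch of the zero-length-array workaround in today's Rust is shown below, to make the trade-off concrete. `RequestHeader` and both helper functions are hypothetical names chosen for this illustration. References to the header are thin, but the type is `Sized`, so `core::mem::size_of_val()` only ever reports the header and ignores the `size` field.

```rust
use core::mem::{size_of, size_of_val};

/// Header that pretends to be the whole value; the payload bytes are
/// assumed to follow it in memory and are invisible to the type system.
#[allow(dead_code)]
#[repr(C, align(8))]
struct RequestHeader {
    size: u32,
    id: u16,
    flags: u16,
    _data: [u8; 0],
}

/// True if `&RequestHeader` is exactly one pointer wide.
fn header_ref_is_thin() -> bool {
    size_of::<&RequestHeader>() == size_of::<*const ()>()
}

/// What today's `size_of_val()` reports, regardless of `hdr.size`.
fn reported_size(hdr: &RequestHeader) -> usize {
    size_of_val(hdr)
}
```

A header with `size: 4096` still reports 8 bytes, so any code that needs the real length has to read the field itself and reach the payload through raw pointers.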
| 237 | + |
| 238 | +# Prior art |
| 239 | +[prior-art]: #prior-art |
| 240 | + |
| 241 | +The canonical prior art is the C language idiom of a `struct` that's implicitly |
| 242 | +followed by a dynamically-sized value. This idiom was standardized in C99 under |
| 243 | +the term "flexible array member": |
| 244 | + |
| 245 | +> As a special case, the last element of a structure with more than one named |
| 246 | +> member may have an incomplete array type; this is called a flexible array |
| 247 | +> member. [...] However, when a `.` (or `->`) operator has a left operand that |
| 248 | +> is (a pointer to) a structure with a flexible array member and the right |
| 249 | +> operand names that member, it behaves as if that member were replaced with the |
| 250 | +> longest array (with the same element type) that would not make the structure |
| 251 | +> larger than the object being accessed; |
| 252 | + |
| 253 | +The use of flexible array members (either with C99 syntax or not) is widespread |
| 254 | +in C APIs, especially when sending structured data between processes ([IPC]) or |
| 255 | +between a process and the kernel. For example, the Linux kernel's [FUSE] |
| 256 | +protocol communicates with userspace via length-prefixed dynamically-sized |
| 257 | +request/response buffers. |
| 258 | + |
| 259 | +They're also common when implementing low-level network protocols, which have |
| 260 | +length-delimited frames comprising a fixed-layout header followed by a variable |
| 261 | +amount of payload data. |
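As a rough illustration of such a frame, the sketch below splits a byte buffer into a fixed-layout header and its payload using only safe code. The details are assumptions made for the example rather than any real protocol: the field layout mirrors the `request` struct from the Motivation section, integers are little-endian, and `size` counts the header plus the payload. `FrameHeader`, `HEADER_LEN`, and `split_frame` are hypothetical names.

```rust
/// Header fields decoded from the first 8 bytes of a frame.
#[derive(Debug, PartialEq)]
struct FrameHeader {
    size: u32,
    id: u16,
    flags: u16,
}

/// Assumed header length in bytes: u32 + u16 + u16.
const HEADER_LEN: usize = 8;

/// Splits `buf` into a decoded header and its payload. Returns `None` if
/// the buffer is shorter than a header or the declared size is out of range.
fn split_frame(buf: &[u8]) -> Option<(FrameHeader, &[u8])> {
    if buf.len() < HEADER_LEN {
        return None;
    }
    // Assumption: all header fields are little-endian.
    let size = u32::from_le_bytes([buf[0], buf[1], buf[2], buf[3]]);
    let id = u16::from_le_bytes([buf[4], buf[5]]);
    let flags = u16::from_le_bytes([buf[6], buf[7]]);

    // Assumption: `size` counts the header plus the payload.
    let total = size as usize;
    if total < HEADER_LEN || total > buf.len() {
        return None;
    }
    Some((FrameHeader { size, id, flags }, &buf[HEADER_LEN..total]))
}
```

This copies the header fields out of the buffer. A thin-pointer DST would instead let the header and payload be viewed in place through a single pointer-sized reference.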
| 262 | + |
| 263 | +[IPC]: https://en.wikipedia.org/wiki/Inter-process_communication |
| 264 | +[FUSE]: https://www.kernel.org/doc/html/v6.3/filesystems/fuse.html |
| 265 | + |
| 266 | +In the context of Rust, the two RFCs mentioned earlier both cover thin-pointer |
| 267 | +DSTs as part of their more general extensions to the Rust type system: |
| 268 | +- [RFC 1861: `extern_types`](https://rust-lang.github.io/rfcs/1861-extern-types.html) |
| 269 | +- [RFC 2580: `ptr_metadata`](https://rust-lang.github.io/rfcs/2580-ptr-meta.html) |
| 270 | + |
| 271 | +Also, there have been non-approved RFC proposals involving thin-pointer DSTs: |
| 272 | +- [[rfcs/pull#709] truly unsized types](https://github.com/rust-lang/rfcs/pull/709) |
| 273 | +- [[rfcs/pull#1524] Custom Dynamically Sized Types](https://github.com/rust-lang/rfcs/pull/1524) |
| 274 | +- [[rfcs/issues#2255] More implicit bounds (?Sized, ?DynSized, ?Move)](https://github.com/rust-lang/rfcs/issues/2255) |
| 275 | + |
| 276 | +# Unresolved questions |
| 277 | +[unresolved-questions]: #unresolved-questions |
| 278 | + |
| 279 | +None so far. |
| 280 | + |
| 281 | +# Future possibilities |
| 282 | +[future-possibilities]: #future-possibilities |
| 283 | + |
| 284 | +None so far. Further exploration of opaque types and/or custom pointer metadata |
| 285 | +already has separate dedicated RFCs. This one is just to get an MVP for types |
| 286 | +that should be `!Sized` without fat pointers. |