RFC - Zero-Sized References #2040
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,149 @@ | ||
- Feature Name: zero_sized_references | ||
- Start Date: 2017-06-23 | ||
- RFC PR: [#2040](https://github.com/rust-lang/rfcs/pull/2040) | ||
- Rust Issue: (leave this empty) | ||
|
||
# Summary | ||
[summary]: #summary | ||
|
||
References to Zero-Sized Types (both shared and mutable) have historically been the size of `usize`. | ||
The proposed change is to make these references zero-sized as well. | ||
|
||
# Motivation | ||
[motivation]: #motivation | ||
|
||
References to any type in Rust are represented as a pointer. Usually the pointer is smaller and faster to move around than the value it points to. | ||
However, for Zero-Sized Types that have only a single value (for example `()`), moving the value around is a no-op and can be optimized away. | ||
Reading and writing the value is also a no-op, since it has only a single value and therefore carries no information. | ||
Currently, however, the compiler can't optimize the pointer out of data structures that store such a reference. | ||
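
To make the current behaviour concrete, here is a minimal sketch of the sizes involved today. The names `Marker`, `Holder` and `current_sizes` are made up for illustration and are not part of the proposal.

```rust
use std::mem::size_of;

/// A zero-sized marker type with exactly one value (hypothetical example type).
struct Marker;

/// A data structure that stores nothing but a shared reference to the ZST.
struct Holder<'a> {
    marker: &'a Marker,
}

/// Returns (size of the ZST, size of a reference to it, size of the holder).
fn current_sizes() -> (usize, usize, usize) {
    (
        size_of::<Marker>(),
        size_of::<&Marker>(),
        size_of::<Holder<'static>>(),
    )
}
```

With today's compiler the first value is 0 while the other two equal `size_of::<usize>()`. Under this RFC, as I understand the intent, all three would be 0.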
|
||
Zero-Sized Types are useful for functions, lifetime guarantees and destructors, | ||
and references to them can be used to show that a value of the type "exists" (for shared references) or that "you are the only one using it" (for mutable references). | ||
The actual value is meaningless, so the representation of the reference should be optimized to size 0 as well. | ||
|
||
In addition, references to Zero-Sized Types often appear in polymorphic code, where the same code handles non-ZSTs as well. | ||
|
||
In both of these cases, it will be an advantage for the references to be Zero-Sized. | ||
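
As a rough illustration of the polymorphic case, the sketch below uses a hypothetical generic wrapper, `WithContext`, that borrows a context of any type. Today the borrowed context costs one pointer even when the context type is zero-sized.

```rust
use std::mem::size_of;

/// Hypothetical generic wrapper: a payload plus a borrowed context of any type.
struct WithContext<'a, C> {
    context: &'a C,
    value: u64,
}

/// Size of the wrapper for a given context type `C`.
fn size_with<C: 'static>() -> usize {
    size_of::<WithContext<'static, C>>()
}
```

Today `size_with::<()>()` and `size_with::<String>()` return the same number, because both store a full pointer next to the `u64`. If references to ZSTs became zero-sized, I would expect the `()` instantiation to shrink to the size of the `u64` alone.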
|
||
# Detailed design | ||
[design]: #detailed-design | ||
|
||
### Calculating size | ||
[calculating-size]: #calculating-size | ||
|
||
Disclaimer: The writer of this RFC is not familiar with the inner workings of the compiler. | ||
|
||
|
||
Finding out whether a reference points to a ZST may not always be trivial. | ||
A struct with a reference to itself will not know its size until it knows the size of the reference. | ||
|
||
However, a couple of notes: | ||
|
||
```rust | ||
struct A<'a> (&'a A<'a>); | ||
``` | ||
This struct cannot be instantiated, because the very first instance of it requires an instance of it to already exist. | ||
|
||
```rust | ||
enum A<'a> { | ||
ZeroSized, | ||
SelfRef (&'a A<'a>), | ||
} | ||
``` | ||
The moment an enum has more than a single value, it cannot be Zero-Sized. Otherwise it is no different from a struct. | ||
|
||
Therefore, I propose that whenever the compiler finds a self-reference (or multiple types referencing each other in a loop), | ||
it decides that the reference is not Zero-Sized, since there most likely WILL be other data somewhere in the chain. | ||
This is most relevant to unions, which could have self-references and be instantiated at the same time. | ||
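
The enum case can be checked with today's compiler. The sketch below renames the enum to `Chain` and adds two hypothetical helper functions. It shows that a self-referential enum can be instantiated and already has a non-zero size, which is consistent with treating references in a cycle as not Zero-Sized.

```rust
use std::mem::size_of;

/// Self-referential enum, same shape as the enum `A` above.
enum Chain<'a> {
    End,
    Link(&'a Chain<'a>),
}

/// Counts how many `Link` variants are followed before reaching `End`.
fn chain_depth(mut node: &Chain<'_>) -> usize {
    let mut depth = 0;
    while let Chain::Link(next) = node {
        depth += 1;
        node = *next;
    }
    depth
}

/// Builds a two-link chain and returns (its depth, the size of the enum).
fn build_and_measure() -> (usize, usize) {
    let end = Chain::End;
    let one = Chain::Link(&end);
    let two = Chain::Link(&one);
    (chain_depth(&two), size_of::<Chain<'static>>())
}
```

Because the enum has two variants, it needs at least one bit of information, so the reference inside it cannot be pointing at a ZST.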
|
||
### `*const` and `*mut` pointers | ||
[pointers]: #pointers | ||
|
||
It is possible to convert a reference to a pointer. Currently, a reference to a ZST points to an arbitrary location, | ||
and when it is converted to a pointer, the pointer receives that arbitrary location. | ||
|
||
After this change, the reference will not hold any data. I propose that whenever a ZST reference is converted to a pointer, | ||
a warning/error be issued ("Warning: taking the address of a Zero-Sized Type is meaningless") and the pointer receive an address | ||
chosen by the same algorithm that assigns an address to the reference in the current implementation. | ||
|
||
The purpose is to avoid breaking existing code that might do this. We probably don't want to assign null, since for pointers null | ||
means that the value doesn't exist, which is different from "exists but holds no data". | ||
|
||
Converting in the other direction, the value of the pointer will be silently dropped - that value never had a meaning in the first place. | ||
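
For reference, this is roughly how the two conversions behave today. It is a sketch: the function name `unit_round_trip` is made up, and `NonNull::dangling()` is used only as one possible source of an arbitrary non-null, aligned address.

```rust
use std::ptr::NonNull;

/// Converts `&()` to a raw pointer and back, and also builds a `&()` from an
/// arbitrary non-null, aligned address. Returns
/// (pointer is non-null, both references read the same value).
fn unit_round_trip() -> (bool, bool) {
    let unit = ();
    let reference: &() = &unit;

    // Reference -> pointer: today the pointer receives whatever address the
    // reference happened to hold.
    let pointer: *const () = reference;

    // Pointer -> reference: for a ZST, any non-null, aligned address is valid.
    let back: &() = unsafe { &*pointer };

    // A different, arbitrary address yields an indistinguishable reference.
    let dangling: *const () = NonNull::<()>::dangling().as_ptr();
    let from_dangling: &() = unsafe { &*dangling };

    (!pointer.is_null(), *back == *from_dangling)
}
```

The address is never needed to read the value, which is why dropping it when converting from a pointer to a reference loses nothing for the read itself.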
|
||
# How We Teach This | ||
[how-we-teach-this]: #how-we-teach-this | ||
|
||
For most Rust users, this change will be invisible. Their code will just become a tiny bit smaller. | ||
|
||
Users of unsafe Rust might encounter this case. Therefore there should probably be a note in the nomicon that references to ZSTs might be optimized away. | ||
|
||
# Drawbacks | ||
[drawbacks]: #drawbacks | ||
|
||
### Breaking code | ||
|
||
Any place that assumed a reference holds a pointer might introduce bugs. | ||
|
||
FFI might behave differently than now, breaking code (see Unresolved questions). | ||
|
||
# Alternatives | ||
[alternatives]: #alternatives | ||
|
||
The current system works well and does not have to be changed. | ||
|
||
# Unresolved questions | ||
[unresolved]: #unresolved-questions | ||
|
||
### Definition of `&` | ||
|
||
Does the definition of `&` guarantee that it's a pointer? Or do we only promise that you can use it to access the data? | ||
|
||
### FFI + Escape mechanism | ||
|
||
Do we need a way to remove the optimization? For example, if we have | ||
```rust | ||
struct ExternalStruct; | ||
|
||
let x: &mut ExternalStruct = ffi_function(); | ||
other_ffi_function(x); | ||
``` | ||
We might want to represent a pointer that we got from FFI, and checked to be non-null, as a reference to an object. | ||
Since that object is not accessible directly, we represent it as an empty struct. | ||
iirc that is not how external structures should be represented, so breaking that at compile-time sounds like a win to me.

afaik that's currently the best way to represent external structures, see the discussion on the
|
||
However, the references to it are then optimized away, so we lose the pointer and can't call the second function. | ||
|
||
Note: This break will not be silent. The safe wrapper for `other_ffi_function` will convert the reference back to a pointer before | ||
using it, raising the warning/error for converting references to pointers. | ||
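
The following sketch shows why the address matters in this pattern. The "foreign" functions are hypothetical stand-ins written in Rust so that the example runs. The zero-length array field is one common way to declare an opaque type, and `ffi_free` and `address_survives_round_trip` are invented names.

```rust
/// Opaque handle type: zero-sized on the Rust side, real data on the C side.
#[repr(C)]
pub struct ExternalStruct {
    _private: [u8; 0],
}

// Hypothetical stand-ins for the foreign library.
extern "C" fn ffi_function() -> *mut ExternalStruct {
    Box::into_raw(Box::new(0u8)).cast::<ExternalStruct>()
}

/// Reports which address it was handed; a real library would dereference it.
extern "C" fn other_ffi_function(handle: *mut ExternalStruct) -> usize {
    handle as usize
}

extern "C" fn ffi_free(handle: *mut ExternalStruct) {
    unsafe { drop(Box::from_raw(handle.cast::<u8>())) }
}

/// Safe-wrapper style round trip: pointer -> `&mut` -> pointer.
pub fn address_survives_round_trip() -> bool {
    let raw = ffi_function();
    let expected = raw as usize;
    // Null check, then treat the handle as a reference to a ZST.
    let x: &mut ExternalStruct = match unsafe { raw.as_mut() } {
        Some(reference) => reference,
        None => return false,
    };
    // Converting back relies on the reference still carrying the address.
    let seen = other_ffi_function(x as *mut ExternalStruct);
    ffi_free(raw);
    seen == expected
}
```

Today the function returns `true` because the reference carries the address through the round trip. With zero-sized references, the cast `x as *mut ExternalStruct` is the conversion that would raise the warning/error.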
|
||
For that case, we might want to mark `ExternalStruct` as Non-Zero-Sized. Possibly | ||
```rust | ||
impl !Sized for ExternalStruct; | ||
``` | ||
and maybe it should require `unsafe`. | ||
|
||
### Errors of conversion | ||
|
||
Do we want to give an error or a warning for converting references to pointers? | ||
|
||
The code above is an example where we break working code, so an error might be needed to show the significance. | ||
|
||
However, some conversions might be meaningless and wouldn't affect the execution, so the programmer might allow the conversion. | ||
|
||
### Mitigating breakage | ||
|
||
Safe Rust code should be affected positively by this change. However, unsafe code might break. | ||
That's a dangerous statement. Unsafe code isn't any less stable than safe code. I think there should be some review of existing code that might break before deciding whether to do this. If all the unsafe code that breaks is some code that should never have been written in the first place because there's better ways to do it, then imo that would be ok.
|
||
|
||
FFI is especially vulnerable to this change, as shown above. | ||
Are there better ways to deal with these errors without user involvement? | ||
|
||
(Unsafe code might break - are there examples of VALID code that breaks? Or do only invalid uses of references break? | ||
And what is our stance on breaking invalid code, assuming there is a large amount of it?) | ||
|
||
### Specific examples - Pro | ||
|
||
The RFC isn't well-justified until it has at least one detailed use case where it helps. | ||
Please share specific examples of code where Zero-Sized references are useful. | ||
|
||
### Specific examples against | ||
|
||
If you have specific examples where this change is detrimental, please share them. |
It might also mess with monomorphization, because you can't know the size of a reference anymore without knowing the type. I'm not sure of the implications for generic functions. Currently `transmute<&T, usize>` works inside of generic functions. But we had the same issue with transmuting fn pointers, so there's some precedent for that.

But that's already the case: `&mut Trait` has a different size than `&mut i32`.

We do it based on sized-ness and the kind of "maybe-DST tail" when that is unknown.

Is sized-ness any easier to track than zero-sizedness?

Sized-ness is "easier" to track than zero-sizedness, because a type parameter is assumed sized unless one explicitly declares otherwise via the `?Sized` marker. Once you have `<T: ?Sized>`, then the generic code cannot make assumptions about the sized-ness of `T`.
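
The point about reference sizes already depending on the pointee can be checked directly. This is a small sketch in which `ref_size` and `reference_sizes` are names invented here, and `dyn Debug` stands in for the `Trait` mentioned above.

```rust
use std::fmt::Debug;
use std::mem::size_of;

/// Size of `&mut T`, computed inside generic code that opts out of `Sized`.
fn ref_size<T: ?Sized>() -> usize {
    size_of::<&mut T>()
}

/// Sizes of mutable references to `i32`, `()`, a trait object and a slice.
fn reference_sizes() -> (usize, usize, usize, usize) {
    (
        ref_size::<i32>(),
        ref_size::<()>(),
        ref_size::<dyn Debug>(),
        ref_size::<[u8]>(),
    )
}
```

Today the first two values are one machine word and the last two are two words (pointer plus vtable or length). Under the RFC only the `()` case would change, becoming 0.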