RFC - Zero-Sized References #2040

Closed
149 changes: 149 additions & 0 deletions text/0000-zero-sized-references.md
@@ -0,0 +1,149 @@
- Feature Name: zero_sized_references
- Start Date: 2017-06-23
- RFC PR: [#2040](https://github.com/rust-lang/rfcs/pull/2040)
- Rust Issue: (leave this empty)

# Summary
[summary]: #summary

References to Zero-Sized Types (both shared and mutable) have historically been the size of `usize`.
This RFC proposes making them zero-sized as well.

# Motivation
[motivation]: #motivation

References to any type in Rust are represented as a pointer. Usually the pointer is smaller and faster to move around than the value it points to.
However, for Zero-Sized Types that have only a single value (for example `()`), moving the value around is a no-op and can be optimized away.
Reading and writing the value is also a no-op: the type has only one value, so the value carries no information.
Currently, however, the compiler cannot optimize the pointer itself out of data structures.
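
As a minimal sketch of the current behaviour described above (the names `Marker`, `Holder` and `reference_sizes` are hypothetical and do not come from this RFC), the following shows that a reference to a ZST occupies a full pointer today, even when it is stored inside a data structure:

```rust
use std::mem::size_of;

// Hypothetical zero-sized marker type; it has exactly one value.
#[allow(dead_code)]
struct Marker;

// Hypothetical struct that stores nothing but a reference to the marker.
#[allow(dead_code)]
struct Holder<'a> {
    token: &'a Marker,
}

// Returns the sizes of the ZST, of a reference to it, and of the holder.
fn reference_sizes() -> (usize, usize, usize) {
    (
        size_of::<Marker>(),
        size_of::<&Marker>(),
        size_of::<Holder<'static>>(),
    )
}
```

With the current representation, the first number is 0 and the other two equal `size_of::<usize>()`. Under this proposal all three would be 0.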

Zero-Sized Types are useful for functions, lifetime guarantees and destructors,
and references to them can be used to show that a value of the type exists (shared references) or that you are the only one using it (mutable references).
The actual pointer value is meaningless, so the representation of the reference should be optimized to size 0 as well.

In addition, references to Zero-Sized Types often appear in polymorphic code that also handles non-zero-sized types.

In both of these cases, it will be an advantage for the references to be Zero-Sized.

# Detailed design
[design]: #detailed-design

### Calculating size
[calculating-size]: #calculating-size

Disclaimer: The writer of this RFC is not familiar with the inner workings of the compiler.


Determining whether a reference points to a ZST is not always trivial.
A struct that contains a reference to itself cannot know its size until the size of that reference is known.

However, a couple of notes:

```rust
struct A<'a> (&'a A<'a>);
```
This struct cannot be instantiated, because the very first instance of it requires an instance of it to already exist.

```rust
enum A<'a> {
    ZeroSized,
    SelfRef(&'a A<'a>),
}
```
As soon as an enum has more than a single value, it cannot be Zero-Sized. Otherwise it is no different from a struct.

Therefore, I propose that whenever the compiler finds a self-reference (or several types referencing each other in a cycle),
it treats the reference as not Zero-Sized, since there most likely WILL be other data somewhere in the chain.
This is most relevant to unions, which can contain self-references and still be instantiated.
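
To illustrate the enum case above, here is a small sketch. The enum is the one from the text, renamed `Chain` to keep the example self-contained; `chain_depth` and `chain_size` are hypothetical helpers added only for illustration:

```rust
use std::mem::size_of;

// The self-referential enum from the text, renamed `Chain` for this sketch.
enum Chain<'a> {
    ZeroSized,
    SelfRef(&'a Chain<'a>),
}

// Counts how many `SelfRef` links are followed before `ZeroSized` is reached.
fn chain_depth<'a>(mut node: &'a Chain<'a>) -> usize {
    let mut depth = 0;
    // The inner reference is `Copy`, so it can be bound by value here.
    while let Chain::SelfRef(next) = *node {
        depth += 1;
        node = next;
    }
    depth
}

// The enum has to distinguish two variants, so its size cannot be zero.
fn chain_size() -> usize {
    size_of::<Chain<'static>>()
}
```

Unlike the struct, this enum can be instantiated because `ZeroSized` terminates the chain. Its references carry real information (which node they point to), which is why treating self-references as not Zero-Sized seems a reasonable default.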

### `*const` and `*mut` pointers
[pointers]: #pointers

It is possible to convert a reference to a pointer. Currently, a reference to a ZST points to an arbitrary location,
and when it is converted to a pointer, the pointer receives that arbitrary location.

After this change, the reference will not hold any data. I propose that whenever a ZST reference is converted to a pointer,
a warning/error be issued ("Warning: taking the address of a Zero-Sized Type is meaningless") and the pointer receive an address
from the same algorithm that assigns an address to the reference in the current implementation.

The purpose is to avoid breaking existing code that might do this. We probably don't want to assign null, since for pointers null means
that the value doesn't exist, which is different from "exists but holds no data".

When converting in the other direction, the value of the pointer will be silently dropped, since that value never had a meaning in the first place.
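
The conversions discussed in this section can be sketched as follows. `Token`, `zst_address` and `round_trip` are hypothetical names used only for this illustration, and the code shows today's behaviour, not the proposed one:

```rust
// Hypothetical zero-sized type used only for this sketch.
struct Token;

// Converts a ZST reference to a raw pointer and returns its address.
fn zst_address(r: &Token) -> usize {
    r as *const Token as usize
}

// Round-trips reference -> pointer -> reference -> pointer.
// Returns (pointer is non-null, address survived the round trip).
fn round_trip() -> (bool, bool) {
    let token = Token;
    let ptr: *const Token = &token;
    // Sound: `ptr` was just derived from a live reference to `token`.
    let again: &Token = unsafe { &*ptr };
    (!ptr.is_null(), zst_address(again) == ptr as usize)
}
```

Today both components are `true`. Under the proposal, the first would presumably still hold (the pointer must not be null), while the second could no longer be relied on, because the pointer's value would be dropped when it is converted back into a reference.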

# How We Teach This
[how-we-teach-this]: #how-we-teach-this

For most Rust users, this change will be invisible. Their code will just become a tiny bit smaller.

Users of unsafe Rust might encounter this case. Therefore there should probably be a note in the Nomicon that references to ZSTs might be optimized away.

# Drawbacks
[drawbacks]: #drawbacks
Contributor

It might also mess with monomorphization, because you can't know the size of a reference anymore without knowing the type. I'm not sure of the implications for generic functions. Currently transmute<&T, usize> works inside of generic functions. But we had the same issue with transmuting fn pointers, so there's some precedent for that.

Member

But that's already the case: &mut Trait has a different size than &mut i32.

Member

We do it based on sized-ness and the kind of "maybe-DST tail" when that is unknown.

Member

Is sized-ness any easier to track than zero-sizedness?

Member

Sized-ness is "easier" to track than zero-sizedness, because a type parameter is assumed sized unless one explicitly declares otherwise via the `?Sized` marker. Once you have `<T: ?Sized>`, then the generic code cannot make assumptions about the sized-ness of T.
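
For readers following this thread, here is a hedged sketch of the two facts being discussed, written with today's `dyn Trait` spelling rather than the bare `Trait` used in the comments. `address_of` and `reference_widths` are hypothetical names chosen for this illustration:

```rust
use std::mem::{size_of, transmute};

// Placeholder trait, used only to form a trait-object reference.
trait Trait {}

// Compiles in a generic context because `T` is implicitly `Sized`,
// so `&T` is known to be exactly one pointer wide.
fn address_of<T>(r: &T) -> usize {
    unsafe { transmute::<&T, usize>(r) }
}

// Returns (size of a trait-object reference, size of a reference to a sized type).
fn reference_widths() -> (usize, usize) {
    (size_of::<&mut dyn Trait>(), size_of::<&mut i32>())
}
```

The trait-object reference is two pointers wide while `&mut i32` is one, so reference size already depends on the pointee. The difference is that `Sized` is assumed for every type parameter by default, whereas nothing comparable exists for "not zero-sized", so the `transmute` in `address_of` could not be accepted under the proposal without an additional bound.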


### Breaking code

Any code that assumes a reference holds a pointer might break.

FFI might behave differently than now, breaking code (see Unresolved questions).

# Alternatives
[alternatives]: #alternatives

The current system works well and does not have to be changed.

# Unresolved questions
[unresolved]: #unresolved-questions

### Definition of `&`

Does the definition of `&` guarantee that it is a pointer? Or do we only promise that it can be used to access the data?

### FFI + Escape mechanism

Do we need a way to opt out of the optimization? For example, if we have
```rust
struct ExternalStruct;

let x: &mut ExternalStruct = ffi_function();
other_ffi_function(x);
```
We might want to represent a pointer that we got from FFI, and checked to be non-null, as a reference to an object.
Since that object is not accessible directly, we represent it as an empty struct.
Contributor

iirc that is not how external structures should be represented, so breaking that at compile-time sounds like a win to me.

Contributor

afaik that's currently the best way to represent external structures, see the discussion on the extern type RFC

However, the references to it are then optimized away, so we lose the pointer and can't call the second function.

Note: This breakage will not be silent. The safe wrapper for `other_ffi_function` will convert the reference back to a pointer before
using it, triggering the warning/error for converting references to pointers.

For that case, we might want to mark `ExternalStruct` as Non-Zero-Sized. Possibly
```rust
impl !Sized for ExternalStruct;
```
and maybe it should require `unsafe`.
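
The FFI concern in this section can be sketched without linking a foreign library. In the sketch below, `ffi_function` is a plain Rust stand-in that hands out an opaque handle (backed by a boxed `u64` purely for illustration), and `release` and `address_survives_reference` are hypothetical helpers. None of this is a real foreign API:

```rust
// Empty struct standing in for an object owned by foreign code.
struct ExternalStruct;

// Stand-in for a foreign function that returns an opaque, non-null handle.
fn ffi_function() -> *mut ExternalStruct {
    Box::into_raw(Box::new(0u64)) as *mut ExternalStruct
}

// Stand-in for the foreign deallocation routine.
fn release(handle: *mut ExternalStruct) {
    // Sound in this sketch: `handle` came from `Box::into_raw` on a `Box<u64>`.
    unsafe { drop(Box::from_raw(handle as *mut u64)) }
}

// Checks whether the handle's address is still recoverable after the
// handle has been stored as `&mut ExternalStruct`.
fn address_survives_reference() -> bool {
    let raw = ffi_function();
    assert!(!raw.is_null());
    // Any non-null, aligned pointer is a valid reference to a ZST.
    let x: &mut ExternalStruct = unsafe { &mut *raw };
    let back = x as *mut ExternalStruct;
    let same = back == raw;
    release(raw);
    same
}
```

Today the function returns `true`, which is exactly what code of this shape relies on. With zero-sized references the address would be lost at the `&mut *raw` step, so `back` could not be passed on to a second foreign function.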

### Errors of conversion

Do we want to give an error or a warning for converting references to pointers?

The code above is an example where we break working code, so an error might be needed to show the significance.

However, some conversions might be meaningless and wouldn't affect the execution, so the programmer might allow the conversion.

### Mitigating breakage

Safe Rust code should be affected positively by this change. However, unsafe code might break.
Contributor

That's a dangerous statement. Unsafe code isn't any less stable than safe code. I think there should be some review of existing code that might break before deciding whether to do this. If all the unsafe code that breaks is some code that should never have been written in the first place because there's better ways to do it, then imo that would be ok.


FFI is especially vulnerable to this change, as shown above.
Are there better ways to deal with these errors without user involvement?

(Unsafe code might break - are there examples of VALID code that breaks? Or do only invalid uses of references break?
And what is our stance on breaking invalid code, assuming there is a large amount of it?)

### Specific examples - Pro

The RFC isn't well-justified until it has at least one detailed use case where it helps.
Please share specific examples of code where Zero-Sized references are useful.

### Specific examples against

If you have specific examples where this change is detrimental, please share them.