
Reading Pointer bytes as Integers #547

Open
chorman0773 opened this issue Dec 6, 2024 · 10 comments

Comments

@chorman0773
Contributor

chorman0773 commented Dec 6, 2024

This came up in rust-lang/reference#1664. I wanted to ask what T-opsem thinks about the behaviour of reading pointer bytes as integer types (or as char/bool/etc.).

As far as I can tell, there are two "sensible" behaviours, given that integers themselves do not carry provenance:

  • The provenance of the pointer fragment is ignored (only its address bytes are read),
  • Decoding error (thus undefined behaviour).

Given provenance monotonicity, which the decoding-error option would violate, it seems like the best option is that the fragments are ignored. Is there anything I've missed here? If not, can we get a formal sign-off on this behaviour?

Note that I'm only considering the runtime behaviour, which can be a point against adopting the behaviour. Given that it's impossible to get the address of certain pointers in const-eval, it does need to be undefined behaviour (or otherwise an error) to read pointer bytes (to at least symbolic allocations) as integer types.
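To make the two options concrete, here is a toy model of an integer load. It is a sketch under stated assumptions, not MiniRust or rustc code: `Provenance`, `AbstractByte`, `IntDecode`, and `decode_u64` are hypothetical names, and bytes are modeled loosely as "uninitialized, or a value with optional provenance". Under `RejectProvenance`, the same address bytes decode fine without provenance but fail once provenance is attached, which is the non-monotonic behaviour the description is worried about.

```rust
/// Hypothetical provenance tag: identifies the allocation a pointer may access.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Provenance(u32);

/// One byte of abstract memory (simplified): either uninitialized,
/// or a byte value that may carry provenance.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AbstractByte {
    Uninit,
    Init(u8, Option<Provenance>),
}

/// The two candidate semantics for an integer load that encounters pointer bytes.
#[derive(Clone, Copy, Debug)]
enum IntDecode {
    /// Option 1: drop the provenance and keep the address bytes.
    IgnoreProvenance,
    /// Option 2: treat pointer bytes as a decoding error (UB).
    RejectProvenance,
}

#[derive(Debug, PartialEq, Eq)]
enum DecodeError {
    Uninit,
    PointerFragment,
}

/// Decodes a little-endian `u64` from eight abstract bytes under the given policy.
fn decode_u64(bytes: &[AbstractByte; 8], policy: IntDecode) -> Result<u64, DecodeError> {
    let mut raw = [0u8; 8];
    for (out, byte) in raw.iter_mut().zip(bytes.iter()) {
        match (*byte, policy) {
            (AbstractByte::Uninit, _) => return Err(DecodeError::Uninit),
            (AbstractByte::Init(_, Some(_)), IntDecode::RejectProvenance) => {
                return Err(DecodeError::PointerFragment)
            }
            // No provenance, or provenance that is silently discarded.
            (AbstractByte::Init(value, _), _) => *out = value,
        }
    }
    Ok(u64::from_le_bytes(raw))
}

/// Encodes `addr` as eight bytes that all carry `prov` (`None` means a plain integer).
fn encode(addr: u64, prov: Option<Provenance>) -> [AbstractByte; 8] {
    addr.to_le_bytes().map(|value| AbstractByte::Init(value, prov))
}

fn main() {
    let plain = encode(0x1000, None);
    let pointer = encode(0x1000, Some(Provenance(7)));
    for policy in [IntDecode::IgnoreProvenance, IntDecode::RejectProvenance] {
        println!(
            "{:?}: plain = {:?}, pointer = {:?}",
            policy,
            decode_u64(&plain, policy),
            decode_u64(&pointer, policy)
        );
    }
}
```

With `IgnoreProvenance`, both inputs decode to `0x1000`; with `RejectProvenance`, only the provenance-free input does.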

@saethlin
Member

saethlin commented Dec 6, 2024

Given that it's impossible to get the address of certain pointers

Which pointers?

@chorman0773
Contributor Author

I failed to clarify that. It was referring to the const-eval abstract machine (AM), where allocations that exist outside of the particular constant evaluation (what I call symbolic pointers) can't be assigned an address.

@RalfJung
Member

RalfJung commented Dec 6, 2024

Const-eval can't assign an address to any allocation, "inside" or "outside". (Not sure what you mean by that distinction.)

@RalfJung
Member

RalfJung commented Dec 6, 2024

Anyway that sub-discussion seems off-topic here, please move it to Zulip. And please update the issue description to clarify that "certain pointers" refers to const-eval.

@chorman0773
Contributor Author

I suppose the third alternative that should be addressed is that the read exposes the provenance of the pointer bytes. I don't like that suggestion (and, as I recall, few people did), as it means that reads can have a side effect, so such reads at an integer type can never be elided.

Is there any other alternative I'm missing?
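The optimization cost of the "read exposes" alternative can be shown with a toy machine. This is a hedged sketch: `Machine`, `load_u64_ignore`, and `load_u64_expose` are hypothetical names, and the exposed set is modeled as a plain `HashSet`. Under "ignore", a load whose result is unused changes nothing and can be deleted; under "expose", the same load mutates the exposed set, so deleting it would change behaviour.

```rust
use std::collections::HashSet;

/// Hypothetical provenance tag (illustrative only).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Provenance(u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AbstractByte {
    Uninit,
    Init(u8, Option<Provenance>),
}

/// A toy abstract machine: byte-addressed memory plus the set of exposed provenances.
struct Machine {
    memory: Vec<AbstractByte>,
    exposed: HashSet<Provenance>,
}

impl Machine {
    fn new(size: usize) -> Self {
        Machine { memory: vec![AbstractByte::Uninit; size], exposed: HashSet::new() }
    }

    /// Stores an 8-byte pointer value whose bytes all carry `prov`.
    fn store_ptr(&mut self, at: usize, addr: u64, prov: Provenance) {
        for (i, value) in addr.to_le_bytes().iter().enumerate() {
            self.memory[at + i] = AbstractByte::Init(*value, Some(prov));
        }
    }

    /// "Ignore" semantics: a pure read; machine state is left unchanged.
    fn load_u64_ignore(&self, at: usize) -> Option<u64> {
        let mut raw = [0u8; 8];
        for (out, byte) in raw.iter_mut().zip(&self.memory[at..at + 8]) {
            match byte {
                AbstractByte::Init(value, _) => *out = *value,
                AbstractByte::Uninit => return None,
            }
        }
        Some(u64::from_le_bytes(raw))
    }

    /// "Expose" semantics: the same read, but it also records every provenance it saw.
    fn load_u64_expose(&mut self, at: usize) -> Option<u64> {
        let value = self.load_u64_ignore(at)?;
        for byte in &self.memory[at..at + 8] {
            if let AbstractByte::Init(_, Some(prov)) = byte {
                self.exposed.insert(*prov);
            }
        }
        Some(value)
    }
}

fn main() {
    let mut m = Machine::new(16);
    m.store_ptr(0, 0x1000, Provenance(7));

    // Result unused: under "ignore" this load is dead code and can be deleted.
    let _ = m.load_u64_ignore(0);
    println!("exposed after ignore-load: {:?}", m.exposed);

    // Result unused, but under "expose" the load changed machine state.
    let _ = m.load_u64_expose(0);
    println!("exposed after expose-load: {:?}", m.exposed);
}
```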

@RalfJung
Member

Yeah, I definitely don't like that suggestion; it pessimizes optimizations too much. It is worth mentioning that this third alternative is basically what PNVI-ae-udi mandates for C. I am curious whether compilers will actually implement that, though.

@RalfJung
Member

RalfJung commented Dec 17, 2024

Note that I'm only considering the runtime behaviour, which can be a point against adopting the behaviour. Given that it's impossible to get the address of certain pointers in const-eval, it does need to be undefined behaviour (or otherwise an error) to read pointer bytes (to at least symbolic allocations) as integer types.

We could characterize this as an "unsupported in const-eval" error rather than a UB error. (Internally in rustc this is already what we do: ReadPointerAsInt is a variant of UnsupportedOpInfo. However, we don't clearly distinguish those cases in the error message AFAIK, and we do call this UB in the transmute docs.)

That would be similar to how is_null is sometimes unsupported in const-eval.
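The distinction between "UB" and "unsupported in const-eval" can be sketched as two error kinds in a toy interpreter. This is not rustc's interpreter: `AllocId`, `CtfeByte`, `EvalError`, and `read_u64` are hypothetical, and the error strings are illustrative rather than rustc's diagnostics. The `Unsupported` case plays the role the comment above assigns to `ReadPointerAsInt`. Pointer bytes are stored only symbolically (allocation, offset, byte index), so an integer read has no byte value to return.

```rust
/// Hypothetical allocation identifier used by a const-eval-style interpreter.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct AllocId(u32);

/// In this sketch, const-eval memory never holds a concrete address for a pointer:
/// each pointer byte is a symbolic fragment (allocation, offset, byte index).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CtfeByte {
    Uninit,
    Int(u8),
    PtrFragment { alloc: AllocId, offset: u64, index: u8 },
}

/// Two different kinds of evaluation failure.
#[derive(Debug, PartialEq, Eq)]
enum EvalError {
    /// The program has undefined behaviour according to the language semantics.
    Ub(&'static str),
    /// The program might be fine at runtime, but the interpreter cannot evaluate it.
    Unsupported(&'static str),
}

fn read_u64(bytes: &[CtfeByte; 8]) -> Result<u64, EvalError> {
    let mut raw = [0u8; 8];
    for (out, byte) in raw.iter_mut().zip(bytes.iter()) {
        *out = match *byte {
            CtfeByte::Int(value) => value,
            CtfeByte::Uninit => return Err(EvalError::Ub("reading uninitialized memory as an integer")),
            // There is no address for `alloc`, so there is no byte value to produce.
            CtfeByte::PtrFragment { .. } => {
                return Err(EvalError::Unsupported("pointer bytes have no integer value in const-eval"))
            }
        };
    }
    Ok(u64::from_le_bytes(raw))
}

/// The eight symbolic bytes of a pointer to `offset` within `alloc`.
fn ptr_bytes(alloc: AllocId, offset: u64) -> [CtfeByte; 8] {
    let mut out = [CtfeByte::Uninit; 8];
    for (i, byte) in out.iter_mut().enumerate() {
        *byte = CtfeByte::PtrFragment { alloc, offset, index: i as u8 };
    }
    out
}

fn main() {
    let mut int_bytes = [CtfeByte::Int(0); 8];
    int_bytes[0] = CtfeByte::Int(42);
    println!("{:?}", read_u64(&int_bytes));
    println!("{:?}", read_u64(&ptr_bytes(AllocId(3), 16)));
    println!("{:?}", read_u64(&[CtfeByte::Uninit; 8]));
}
```

Keeping the two variants separate lets an interpreter report "this program is wrong" and "this program is beyond what const-eval can do" differently, which is the gap in the error messages mentioned above.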

@RalfJung
Member

There's another aspect of provenance that we haven't officially decided yet and that is implicitly excluded by the current wording in rust-lang/reference#1664: do the individual bytes in a pointer "remember" where in the pointer they are, and have to be put back in the same order? Some formal models require this, and if we ever allow "taking apart" the bytes of a pointer in const-eval we'll also have to require this, but for runtime semantics we could decide either way.

The one example of code that I am aware of that breaks this requirement is XOR linked lists, which can be implemented in the semantics sketched in MiniRust right now but can't be implemented if bytes with provenance remember their position in the pointer. That's not exactly realistic code, but it is somewhat satisfying that (on architectures where pointers have at least 2 bytes), XOR linked lists can be implemented.

The main upside of requiring the same bytes in the same order is that it rules out pointer crimes like XOR linked lists, so if there are some unexpected interactions there, we would not be affected. I am not aware of an optimization that would benefit from this UB; it's mostly a case of "ruling out some rather cursed programs to avoid locking ourselves into an unexpected corner". In some sense the model becomes a bit simpler, since we can just say that pointer bytes must be put back together in the same order they started out in before they can be treated as a pointer again, but the actual op.sem would become more complicated because of the extra bookkeeping required to enforce this.
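The extra bookkeeping can be sketched with a toy pointer decoder. This is hypothetical code, not MiniRust's: each provenance-carrying byte also records its index in the original pointer, and `require_order` toggles whether decoding checks that index. As a simplifying assumption, a failed check just drops the provenance; a real model might choose UB instead.

```rust
/// Hypothetical provenance tag (illustrative only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Provenance(u32);

/// An abstract byte whose provenance, if any, also remembers the byte's position
/// within the pointer it came from (this is the extra bookkeeping).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AbstractByte {
    Uninit,
    Init(u8, Option<(Provenance, u8)>),
}

/// Encodes an 8-byte pointer; byte `i` is tagged with (`prov`, `i`).
fn encode_ptr(addr: u64, prov: Provenance) -> [AbstractByte; 8] {
    let mut out = [AbstractByte::Uninit; 8];
    for (i, value) in addr.to_le_bytes().iter().enumerate() {
        out[i] = AbstractByte::Init(*value, Some((prov, i as u8)));
    }
    out
}

/// Decodes a pointer as (address, provenance). Provenance survives only if every byte
/// carries the same tag and, when `require_order` is set, sits at its original position.
fn decode_ptr(bytes: &[AbstractByte; 8], require_order: bool) -> Option<(u64, Option<Provenance>)> {
    let mut raw = [0u8; 8];
    let mut prov: Option<Provenance> = None;
    let mut consistent = true;
    for (i, byte) in bytes.iter().enumerate() {
        match *byte {
            AbstractByte::Uninit => return None,
            AbstractByte::Init(value, tag) => {
                raw[i] = value;
                match tag {
                    Some((p, index)) if !require_order || usize::from(index) == i => {
                        if i == 0 {
                            prov = Some(p);
                        } else if prov != Some(p) {
                            consistent = false;
                        }
                    }
                    // Missing provenance, or a byte out of its original position.
                    _ => consistent = false,
                }
            }
        }
    }
    let addr = u64::from_le_bytes(raw);
    Some((addr, if consistent { prov } else { None }))
}

fn main() {
    // All address bytes are equal, so swapping two of them leaves the address unchanged.
    let mut bytes = encode_ptr(0x0101_0101_0101_0101, Provenance(7));
    bytes.swap(0, 1);
    println!("order ignored:  {:?}", decode_ptr(&bytes, false));
    println!("order required: {:?}", decode_ptr(&bytes, true));
}
```

With `require_order = false` the swapped bytes still decode to a pointer with provenance; with `require_order = true` they decode to the same address but without provenance. The per-byte index is exactly the state that the ordered variant has to carry around.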
