
Commit 33053bb

Merge pull request #60 from RalfJung/const-ub-rfc
require CTFE to detect UB
2 parents 9d8878e + ecc7623 commit 33053bb

1 file changed

+49
-25
lines changed

rfcs/0000-const-ub.md

+49-25
Original file line number | Diff line number | Diff line change
@@ -6,7 +6,8 @@
66
# Summary
77
[summary]: #summary
88

9-
Define UB during const evaluation to lead to an unspecified result for the affected CTFE query, but not otherwise infect the compilation process.
9+
Define how UB during const evaluation is treated:
10+
some kinds of UB must be detected, the rest leads to an unspecified result for the affected CTFE query (but does not otherwise "taint" the compilation process).
1011

1112
# Motivation
1213
[motivation]: #motivation
@@ -21,43 +22,69 @@ There are some values that Rust needs to compute at compile-time.
2122
This includes the initial value of a `const`/`static`, and array lengths (and more generally, const generics).
2223
Computing these initial values is called compile-time function evaluation (CTFE).
2324
CTFE in Rust is very powerful and permits running almost arbitrary Rust code.
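
As a rough illustration of the kinds of values mentioned above, the following sketch computes a `const`, a `static`, and an array length at compile time. The helper `fib` and the item names are hypothetical and are not part of the RFC.

```rust
// Hypothetical helper: a `const fn` may be called during CTFE.
const fn fib(n: u32) -> u64 {
    let mut a = 0u64;
    let mut b = 1u64;
    let mut i = 0;
    // Loops are permitted in `const fn`; after `n` iterations `a` holds fib(n).
    while i < n {
        let next = a + b;
        a = b;
        b = next;
        i += 1;
    }
    a
}

// Initial value of a `const`, computed by CTFE.
const FIB_10: u64 = fib(10);
// Initial value of a `static`, computed by CTFE.
static TABLE_LEN: usize = fib(6) as usize;
// Array length (and initializer), computed by CTFE.
const BUF: [u8; fib(6) as usize] = [0; fib(6) as usize];

fn main() {
    println!("{} {} {}", FIB_10, TABLE_LEN, BUF.len());
}
```

None of this code is `unsafe`, so no UB can arise here; the question the RFC addresses only comes up once `unsafe` operations are involved.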
24-
This begs the question, what happens when there is `unsafe` code and it causes Undefined Behavior (UB)?
25+
This begs the question, what happens when there is `unsafe` code and it causes [Undefined Behavior (UB)][UB]?
2526

26-
The answer is that in this case, the final value that is currently being executed is arbitrary.
27-
For example, when UB arises while computing an array length, then the final array length can be any `usize`, or it can be (partially) uninitialized memory.
28-
No guarantees are made about this final value, and it can be different depending on host and target architecture, compiler flags, and more.
29-
However, UB will not otherwise adversely affect the currently running compiler; type-checking and lints and everything else will work correctly given whatever the result of the CTFE computation is.
30-
31-
Note, however, that this means compile-time UB can later cause runtime UB when the program is actually executed:
32-
for example, if there is UB while computing the initial value of a `Vec<i32>`, the result might be a completely invalid vector that causes UB at runtime when used in the program.
33-
34-
Sometimes, the compiler might be able to detect such problems and show an error or warning about CTFE computation having gone wrong (for example, the compiler might detect when the array length ends up being uninitialized).
35-
But other times, this might not be the case -- UB is not reliably detected during CTFE.
27+
The answer depends on the kind of UB: some kinds of UB are guaranteed to be detected,
28+
while other kinds of UB might either be detected, or else evaluation will continue as if the violated UB condition did not exist (i.e., as if this operation was actually defined).
3629
This can change from compiler version to compiler version: CTFE code that causes UB could build fine with one compiler and fail to build with another.
3730
(This is in accordance with the general policy that unsound code is not subject to strict stability guarantees.)
3831

32+
[UB]: https://doc.rust-lang.org/reference/behavior-considered-undefined.html
33+
3934
# Reference-level explanation
4035
[reference-level-explanation]: #reference-level-explanation
4136

42-
When UB arises as part of CTFE, the result of this evaluation is an unspecified constant.
43-
The compiler might be able to detect that UB occurred and raise an error or a warning, but this is not mandated, and absence of lints does not imply absence of UB.
37+
The following kinds of UB are detected by CTFE, and will cause compilation to stop with an error:
38+
* Incorrect use of compiler intrinsics (e.g., reaching an `unreachable` or violating the assumptions of `exact_div`).
39+
* Dereferencing dangling pointers.
40+
* Using an invalid value in an arithmetic, logical or control-flow operation.
41+
42+
These kinds of UB have in common that there is nothing sensible evaluation can do besides stopping with an error.
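
A minimal sketch of the first bullet, assuming `std::hint::unreachable_unchecked` as the way to reach an `unreachable` intrinsic. The function `nonzero_or_ub` and the constants are hypothetical. The offending constant is left commented out so that the example still builds.

```rust
use std::hint::unreachable_unchecked;

// Hypothetical helper: returns `x`, and declares `x == 0` to be impossible.
const fn nonzero_or_ub(x: u32) -> u32 {
    if x == 0 {
        // Reaching this call is UB. Callers must never pass 0.
        unsafe { unreachable_unchecked() }
    }
    x
}

// Well-defined: the unreachable branch is never taken during evaluation.
const OK: u32 = nonzero_or_ub(7);

// Under the rules described above, evaluating this constant would reach
// `unreachable` during CTFE and is expected to stop compilation with an error:
// const BAD: u32 = nonzero_or_ub(0);

fn main() {
    println!("{}", OK);
}
```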
43+
44+
Other kinds of UB might or might not be detected:
45+
* Dereferencing unaligned pointers.
46+
* Violating Rust's aliasing rules.
47+
* Producing an invalid value (but not using it in one of the ways defined above).
48+
* Any [other UB][UB] not listed here.
49+
50+
All of this UB has in common that there is an "obvious" way to continue evaluation even though the program has caused UB:
51+
we can just access the underlying memory despite alignment and/or aliasing rules being violated, and we can just ignore the existence of an invalid value as long as it is not used in some arithmetic, logical or control-flow operation.
52+
There is no guarantee that CTFE detects such UB: evaluation may either fail with an error, or continue with the "obvious" result.
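
To make the "might or might not be detected" category concrete, here is a sketch around unaligned access. The names `BYTES`, `read_u16_at`, and `VALUE` are hypothetical, and the example assumes a toolchain where `read_unaligned` is usable in `const fn`. The well-defined variant uses `read_unaligned`; the commented-out variant dereferences a possibly unaligned pointer, which is UB that CTFE may either report or evaluate with the "obvious" result.

```rust
const BYTES: [u8; 4] = [0x11, 0x22, 0x33, 0x44];

// Hypothetical helper: reads two bytes starting at `offset` as a native-endian u16.
const fn read_u16_at(bytes: &[u8; 4], offset: usize) -> u16 {
    assert!(offset + 2 <= 4);
    let p = bytes.as_ptr();
    // Well-defined: `read_unaligned` has no alignment requirement,
    // and the two bytes read are in bounds and initialized.
    unsafe { p.add(offset).cast::<u16>().read_unaligned() }
}

const VALUE: u16 = read_u16_at(&BYTES, 1);

// UB variant (not compiled here): a plain dereference requires alignment.
// Per the text above, CTFE may fail with an error or continue as if the
// access were permitted; neither outcome is guaranteed.
// const MAYBE: u16 = unsafe { *BYTES.as_ptr().add(1).cast::<u16>() };

fn main() {
    println!("{:#06x}", VALUE);
}
```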
53+
54+
If the compile-time evaluation uses operations that are specified as non-deterministic,
55+
and only some of the non-deterministic choices lead to CTFE-detected UB,
56+
then CTFE may choose any possible execution and thus miss the possible UB.
57+
For example, if we end up specifying the value of padding after a typed copy to be non-deterministically chosen, then padding will be initialized in some executions and uninitialized in others.
58+
If the program then performs integer arithmetic on a padding byte, that might or might not be detected as UB, depending on the non-deterministic choice made by CTFE.
59+
60+
## Note to implementors
61+
62+
This requirement implies that CTFE must happen on code that was *not subject to UB-exploiting optimizations*.
63+
In general, optimizations of Rust code may assume that the source program does not have UB, so programs that exhibit UB can simply be ignored when arguing for the correctness of an optimization.
64+
However, this can lead to programs with UB being translated into programs without UB, so if constant evaluation runs after such an optimization, it might fail to detect the UB.
65+
The only permissible optimizations are those that preserve all UB and that preserve the behavior of programs whose UB CTFE does not detect.
66+
Formally speaking this means they must be correct optimizations for the abstract machine *that CTFE actually implements*, not just for the abstract machine that specifies Rust; and moreover they must preserve the location and kind of UB that is detected by CTFE.
4467

4568
# Drawbacks
4669
[drawbacks]: #drawbacks
4770

48-
This means UB during CTFE can silently "corrupt" the build in a way that the final program has UB when being executed
49-
(but not more so than if the CTFE code would instead have been run at runtime).
71+
To be able to either detect UB or continue evaluation in a well-defined way, CTFE must run on unoptimized code.
72+
This means when compiling a `const fn` in some crate, the unoptimized code needs to be stored.
73+
So either the code is stored twice (optimized and unoptimized), or optimizations can only happen after all CTFE results have been computed.
74+
[Experiments in rustc](https://perf.rust-lang.org/compare.html?start=35debd4c111610317346f46d791f32551d449bd8&end=3dbdd3b981f75f965ac04452739653a3d47ff0ed) showed a severe performance impact on CTFE stress-tests, but no impact on real code except for a slowdown of "incr-unchanged" (which are rather fast so small changes lead to large percentages).
5075

5176
# Rationale and alternatives
5277
[rationale-and-alternatives]: #rationale-and-alternatives
5378

5479
The most obvious alternative is to say that UB during CTFE will definitely be detected.
5580
However, that is expensive and might even be impossible.
5681
Even Miri does not currently detect all UB, and Miri is already performing many additional checks that would significantly slow down CTFE.
57-
Furthermore, since optimizations can "hide" UB (an optimization can turn a program with UB into one without), this means we would have to run CTFE on unoptimized MIR.
58-
And finally, implementing these checks requires a more precise understanding of UB than we currently have; basically, this would block having any potentially-UB operations at const-time on having a spec for Rust that precisely describes their UB in a checkable way.
82+
Furthermore, implementing these checks requires a more precise understanding of UB than we currently have; basically, this would block having any potentially-UB operations at const-time on having a spec for Rust that precisely describes their UB in a checkable way.
5983
In particular, this would mean we need to decide on an aliasing model before permitting raw pointers in CTFE.
6084

85+
To avoid the need for keeping the unoptimized sources of `const fn` around, we could weaken the requirement for detecting UB and instead say that UB might cause arbitrary evaluation results.
86+
Under the assumption that unsound code is not subject to the usual stability guarantees, this is an option we can still move to in the future, should it turn out that the proposal made in this RFC is too expensive.
87+
6188
Another extreme alternative would be to say that UB during CTFE may have arbitrary effects in the host compiler, including host-level UB.
6289
Basically this would mean that CTFE would be allowed to "leave its sandbox".
6390
This would allow JIT'ing CTFE and running the resulting code unchecked.
@@ -68,11 +95,9 @@ While compiling untrusted code should only be done with care (including addition
6895

6996
C++ requires compilers to detect UB in `constexpr`.
7097
However, the fragment of C++ that is available to `constexpr` excludes pointer casts, pointer arithmetic (beyond array bounds), and union-based type punning, which makes such checks not very complicated and avoids most of the poorly specified parts of UB.
71-
72-
If we found a way to run CTFE on unoptimized MIR, then detecting UB for programs that do not use unions, `transmute`, or raw pointers is not very hard.
73-
CTFE already has almost all the checks required for this, except for alignment checks which are disabled during CTFE.
74-
(Disabling them was the easiest way forward to solve some issues around packed structs in patterns, but we could use a different solution and reinstate CTFE alignment checks.
75-
The relevant code paths still exist for Miri.)
98+
The corresponding type-punning-free fragment of Rust (no raw pointers, no `union`, no `transmute`) can only cause UB that is defined to be definitely detected during CTFE.
99+
In that sense, Rust achieves feature parity with C++ in terms of UB detection during CTFE.
100+
(Indeed, this was the prime motivation for making such strict UB detection requirements in the first place.)
76101

77102
# Unresolved questions
78103
[unresolved-questions]: #unresolved-questions
@@ -84,5 +109,4 @@ Currently none.
84109

85110
This RFC provides an easy way forward for "unconst" operations, i.e., operations that are safe at run-time but not at compile-time.
86111
Primary examples of such operations are anything involving the integer representation of pointers, which cannot be known at compile-time.
87-
If this RFC were accepted, we could declare such operations UB during CTFE (and thus naturally they would only be permitted in an `unsafe` block).
88-
This still leaves the door open for providing better guarantees in the future.
112+
If this RFC were accepted, we could declare such operations "definitely detected UB" during CTFE (and thus naturally they would only be permitted in an `unsafe` block).
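
A small sketch of what an "unconst" operation looks like, using a hypothetical helper `addr_of_value`. At run-time the pointer-to-integer cast is safe and yields a concrete address; the commented-out constant shows the compile-time analogue, which (as an assumption about how the rule above would apply) would be treated as definitely detected UB.

```rust
// Hypothetical helper: safe at run-time, where every reference has a concrete address.
fn addr_of_value(r: &i32) -> usize {
    r as *const i32 as usize
}

// Compile-time analogue (not compiled here): the integer representation of the
// pointer is not known during CTFE, so this is assumed to be rejected.
// const ADDR: usize = unsafe { std::mem::transmute(&0i32) };

fn main() {
    let x = 42i32;
    let addr = addr_of_value(&x);
    // A valid reference is always aligned for its type.
    assert_eq!(addr % std::mem::align_of::<i32>(), 0);
    println!("address is aligned");
}
```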
