Commit 356cb57

Merge pull request #2360 from gnzlbg/black_box
RFC: hint::bench_black_box
2 parents 56508b1 + bda0ba6 commit 356cb57

1 file changed

+282
-0
lines changed

text/2360-bench-black-box.md

@@ -0,0 +1,282 @@
- Feature Name: `bench_black_box`
- Start Date: 2018-03-12
- RFC PR: [rust-lang/rfcs#2360](https://github.com/rust-lang/rfcs/pull/2360)
- Rust Issue: [rust-lang/rust#64102](https://github.com/rust-lang/rust/issues/64102)

# Summary
[summary]: #summary

This RFC adds `core::hint::bench_black_box` (see [black box]), an identity function
that hints to the compiler that it should be maximally pessimistic about what
`bench_black_box` could do with its argument.

[black box]: https://en.wikipedia.org/wiki/black_box

# Motivation
[motivation]: #motivation

Due to the constrained nature of synthetic benchmarks, the compiler is often
able to perform optimizations that wouldn't otherwise trigger in practice, like
completely removing a benchmark if it has no side-effects.

Currently, stable Rust users need to introduce expensive operations into their
programs to prevent these optimizations. Examples thereof are volatile loads and
stores, or calling unknown functions via C FFI. These operations incur overheads
that often would not be present in the application the synthetic benchmark is
trying to model.

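The volatile-based workaround mentioned above can be sketched roughly as follows. This only illustrates the kind of code stable users resort to; the helper name `volatile_sink` and the toy workload `sum_to` are hypothetical and not part of this RFC.

```rust
use std::ptr;

/// Hypothetical helper (not part of this RFC): forces `value` to be
/// materialized by reading it back through a volatile load.
pub fn volatile_sink<T: Copy>(value: T) -> T {
    // SAFETY: `&value` is a valid, aligned pointer to an initialized `T` for
    // the duration of the read, and `T: Copy` makes the bitwise copy sound.
    unsafe { ptr::read_volatile(&value) }
}

/// Toy workload whose result would otherwise be unused in a benchmark.
pub fn sum_to(n: u64) -> u64 {
    (0..n).sum()
}
```

Wrapping the workload as `volatile_sink(sum_to(n))` keeps the result observable, but the volatile load is a real memory operation, which is the kind of overhead described above.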
# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

## `hint::bench_black_box`

The hint:

```rust
pub fn bench_black_box<T>(x: T) -> T;
```

behaves like the [identity function][identity_fn]: it just returns `x` and has
no effects. However, Rust implementations are _encouraged_ to assume that
`bench_black_box` can use `x` in any possible valid way that Rust code is allowed to
without introducing undefined behavior in the calling code. That is,
implementations are encouraged to be maximally pessimistic in terms of
optimizations.

This property makes `bench_black_box` useful for writing code in which certain
optimizations are not desired, but too unreliable to depend on when disabling
these optimizations is required for correctness.

### Example 1 - basics

Example 1 ([`rust.godbolt.org`](https://godbolt.org/g/YP2GCJ)):

```rust
fn foo(x: i32) -> i32 {
    hint::bench_black_box(2 + x);
    3
}
let a = foo(2);
```

In this example, once the call `foo(2)` is inlined, the compiler may simplify
the expression `2 + x` down to `4`. However, even though `4` is not read by
anything afterwards, it must be computed and materialized, for example, by
storing it into memory, a register, etc., because the current Rust
implementation assumes that `bench_black_box` could try to read it.

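A self-contained variant of Example 1 can be sketched as follows. It assumes a toolchain on which the hint is available as `std::hint::black_box`, the name under which this functionality is exposed in current stable Rust; the `bench_black_box` spelling used in this RFC is not assumed to exist.

```rust
use std::hint::black_box;

/// Mirrors `foo` from Example 1: the sum is handed to the hint and then discarded.
pub fn foo(x: i32) -> i32 {
    // The compiler may still constant-fold `2 + x` after inlining, but it is
    // encouraged to materialize the result because the hint "might" read it.
    black_box(2 + x);
    3
}
```

Whether the value is actually materialized can only be confirmed by inspecting the generated machine code, since the hint is best-effort.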
### Example 2 - benchmarking `Vec::push`

`hint::bench_black_box` is useful for producing synthetic benchmarks that more
accurately represent the behavior of a real application. In the following
example, the function `bench_push` calls `push_cap`, which executes `Vec::push`
4 times in a loop:

```rust
use std::time::{Duration, Instant};

fn push_cap(v: &mut Vec<i32>) {
    for i in 0..4 {
        v.push(i);
    }
}

pub fn bench_push() -> Duration {
    let mut v = Vec::with_capacity(4);
    let now = Instant::now();
    push_cap(&mut v);
    now.elapsed()
}
```

This example allocates a `Vec`, pushes into it without growing its capacity, and
drops it, without ever using it for anything. The current Rust implementation
emits the following `x86_64` machine code (https://rust.godbolt.org/z/wDckJF):

```asm
example::bench_push:
    sub rsp, 24
    call std::time::Instant::now@PLT
    mov qword ptr [rsp + 8], rax
    mov qword ptr [rsp + 16], rdx
    lea rdi, [rsp + 8]
    call std::time::Instant::elapsed@PLT
    add rsp, 24
    ret
```

LLVM is pretty amazing: it has optimized the `Vec` allocation and the call to
`push_cap` away. In doing so, it has made our benchmark useless. It won't
measure the time it takes to perform the calls to `Vec::push` as we intended.

In real applications, the program will use the vector for something, preventing
these optimizations. To produce a benchmark that takes that into account, we can
hint to the compiler that the `Vec` is used for something
(https://rust.godbolt.org/z/CeXmxN):

```rust
fn push_cap(v: &mut Vec<i32>) {
    for i in 0..4 {
        bench_black_box(v.as_ptr());
        v.push(bench_black_box(i));
        bench_black_box(v.as_ptr());
    }
}
```

Inspecting the machine code reveals that, for this particular Rust
implementation, `bench_black_box` successfully prevents LLVM from performing the
optimization that removes the `Vec::push` calls that we wanted to measure.

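Put together, the hinted benchmark might look like the sketch below. It assumes the hint is reachable as `std::hint::black_box` (its name in current stable Rust) rather than `bench_black_box`. Returning the vector alongside the elapsed time is an addition made here so the result can be inspected; it is not part of the RFC's example.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

fn push_cap(v: &mut Vec<i32>) {
    for i in 0..4 {
        // Tell the compiler the buffer may be observed before and after each push.
        black_box(v.as_ptr());
        v.push(black_box(i));
        black_box(v.as_ptr());
    }
}

/// Times four pushes into a pre-allocated vector.
/// Returning the vector is an assumption of this sketch, used for inspection.
pub fn bench_push() -> (Duration, Vec<i32>) {
    let mut v = Vec::with_capacity(4);
    let now = Instant::now();
    push_cap(&mut v);
    let elapsed = now.elapsed();
    (elapsed, v)
}
```

A single run of four pushes is far too short to time reliably; a real benchmark would repeat the measurement many times.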
# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation

The function

```rust
mod core::hint {
    /// Identity function that disables optimizations.
    pub fn bench_black_box<T>(x: T) -> T;
}
```

is a `NOP` that returns `x`, that is, its operational semantics are equivalent
to the [identity function][identity_fn].

Implementations are encouraged, _but not required_, to treat `bench_black_box` as an
_unknown_ function that can perform any valid operation on `x` that Rust is
allowed to perform without introducing undefined behavior in the calling code.
That is, implementations are encouraged to optimize code that calls
`bench_black_box` under the pessimistic assumption that it might do anything
with the data it got, even though it actually does nothing.

[identity_fn]: https://doc.rust-lang.org/nightly/std/convert/fn.identity.html

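Because the pessimization is only encouraged, a plain identity function already satisfies the operational semantics described above. A minimal sketch of such a trivially conforming implementation, under the hypothetical name `bench_black_box_noop`, could look like this. It illustrates the specification and is not how any particular compiler implements the hint.

```rust
/// Hypothetical, trivially conforming implementation: it returns its argument
/// unchanged and applies no optimization barrier at all.
pub fn bench_black_box_noop<T>(x: T) -> T {
    x
}
```

Since this implementation is permitted, code that needs an optimization disabled for correctness cannot rely on the hint.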
# Drawbacks
[drawbacks]: #drawbacks

Slightly increases the surface complexity of `libcore`.

# Rationale and alternatives
[alternatives]: #alternatives

Further rationale influencing this design is available in
https://github.com/nikomatsakis/rust-memory-model/issues/45

## `clobber`

A previous version of this RFC also provided a `clobber` function:

```rust
/// Flushes all pending writes to memory.
pub fn clobber() -> ();
```

In https://github.com/nikomatsakis/rust-memory-model/issues/45 it was realized
that such a function cannot work properly within Rust's memory model.

## `value_fence` / `evaluate_and_drop`

An alternative design was proposed during the discussion on
[rust-lang/rfcs/issues/1484](https://github.com/rust-lang/rfcs/issues/1484), in
which the following two functions are provided instead:

```rust
use std::mem::{self, MaybeUninit};
use std::ptr;

#[inline(always)]
pub fn value_fence<T>(x: T) -> T {
    // Read the value back through a volatile load so it must be materialized.
    let y = unsafe { (&x as *const T).read_volatile() };
    // `y` now owns the value; forget `x` so it is not dropped twice.
    mem::forget(x);
    y
}

#[inline(always)]
pub fn evaluate_and_drop<T>(x: T) {
    unsafe {
        let mut y = MaybeUninit::<T>::uninit();
        // Move `x` into `y` through a volatile store.
        ptr::write_volatile(y.as_mut_ptr(), x);
        // Drop the value that was just written.
        drop(y.assume_init());
    }
}
```

This approach is not pursued in this RFC because these two functions:

* add overhead ([`rust.godbolt.org`](https://godbolt.org/g/aCpPfg)): `volatile`
  reads and stores aren't no-ops, whereas `bench_black_box` (and the previously
  proposed `clobber`) are.
* are implementable on stable Rust: while we could add them to `std` they do not
  necessarily need to be there.

## `bench_input` / `bench_output`

@eddyb proposed
[here](https://github.com/rust-lang/rfcs/pull/2360#issuecomment-463594450) (see
also the discussion that followed) adding two other hints instead:

* `bench_input`: `fn(T) -> T` (identity-like)
  * may prevent some optimizations from seeing through the valid `T` value,
    more specifically, things like const/load-folding and range-analysis
  * miri would still check the argument, and so it couldn't be e.g.
    uninitialized
  * the argument computation can be optimized-out (unlike `bench_output`)
  * mostly implementable today with the same strategy as `black_box`

* `bench_output`: `fn(T) -> ()` (drop-like)
  * may prevent some optimizations from optimizing out the computation of its
    argument
  * the argument is not treated as "escaping into unknown code", i.e., you
    can't implement `bench_output(x)` as `{ bench_input(&mut x); x }`. What
    that would likely prevent is placing `x` into a register instead of
    memory, but optimizations might still see the old value of `x`, as if it
    couldn't have been mutated
  * potentially implementable like `black_box` but `readonly`/`readnone` in
    LLVM

From the RFC discussion there was consensus that we might want to add these
benchmarking hints in the future as well because they are easier to specify and
provide stronger guarantees than `bench_black_box`.

Right now, however, it is unclear whether these two hints can be implemented
strictly in LLVM. The comment thread shows that the best we can actually do
ends up implementing both of these as `bench_black_box` with the same effects.

Without a strict implementation, it is unclear what value these two intrinsics
would add, and more importantly, since their difference in semantics cannot be
shown, it is also unclear how we could teach users to use them correctly.

If we are ever able to implement these correctly, we might want to consider
deprecating `bench_black_box` at that point, but whether it will be worth
deprecating is not clear either.

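The situation described above, where both hints end up as the black box with the same effects, can be sketched as follows. The function bodies are assumptions made for illustration, built on `std::hint::black_box` (the hint's name in current stable Rust). Neither `bench_input` nor `bench_output` exists in the standard library, and this sketch does not provide the stricter semantics they were proposed to have.

```rust
use std::hint::black_box;

/// Hypothetical `bench_input`: identity-like, here just the black box itself.
pub fn bench_input<T>(x: T) -> T {
    black_box(x)
}

/// Hypothetical `bench_output`: drop-like, here the black box with its result
/// dropped immediately.
pub fn bench_output<T>(x: T) {
    drop(black_box(x));
}
```

With these definitions the two hints differ only in their signatures, which is why their intended difference in semantics cannot be demonstrated to users.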
# Prior art
[prior-art]: #prior-art

Similar functionality is provided in the
[`Google Benchmark`](https://github.com/google/benchmark) C++ library, where the
corresponding functions are called
[`DoNotOptimize`](https://github.com/google/benchmark/blob/61497236ddc0d797a47ef612831fb6ab34dc5c9d/include/benchmark/benchmark.h#L306)
(`bench_black_box`) and
[`ClobberMemory`](https://github.com/google/benchmark/blob/61497236ddc0d797a47ef612831fb6ab34dc5c9d/include/benchmark/benchmark.h#L317).
A `black_box` function with slightly different semantics is provided by the
`test` crate:
[`test::black_box`](https://github.com/rust-lang/rust/blob/master/src/libtest/lib.rs#L1551).

# Unresolved questions
[unresolved]: #unresolved-questions

* `const fn`: it is unclear whether `bench_black_box` should be a `const fn`. If it
  were, that would hint that it cannot have any side-effects, or that it cannot
  do anything that `const fn`s cannot do.

* Naming: during the RFC discussion it was unclear whether `black_box` is the
  right name for this primitive but we settled on `bench_black_box` for the time
  being. We should resolve the naming before stabilization.

  Also, we might want to add other benchmarking hints in the future, like
  `bench_input` and `bench_output`, so we might want to put all of this
  into a `bench` sub-module within the `core::hint` module. That might
  be a good place to explain how the benchmarking hints should be used
  holistically.

  Some arguments in favor of or against using "black box" are that:
  * pro: [black box] is a common term in computer programming, that conveys
    that nothing can be assumed about it except for its inputs and outputs.
  * con: [black box] often hints that the function has no side-effects, but
    this is not something that can be assumed about this API.
  * con: `_box` has nothing to do with `Box` or `box`-syntax, which might be confusing

  Alternative names suggested: `pessimize`, `unoptimize`, `unprocessed`, `unknown`,
  `do_not_optimize` (Google Benchmark).
