|
| 1 | +- Feature Name: `capture_disjoint_fields` |
| 2 | +- Start Date: 2017-11-28 |
| 3 | +- RFC PR: [rust-lang/rfcs#2229](https://github.com/rust-lang/rfcs/pull/2229) |
| 4 | +- Rust Issue: [rust-lang/rust#53488](https://github.com/rust-lang/rust/issues/53488) |
| 5 | + |
| 6 | +# Summary |
| 7 | +[summary]: #summary |
| 8 | + |
| 9 | +This RFC proposes that closure capturing should be minimal rather than maximal. |
| 10 | +Conceptually, existing rules regarding borrowing and moving disjoint fields |
| 11 | +should be applied to capturing. If implemented, the following examples, which |
| 12 | +are rejected today with the errors noted in the comments, would become valid: |
| 13 | + |
| 14 | +```rust |
| 15 | +let a = &mut foo.a; |
| 16 | +|| &mut foo.b; // Error! cannot borrow `foo` |
| 17 | +somefunc(a); |
| 18 | +``` |
| 19 | + |
| 20 | +```rust |
| 21 | +let a = &mut foo.a; |
| 22 | +move || foo.b; // Error! cannot move `foo` |
| 23 | +somefunc(a); |
| 24 | +``` |
| 25 | + |
| 26 | +Note that some discussion of this has already taken place: |
| 27 | +- rust-lang/rust#19004 |
| 28 | +- [Rust internals forum](https://internals.rust-lang.org/t/borrow-the-full-stable-name-in-closures-for-ergonomics/5387) |
| 29 | + |
| 30 | +# Motivation |
| 31 | +[motivation]: #motivation |
| 32 | + |
| 33 | +In the Rust language today, any variables named within a closure will be fully |
| 34 | +captured. This was simple to implement but is inconsistent with the rest of the |
| 35 | +language because Rust normally allows simultaneous borrowing of disjoint |
| 36 | +fields. Remembering this exception adds to the mental burden of the programmer |
| 37 | +and makes the rules of borrowing and ownership harder to learn. |
| 38 | + |
| 39 | +The following is allowed; why should closures be treated differently? |
| 40 | + |
| 41 | +```rust |
| 42 | +let _a = &mut foo.a; |
| 43 | +loop { &mut foo.b; } // ok! |
| 44 | +``` |
| 45 | + |
| 46 | +This is a particularly annoying problem because closures often need to borrow |
| 47 | +data from `self`: |
| 48 | + |
| 49 | +```rust |
| 50 | +pub fn update(&mut self) { |
| 51 | +    // cannot borrow `self` as immutable because `self.list` is also borrowed as mutable |
| 52 | +    self.list.retain(|i| self.filter.allowed(i)); |
| 53 | +} |
| 54 | +``` |
| 55 | + |
| 56 | +# Guide-level explanation |
| 57 | +[guide-level-explanation]: #guide-level-explanation |
| 58 | + |
| 59 | +Rust understands structs sufficiently to know that it's possible |
| 60 | +to borrow disjoint fields of a struct simultaneously. Structs can also be |
| 61 | +destructured and moved piece-by-piece. This functionality should be available |
| 62 | +anywhere, including from within closures: |
| 63 | + |
| 64 | +```rust |
| 65 | +struct OneOf { |
| 66 | +    text: String, |
| 67 | +    of: Vec<String>, |
| 68 | +} |
| 69 | + |
| 70 | +impl OneOf { |
| 71 | +    pub fn matches(self) -> bool { |
| 72 | +        // Ok! destructure self |
| 73 | +        self.of.into_iter().any(|s| s == self.text) |
| 74 | +    } |
| 75 | + |
| 76 | +    pub fn filter(&mut self) { |
| 77 | +        // Ok! mutate and inspect self |
| 78 | +        self.of.retain(|s| s != &self.text) |
| 79 | +    } |
| 80 | +} |
| 81 | +``` |
| 82 | + |
| 83 | +Rust will prevent dangerous double usage: |
| 84 | + |
| 85 | +```rust |
| 86 | +struct FirstDuplicated(Vec<String>); |
| 87 | + |
| 88 | +impl FirstDuplicated { |
| 89 | +    pub fn first_count(self) -> usize { |
| 90 | +        // Error! can't destructure and inspect same data |
| 91 | +        self.0.into_iter() |
| 92 | +            .filter(|s| *s == self.0[0]) |
| 93 | +            .count() |
| 94 | +    } |
| 95 | + |
| 96 | +    pub fn remove_first(&mut self) { |
| 97 | +        // Error! can't mutate and inspect same data |
| 98 | +        self.0.retain(|s| s != &self.0[0]) |
| 99 | +    } |
| 100 | +} |
| 101 | +``` |
| 102 | + |
| 103 | +# Reference-level explanation |
| 104 | +[reference-level-explanation]: #reference-level-explanation |
| 105 | + |
| 106 | +This RFC does not propose any changes to the borrow checker. Instead, the MIR |
| 107 | +generation for closures should be altered to produce the minimal capture. |
| 108 | +Additionally, a hidden `repr` for closures might be added, which could reduce |
| 109 | +closure size through awareness of the new capture rules *(see unresolved)*. |
| 110 | + |
| 111 | +In a sense, when a closure is lowered to MIR, a list of "capture expressions" is |
| 112 | +created, which we will call the "capture set". Each expression is some part of |
| 113 | +the closure body which, in order to capture parts of the enclosing scope, must |
| 114 | +be pre-evaluated when the closure is created. The output of the expressions, |
| 115 | +which we will call "capture data", is stored in the anonymous struct which |
| 116 | +implements the `Fn*` traits. If a binding is used within a closure, at least one |
| 117 | +capture expression which borrows or moves that binding's value must exist in the |
| 118 | +capture set. |
| 119 | + |
| 120 | +Currently, lowering creates exactly one capture expression for each used |
| 121 | +binding, which borrows or moves the value in its entirety. This RFC proposes |
| 122 | +that lowering should instead create the minimal capture, where each expression |
| 123 | +is as precise as possible. |
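
As a rough illustration of the difference, the capture data can be written out by hand. The sketch below is not how the compiler lowers closures; `Foo`, `WholeCapture`, and `MinimalCapture` are hypothetical names standing in for the anonymous closure struct under the current and the proposed rules.

```rust
// Hypothetical stand-in for the `foo` used throughout the examples.
struct Foo {
    a: i32,
    b: i32,
}

// Models the current rule: one capture expression per binding (`&mut foo`).
struct WholeCapture<'a> {
    foo: &'a mut Foo,
}

impl<'a> WholeCapture<'a> {
    fn call(&mut self) {
        self.foo.b += 1;
    }
}

// Models the proposed rule: the capture expression is `&mut foo.b`.
struct MinimalCapture<'a> {
    b: &'a mut i32,
}

impl<'a> MinimalCapture<'a> {
    fn call(&mut self) {
        *self.b += 1;
    }
}

fn demo() -> (i32, i32) {
    let mut foo = Foo { a: 1, b: 10 };

    {
        // Whole capture: nothing else may borrow `foo` while `whole` is alive.
        let mut whole = WholeCapture { foo: &mut foo };
        whole.call();
    }

    // Minimal capture: `foo.a` can be borrowed alongside the "closure"
    // because the two borrows touch disjoint fields.
    let a = &mut foo.a;
    let mut minimal = MinimalCapture { b: &mut foo.b };
    minimal.call();
    *a += 1;

    (foo.a, foo.b)
}

fn main() {
    assert_eq!(demo(), (2, 12));
}
```

The borrow checker itself is unchanged in this sketch; only the shape of the capture data differs, which matches the claim that the change lives in lowering.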
| 124 | + |
| 125 | +This minimal set of capture expressions *might* be created through a sort of |
| 126 | +iterative refinement. We would start out capturing all of the local variables. |
| 127 | +Then, each path would be made more precise by adding additional dereferences and |
| 128 | +path components depending on which paths are used and how. References to structs |
| 129 | +would be made more precise by reborrowing fields and owned structs would be made |
| 130 | +more precise by moving fields. |
| 131 | + |
| 132 | +A capture expression is minimal if it produces a value that is used by the |
| 133 | +closure in its entirety (e.g. is a primitive, is passed outside the closure, |
| 134 | +etc.) or if making the expression more precise would require one of the following: |
| 135 | + |
| 136 | +- a call to an impure function |
| 137 | +- an illegal move (for example, out of a `Drop` type) |
| 138 | + |
| 139 | +When generating a capture expression, we must decide if the output should be |
| 140 | +owned or if it can be a reference. In a non-`move` closure, a capture expression |
| 141 | +will *only* produce owned data if ownership of that data is required by the body |
| 142 | +of the closure. A `move` closure will *always* produce owned data unless the |
| 143 | +captured binding does not have ownership. |
| 144 | + |
| 145 | +Note that *all* functions are considered impure (including overloaded `Deref` |
| 146 | +implementations). And, for the sake of capturing, all indexing is considered |
| 147 | +impure. It is possible that overloaded `Deref::deref` implementations could be |
| 148 | +marked as pure by using a new marker trait (such as `DerefPure`) or attribute |
| 149 | +(such as `#[deref_transparent]`). However, such a solution should be proposed in |
| 150 | +a separate RFC. In the meantime, `<Box as Deref>::deref` could be a special case |
| 151 | +of a pure function *(see unresolved)*. |
| 152 | + |
| 153 | +Also note that, because capture expressions are all subsets of the closure body, |
| 154 | +this RFC does not change *what* is executed. It does change the order/number of |
| 155 | +executions for some operations, but since these must be pure, order/repetition |
| 156 | +does not matter. Only changes to lifetimes might be breaking. Specifically, the |
| 157 | +drop order of uncaptured data can be altered. |
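
The drop-order change can be sketched by hand. In the example below, `Noisy`, `Pair`, and `make_pair` are hypothetical helpers. Since the proposed capture rules are not assumed to be available, the minimal capture is modelled by moving the field out before creating the closure, and each closure borrows its captured variable as a whole so that the result does not depend on the capture rules in effect.

```rust
use std::cell::RefCell;
use std::rc::Rc;

type Log = Rc<RefCell<Vec<&'static str>>>;

// Hypothetical helper: records its name in a shared log when dropped.
struct Noisy {
    name: &'static str,
    log: Log,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Hypothetical struct with one field the closure uses and one it ignores.
struct Pair {
    used: Noisy,
    unused: Noisy,
}

fn make_pair(log: &Log) -> Pair {
    Pair {
        used: Noisy { name: "used", log: log.clone() },
        unused: Noisy { name: "unused", log: log.clone() },
    }
}

// Models the current rule: the closure owns all of `pair`.
fn whole_capture() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    {
        let pair = make_pair(&log);
        let closure = move || {
            let whole = &pair; // uses `pair` as a whole, so all of it is captured
            whole.used.name
        };
        closure();
        drop(closure); // drops `used`, then `unused`
        log.borrow_mut().push("closure dropped");
    }
    let result = log.borrow().clone();
    result
}

// Models the proposed rule: only `pair.used` moves into the closure.
fn minimal_capture() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    {
        let pair = make_pair(&log);
        let used = pair.used; // stands in for the capture expression `pair.used`
        let closure = move || {
            let u = &used; // uses `used` as a whole
            u.name
        };
        closure();
        drop(closure); // drops only `used`
        log.borrow_mut().push("closure dropped");
        // `pair.unused` is dropped here, at the end of the enclosing scope.
    }
    let result = log.borrow().clone();
    result
}

fn main() {
    assert_eq!(whole_capture(), vec!["used", "unused", "closure dropped"]);
    assert_eq!(minimal_capture(), vec!["used", "closure dropped", "unused"]);
}
```

In the second function the unused field outlives the closure, which is the kind of observable difference the paragraph above describes.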
| 158 | + |
| 159 | +We might solve this by considering a struct to be minimal if it contains unused |
| 160 | +fields that implement `Drop`. This would prevent the drop order of those fields |
| 161 | +from changing, but feels strange and non-orthogonal *(see unresolved)*. |
| 162 | +Encountering this case at all could trigger a warning, so that this extra rule |
| 163 | +could exist temporarily but be removed over the next epoch *(see unresolved)*. |
| 164 | + |
| 165 | +## Reference Examples |
| 166 | + |
| 167 | +Below are examples of various closures and their capture sets. |
| 168 | + |
| 169 | +```rust |
| 170 | +let mut foo = 10; |
| 171 | +|| &mut foo; |
| 172 | +``` |
| 173 | + |
| 174 | +- `&mut foo` (primitive, ownership not required, used in entirety) |
| 175 | + |
| 176 | +```rust |
| 177 | +let a = &mut foo.a; |
| 178 | +|| (&mut foo.b, &mut foo.c); |
| 179 | +somefunc(a); |
| 180 | +``` |
| 181 | + |
| 182 | +- `&mut foo.b` (ownership not required, used in entirety) |
| 183 | +- `&mut foo.c` (ownership not required, used in entirety) |
| 184 | + |
| 185 | +The borrow checker passes because `foo.a`, `foo.b`, and `foo.c` are disjoint. |
| 186 | + |
| 187 | +```rust |
| 188 | +let a = &mut foo.a; |
| 189 | +move || foo.b; |
| 190 | +somefunc(a); |
| 191 | +``` |
| 192 | + |
| 193 | +- `foo.b` (ownership available, used in entirety) |
| 194 | + |
| 195 | +The borrow checker passes because `foo.a` and `foo.b` are disjoint. |
| 196 | + |
| 197 | +```rust |
| 198 | +let hello = &foo.hello; |
| 199 | +move || foo.drop_world.a; |
| 200 | +somefunc(hello); |
| 201 | +``` |
| 202 | + |
| 203 | +- `foo.drop_world` (ownership available, can't be more precise without moving |
| 204 | + out of `Drop`) |
| 205 | + |
| 206 | +The borrow checker passes because `foo.hello` and `foo.drop_world` are disjoint. |
| 207 | + |
| 208 | +```rust |
| 209 | +|| println!("{}", foo.wrapper_thing.a); |
| 210 | +``` |
| 211 | + |
| 212 | +- `&foo.wrapper_thing` (ownership not required, can't be more precise because |
| 213 | + overloaded `Deref` on `wrapper_thing` is impure) |
| 214 | + |
| 215 | +```rust |
| 216 | +|| foo.list[0]; |
| 217 | +``` |
| 218 | + |
| 219 | +- `foo.list` (ownership required, can't be more precise because indexing is |
| 220 | + impure) |
| 221 | + |
| 222 | +```rust |
| 223 | +let bar = (1, 2); // tuple, treated like a struct |
| 224 | +|| myfunc(bar); |
| 225 | +``` |
| 226 | + |
| 227 | +- `bar` (ownership required, used in entirety) |
| 228 | + |
| 229 | +```rust |
| 230 | +let foo_again = &mut foo; |
| 231 | +|| &mut foo.a; |
| 232 | +somefunc(foo_again); |
| 233 | +``` |
| 234 | + |
| 235 | +- `&mut foo.a` (ownership not required, used in entirety) |
| 236 | + |
| 237 | +The borrow checker fails because `foo_again` and `foo.a` intersect. |
| 238 | + |
| 239 | +```rust |
| 240 | +let _a = foo.a; |
| 241 | +|| foo.a; |
| 242 | +``` |
| 243 | + |
| 244 | +- `foo.a` (ownership required, used in entirety) |
| 245 | + |
| 246 | +The borrow checker fails because `foo.a` has already been moved. |
| 247 | + |
| 248 | +```rust |
| 249 | +let a = &drop_foo.a; |
| 250 | +move || drop_foo.b; |
| 251 | +somefunc(a); |
| 252 | +``` |
| 253 | + |
| 254 | +- `drop_foo` (ownership available, can't be more precise without moving out of |
| 255 | + `Drop`) |
| 256 | + |
| 257 | +The borrow checker fails because `drop_foo` cannot be moved while borrowed. |
| 258 | + |
| 259 | +```rust |
| 260 | +|| &box_foo.a; |
| 261 | +``` |
| 262 | + |
| 263 | +- `&<Box<_> as Deref>::deref(&box_foo).a` (ownership not required, `Box::deref` is pure) |
| 264 | + |
| 265 | +```rust |
| 266 | +move || &box_foo.a; |
| 267 | +``` |
| 268 | + |
| 269 | +- `box_foo` (ownership available, can't be more precise without moving out of |
| 270 | + `Drop`) |
| 271 | + |
| 272 | +```rust |
| 273 | +let foo = &mut a; |
| 274 | +let other = &mut foo.other; |
| 275 | +move || &mut foo.bar; |
| 276 | +somefunc(other); |
| 277 | +``` |
| 278 | + |
| 279 | +- `&mut foo.bar` (ownership *not* available, borrow can be split) |
| 280 | + |
| 281 | + |
| 282 | +# Drawbacks |
| 283 | +[drawbacks]: #drawbacks |
| 284 | + |
| 285 | +This RFC does ruin the intuition that all variables named within a closure are |
| 286 | +*completely* captured. I argue that that intuition is not common or necessary |
| 287 | +enough to justify the extra glue code. |
| 288 | + |
| 289 | +# Rationale and alternatives |
| 290 | +[alternatives]: #alternatives |
| 291 | + |
| 292 | +This proposal is purely ergonomic since there is a complete and common |
| 293 | +workaround. The existing rules could remain in place and Rust users could |
| 294 | +continue to pre-borrow/move fields. However, this workaround results in |
| 295 | +significant useless glue code when borrowing many but not all of the fields in |
| 296 | +a struct. It also produces a larger closure than necessary which could make the |
| 297 | +difference when inlining. |
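
For reference, the workaround applied to the `update` example from the motivation might look like the sketch below. `Filter`, `State`, and the `max` field are hypothetical; only the shape of the glue code matters.

```rust
// Hypothetical types standing in for `self.filter` and `self.list`.
struct Filter {
    max: u32,
}

impl Filter {
    fn allowed(&self, i: &u32) -> bool {
        *i <= self.max
    }
}

struct State {
    list: Vec<u32>,
    filter: Filter,
}

impl State {
    pub fn update(&mut self) {
        // Glue code: borrow the field up front so the closure captures
        // `filter` rather than all of `self`.
        let filter = &self.filter;
        self.list.retain(|i| filter.allowed(i));
    }
}

fn main() {
    let mut state = State {
        list: vec![1, 5, 9],
        filter: Filter { max: 5 },
    };
    state.update();
    assert_eq!(state.list, vec![1, 5]);
}
```

With a single field the extra `let` is minor, but it has to be repeated for every field the closure touches.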
| 298 | + |
| 299 | +# Unresolved questions |
| 300 | +[unresolved]: #unresolved-questions |
| 301 | + |
| 302 | +- How to optimize pointers. Can borrows that all reference parts of the same |
| 303 | + object be stored as a single pointer? How should this optimization be |
| 304 | + implemented (e.g. a special `repr`, refinement typing)? |
| 305 | + |
| 306 | +- How to signal that a function is pure. Is this even needed/wanted? Any other |
| 307 | + places where the language could benefit? |
| 308 | + |
| 309 | +- Should `Box` be special? |
| 310 | + |
| 311 | +- Drop order can change as a result of this RFC. Is this a real stability |
| 312 | + problem? How should this be resolved? |