
Commit f5b49d3

Launching the Lock Poisoning Survey (#708)
1 parent a90e20d commit f5b49d3

---
layout: post
title: "Launching the Lock Poisoning Survey"
author: Ashley Mannix
team: The Libs team <https://www.rust-lang.org/governance/teams/library>
---

The Libs team is looking at how we can improve the `std::sync` module by potentially splitting it up into new modules and making some changes to APIs along the way.
One of those API changes we're looking at is non-poisoning implementations of `Mutex` and `RwLock`.
To find the best path forward we're conducting a survey to get a clearer picture of how the standard locks are used out in the wild.

The survey is a Google Form.
[You can fill it out here](https://docs.google.com/forms/d/e/1FAIpQLSehk-GkwoCag_w3YfXDfgeANulR0h5m2d3EzUMQaiY1vRfIEw/viewform).

### What is this survey for?

The survey is intended to answer the following questions:

- When poisoning on `Mutex` and `RwLock` is being used deliberately.
- Whether `Mutex` and `RwLock` (and their guard types) appear in the public API of libraries.
- How much friction there is in switching from the poisoning `Mutex` and `RwLock` locks to non-poisoning ones (such as from `antidote` or `parking_lot`).

This information will then inform an RFC that will set out a path to non-poisoning locks in the standard library.
It may also give us a starting point for looking at the tangentially related `UnwindSafe` and `RefUnwindSafe` traits for panic safety.

### Who is this survey for?

If you write code that uses locks, then this survey is for you.
That includes the standard library's `Mutex` and `RwLock` as well as locks from `crates.io`, such as `antidote`, `parking_lot`, and `tokio::sync`.

### So what is poisoning anyway?

Let's say you have an `Account` that can update its balance:

```rust
impl Account {
    pub fn update_balance(&mut self, change: i32) {
        self.balance += change;
        self.changes.push(change);
    }
}
```

Let's also say we have the invariant that `balance == changes.sum()`.
We'll call this the _balance invariant_.
So at any point when interacting with an `Account` you can always depend on its `balance` being the sum of its `changes`, thanks to the balance invariant.
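
For concreteness, here's a minimal sketch of what the type behind these snippets might look like; the exact field types and the helper method are illustrative assumptions rather than anything from the original post:

```rust
// Hypothetical sketch of the `Account` type behind these snippets;
// the field types are illustrative assumptions.
struct Account {
    balance: i32,
    changes: Vec<i32>,
}

impl Account {
    // The balance invariant: `balance` is always the sum of `changes`.
    fn balance_invariant_holds(&self) -> bool {
        self.balance == self.changes.iter().sum::<i32>()
    }
}
```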

There's a point in our `update_balance` method where the balance invariant isn't maintained though:

```rust
impl Account {
    pub fn update_balance(&mut self, change: i32) {
        self.balance += change;
        // self.balance != self.changes.sum()
        self.changes.push(change);
    }
}
```

That seems ok, because we're in the middle of a method with exclusive access to our `Account` and everything is back in order by the time we return.
There isn't a `Result` or `?` to be seen, so we know there's no chance of an early return before the balance invariant is restored. Or so we think.

What if `self.changes.push` didn't return normally?
What if it panicked instead without actually doing anything?
Then we'd return from `update_balance` early without restoring the balance invariant.
That seems ok too, because a panic will start unwinding the thread it was called from, leaving no trace of any data it owned behind.
Ignoring the `Drop` trait, no data means no broken invariants.
Problem solved, right?
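
As a rough sketch of that owned case: a thread that panics while it exclusively owns its `Account` simply drops it during unwinding, so the broken invariant disappears along with the data. The struct fields here are the same illustrative assumptions as in the sketch above, and the panic just simulates `changes.push` failing:

```rust
use std::thread;

// The same illustrative `Account` fields as in the sketch above.
struct Account {
    balance: i32,
    changes: Vec<i32>,
}

fn main() {
    // This thread owns its `Account` outright; nothing else can see it.
    let handle = thread::spawn(|| {
        let mut account = Account { balance: 0, changes: Vec::new() };
        account.balance += 10;
        // Simulate `changes.push` panicking before the invariant is restored.
        panic!("failed to record the change");
    });

    // Unwinding dropped `account`, so the broken invariant vanished with it;
    // all the rest of the program ever observes is the panic itself.
    assert!(handle.join().is_err());
}
```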

What if our `Account` wasn't owned by the thread that panicked?
What if it was shared with other threads as an `Arc<Mutex<Account>>`?
Unwinding one thread isn't going to protect other threads that could still access the `Account`, and they're not going to know that it's now invalid.

This is where poisoning comes in.
The `Mutex` and `RwLock` types in the standard library use a strategy that makes panics (and by extension the possibility of broken invariants) observable.
The next consumer of the lock, such as another thread that didn't unwind, can decide at that point what to do about it.
This is done by storing a switch in the lock itself that's flipped when a panic causes a thread to unwind through its guard.
Once that switch is flipped the lock is considered _poisoned_, and the next attempt to acquire it will receive an error instead of a guard.
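
Here's a rough sketch of that behaviour in action, using a plain shared counter rather than the `Account` type; the values and panic message are just for illustration:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    let shared = Arc::new(Mutex::new(0_i32));

    // This thread panics while holding the guard, unwinding through it
    // and flipping the poison switch on the mutex.
    let handle = {
        let shared = Arc::clone(&shared);
        thread::spawn(move || {
            let _guard = shared.lock().unwrap();
            panic!("something went wrong while the lock was held");
        })
    };
    let _ = handle.join(); // the panic shows up here as an `Err`

    // The next attempt to acquire the lock receives an error, not a guard.
    assert!(shared.lock().is_err());
}
```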

The standard approach for dealing with a poisoned lock is to propagate the panic to the current thread by unwrapping the error it returns:

```rust
let mut guard = shared.lock().unwrap();
```

That way nobody can ever observe the possibly violated balance invariant on our shared `Account`.

That sounds great! So why would we want to remove it?

### What's wrong with lock poisoning?

There's nothing wrong with poisoning itself.
It's an excellent pattern for dealing with failures that can leave behind unworkable state.
The question we're really asking is whether it should be used by the _standard locks_, which are `std::sync::Mutex` and `std::sync::RwLock`.
We're asking whether it's a standard lock's job to implement poisoning. Just to avoid any confusion, we'll distinguish the poisoning pattern from the API of the standard locks by calling the former _poisoning_ and the latter _lock poisoning_.
We're just talking about lock poisoning.

In the previous section we motivated poisoning as a way to protect us from possibly broken invariants.
Lock poisoning isn't actually a tool for doing this in the way you might think.
In general, a poisoned lock can't tell whether or not any invariants are _actually_ broken.
It assumes that a lock is shared, so it's likely going to outlive any individual thread that can access it.
It also assumes that if a panic leaves any data behind then that data is more likely to be left in an unexpected state, because panics aren't part of normal control flow in Rust.
Everything _could_ be fine after a panic, but the standard lock can't guarantee it.
Since there's no guarantee, there's an escape hatch.
We can always still get access to the state guarded by a poisoned lock:

```rust
let mut guard = shared.lock().unwrap_or_else(|err| err.into_inner());
```
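
A consumer could also take that escape hatch and then actively restore the invariant before trusting the data again. Here's a rough sketch along those lines, reusing the hypothetical `Account` fields from the earlier sketch (again, assumptions rather than anything from the post):

```rust
use std::sync::Mutex;

// The hypothetical `Account` sketch from earlier; the fields are assumptions.
struct Account {
    balance: i32,
    changes: Vec<i32>,
}

fn repair_and_use(shared: &Mutex<Account>) {
    // Take the guard even if the lock is poisoned.
    let mut guard = shared.lock().unwrap_or_else(|err| err.into_inner());

    // Actively restore the balance invariant before trusting the data again.
    let repaired_balance: i32 = guard.changes.iter().sum();
    guard.balance = repaired_balance;

    // From here on the invariant holds and the account can be used as normal.
}
```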

All Rust code needs to remain free from any possible undefined behavior in the presence of panics, so ignoring panics is always safe.
Rust doesn't try to guarantee all safe code is free from logic bugs, so broken invariants that don't potentially lead to undefined behavior aren't strictly considered unsafe.
Since ignoring lock poisoning is also always safe, it doesn't really give you a dependable tool to protect state from panics.
You can always ignore it.

So lock poisoning doesn't give you a tool for guaranteeing safety in the presence of panics.
What it does give you is a way to propagate those panics to other threads.
The machinery needed to do this adds costs to using the standard locks.
There's an ergonomic cost in having to call `.lock().unwrap()`, and a runtime cost in having to actually track state for panics.
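
For comparison, a non-poisoning lock like `parking_lot::Mutex` hands the guard back directly, so there's no `Result` to unwrap at each call site and no poison state to track. A minimal sketch, assuming the `parking_lot` crate is available as a dependency:

```rust
// `parking_lot::Mutex::lock` returns the guard directly: there's no
// poison flag to track and no `Result` to unwrap at each call site.
fn add_change(shared: &parking_lot::Mutex<Vec<i32>>, change: i32) {
    let mut changes = shared.lock();
    changes.push(change);
}

fn main() {
    let shared = parking_lot::Mutex::new(Vec::new());
    add_change(&shared, 10);
    assert_eq!(shared.lock().len(), 1);
}
```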
120+
121+
With the standard locks you pay those costs whether you need to or not.
122+
That's not typically how APIs in the standard library work.
123+
Instead, you compose costs together so you only pay for what you need.
124+
Should it be a standard lock's job to synchronize access _and_ propagate panics?
125+
We're not so sure it is.
126+
If it's not then what should we do about it?
127+
That's where the survey comes in.
128+
We'd like to get a better idea of how you use locks and poisoning in your projects to help decide what to do about lock poisoning.
129+
[You can fill it out here](https://docs.google.com/forms/d/e/1FAIpQLSehk-GkwoCag_w3YfXDfgeANulR0h5m2d3EzUMQaiY1vRfIEw/viewform).
