Skip to content

Commit

Permalink
Rollup merge of #137731 - SparrowLii:waiter, r=nnethercote
Browse files Browse the repository at this point in the history
Resume one waiter at once in deadlock handler

When multiple query loop errors occur in the code, only one waiter should be resumed at a time to avoid waking up multiple waiters at the same time and causing deadlock due to thread grabbing.

This fixes the UI failures in #132051

cc `@Zoxc` `@cjgillot` `@nnethercote` `@bjorn3` `@Kobzol`

Zulip discussion [here](https://rust-lang.zulipchat.com/#narrow/channel/187679-t-compiler.2Fwg-parallel-rustc/topic/Deadlocks.20and.20Rayon)

Edit: We can't reproduce these bugs with the existing test suits, so we keep them until we merge #132051
UPDATES #129912
UPDATES #120757
UPDATES #129911
  • Loading branch information
jieyouxu authored Mar 5, 2025
2 parents 257b494 + cc1e4ed commit 927c11f
Showing 1 changed file with 15 additions and 2 deletions.
17 changes: 15 additions & 2 deletions compiler/rustc_query_system/src/query/job.rs
Original file line number Diff line number Diff line change
Expand Up @@ -477,8 +477,8 @@ fn remove_cycle(
/// Detects query cycles by using depth first search over all active query jobs.
/// If a query cycle is found it will break the cycle by finding an edge which
/// uses a query latch and then resuming that waiter.
/// There may be multiple cycles involved in a deadlock, so this searches
/// all active queries for cycles before finally resuming all the waiters at once.
/// There may be multiple cycles involved in a deadlock, but we only search
/// one cycle at a call and resume one waiter at once. See `FIXME` below.
pub fn break_query_cycles(query_map: QueryMap, registry: &rayon_core::Registry) {
let mut wakelist = Vec::new();
let mut jobs: Vec<QueryJobId> = query_map.keys().cloned().collect();
Expand All @@ -488,6 +488,19 @@ pub fn break_query_cycles(query_map: QueryMap, registry: &rayon_core::Registry)
while jobs.len() > 0 {
if remove_cycle(&query_map, &mut jobs, &mut wakelist) {
found_cycle = true;

// FIXME(#137731): Resume all the waiters at once may cause deadlocks,
// so we resume one waiter at a call for now. It's still unclear whether
// it's due to possible issues in rustc-rayon or instead in the handling
// of query cycles.
// This seem to only appear when multiple query cycles errors
// are involved, so this reduction in parallelism, while suboptimal, is not
// universal and only the deadlock handler will encounter these cases.
// The workaround shows loss of potential gains, but there still are big
// improvements in the common case, and no regressions compared to the
// single-threaded case. More investigation is still needed, and once fixed,
// we can wake up all the waiters up.
break;
}
}

Expand Down

0 comments on commit 927c11f

Please sign in to comment.