-
Notifications
You must be signed in to change notification settings - Fork 746
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix deadlock on block cache. #6412
Conversation
Thread dump after node is stuck for future reference:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, nice find.
I would love to get the deadlock detector running again
Looks like |
@mergify queue |
🛑 The pull request has been removed from the queue
|
CI is failing because the Windows runner was updated to 1.81 and we need: |
@mergify requeue |
✅ This pull request will be re-embarked automaticallyThe followup |
✅ The pull request has been merged automaticallyThe pull request has been merged automatically at 46e0d66 |
Issue Addressed
I ran into another deadlock scenario again today during testing.
On thread A, read lock (1) for
block_cache
is acquired here:lighthouse/beacon_node/beacon_chain/src/eth1_chain.rs
Line 478 in 99e53b8
lighthouse/beacon_node/eth1/src/service.rs
Lines 477 to 479 in 9b3b730
On thread B, a write lock (2) for
block_cache
is acquired inprune_blocks
, so it waits for (1) to release the locklighthouse/beacon_node/eth1/src/inner.rs
Line 63 in c824142
On thread A, a read lock (3) for
block_cache
is acquired again, now this is waiting on (2), but (1) may not have release the lock before this, so we're in a deadlock.lighthouse/beacon_node/beacon_chain/src/eth1_chain.rs
Line 516 in 99e53b8
lighthouse/beacon_node/eth1/src/service.rs
Lines 499 to 501 in 9b3b730
Strangely it's on code that hasn't been touched for ages, maybe it's more easily reproducible when resources are very constrained (I'm running a kurtosis devnet).
Proposed Changes
Drop the
block_cache
at (1) after using, and before it gets acquired again.