Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add partial support for cancelling async write mutex requests #6486

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

tgoyne
Copy link
Member

@tgoyne tgoyne commented Apr 10, 2023

While we can't cancel the actual wait on the write mutex, we can dequeue specific Transactions which are waiting for their turn to write, and only block when the DB itself is destroyed. This makes it so that individual Transactions with cancelled async writes can be cleaned up while the write lock is held.

This is done by changing the async write queue in DB::AsyncCommitHelper from arbitrary callbacks to a queue of Transaction instances. It holds unowned Transactions to avoid ever having the final reference to the DB held on the worker thread. This required some adjustments to the locking to ensure that we're holding a lock whenever we need a pointer to remain valid, and to avoid lock order inversions this means that all of the calls to AsyncCommitHelper haver to be done without a lock held. We only do those calls from the Transaction's thread, so that didn't actually cause many problems.

The changes to BowlOfStonesSemaphore scoping in sync tests is to fix a pre-existing race condition in those tests which tsan is now complaining about. The semaphore was often being captured by something which outlived it, and could theoretically be destroyed before the call to pthread_cond_signal() returned. I doubt this ever caused any actual problems, but it could explain extremely rare crashes in sync tests.

This fixes the same problem as #6413, but without the part where we'd sometimes end up closing the DB from the async commit helper thread, as that really didn't work.

While we can't cancel the actual wait on the write mutex, we can dequeue
specific Transactions which are waiting for their turn to write, and only block
when the DB itself is destroyed. This makes it so that individual Transactions
with cancelled async writes can be cleaned up while the write lock is held.

This is done by changing the async write queue in DB::AsyncCommitHelper from
arbitrary callbacks to a queue of Transaction instances. This increases the
coupling between the types, but makes it easier to dequeue specific instances
and relying on the specific details of what Transaction will do simplifies the
locking involved.
@tgoyne tgoyne force-pushed the tg/async-write-cancel-2 branch from 2604832 to e28a395 Compare April 13, 2023 17:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant