storage controller: don't hold detached tenants in memory #10264
Open: jcsp wants to merge 10 commits into main from jcsp/drop-detached-tenants
+310
−26
Commits (all by jcsp):
83d22d6 storcon: only load non-detached tenants at startup
de9adde storcon: load detached tenants on demand
5637b99 storcon: drop detached tenants
d421940 storcon: handle detached shards in consistency_check
b1f1032 storcon: on-demand load for tenant mutators
4289efa tests: add test_storage_controller_detach_lifecycle
93a7688 nit: refine name of shard listing fn
ec2c0e0 fixup: remove redundant filter (it's filtered in DB query)
61bf182 fixup: properly require tenant op lock around dropping tenant
dd47681 storcon: reliably drop tenants
This is a bit scary. Since service state sits behind a sync lock we follow this pattern:
If step (3) doesn't expect the removal, then we run into trouble. I couldn't find any place with problematic `expect` or `unwrap` calls. Generally, this should be pretty safe, since we wait on the reconcile spawned by the detach in `tenant_location_config`, and that holds the tenant exclusive lock, but we might run into issues if detaches end up taking a long time.
This comment isn't really actionable, but I'm curious about your thoughts on this.
In request handlers, the `tenant_op_locks` for the tenant should prevent any shenanigans like this.
However, you make an excellent point in this particular context: `process_result` does not hold that lock. If a request handler took the lock, read the existence of the tenant, and then raced with the processing of a reconciler completion, this could violate the assumption that while holding the lock, a tenant that is in memory should remain in memory.
I think the neatest solution is to take an exclusive lock around this, and to make our `maybe_load` and `maybe_drop` functions take refs to lock handles, to prove that callers aren't using them outside the lock. Let's try that...
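The "take a ref to the lock handle" idea can be illustrated with a minimal sketch. This is not the code from the PR: `TenantOpLock`, `ExclusiveGuard`, `TenantState`, and the `Service` layout below are hypothetical stand-ins, and the sketch uses `std::sync` rather than whatever async lock the storage controller actually uses. Only the names `maybe_load` and `maybe_drop` come from the comment above.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, RwLock, RwLockWriteGuard};

/// Hypothetical stand-in for one tenant's operation lock.
struct TenantOpLock(RwLock<()>);

/// Proof token: a value of this type can only exist while the
/// op lock is held exclusively.
struct ExclusiveGuard<'a> {
    _guard: RwLockWriteGuard<'a, ()>,
}

impl TenantOpLock {
    fn exclusive(&self) -> ExclusiveGuard<'_> {
        ExclusiveGuard { _guard: self.0.write().unwrap() }
    }
}

#[derive(Clone, Copy, PartialEq, Debug)]
enum TenantState {
    Attached,
    Detached,
}

/// Hypothetical service: in-memory tenant state behind a sync lock.
struct Service {
    tenants: Mutex<HashMap<String, TenantState>>,
}

impl Service {
    /// Load the tenant into memory if absent. The `_proof` argument
    /// cannot be produced without holding the op lock exclusively.
    fn maybe_load(&self, tenant_id: &str, _proof: &ExclusiveGuard<'_>) -> bool {
        let mut tenants = self.tenants.lock().unwrap();
        if tenants.contains_key(tenant_id) {
            return false;
        }
        // Assumption: a real implementation would read the tenant from the database here.
        tenants.insert(tenant_id.to_string(), TenantState::Detached);
        true
    }

    /// Drop the tenant from memory, but only if it is detached.
    fn maybe_drop(&self, tenant_id: &str, _proof: &ExclusiveGuard<'_>) -> bool {
        let mut tenants = self.tenants.lock().unwrap();
        if tenants.get(tenant_id).copied() == Some(TenantState::Detached) {
            tenants.remove(tenant_id);
            true
        } else {
            false
        }
    }
}
```

One limitation of this sketch: the guard proves that some op lock is held, not that it is the lock for the tenant named by `tenant_id`. Tying the guard to a tenant ID would need extra bookkeeping.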
Tightened up the use of op locks in 61bf182.
In the process I made the interesting observation that `tenant_location_config` holds the lock much longer than it needs to. Really it should drop the lock before waiting for reconcilers, but I don't want to make that change inline here in case it has any spooky side effects.
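For illustration only, the "drop the lock before waiting" shape could look like the sketch below. It is deliberately not applied in this PR, and every name in it (`wait_outside_lock`, the `op_lock` and `log` parameters, the background thread standing in for a reconciler) is hypothetical, using `std` primitives rather than the controller's own types.

```rust
use std::sync::{mpsc, Mutex};
use std::thread;
use std::time::Duration;

/// Update state while holding the op lock, then release it before
/// blocking on the (simulated) reconciler. Returns whether the op lock
/// was free while we were about to wait.
fn wait_outside_lock(op_lock: &Mutex<()>, log: &Mutex<Vec<String>>) -> bool {
    let done = {
        // Hold the op lock only for the in-memory state change.
        let _guard = op_lock.lock().unwrap();
        log.lock().unwrap().push("detach requested".to_string());

        // Simulated reconciler: signals completion after a short delay.
        let (tx, rx) = mpsc::channel();
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(20));
            let _ = tx.send(());
        });
        rx
    }; // `_guard` is dropped here, before any waiting happens.

    // Other operations could now take the lock while we wait.
    let lock_was_free = op_lock.try_lock().is_ok();
    done.recv().unwrap();
    lock_was_free
}
```

The trade-off is the one raised earlier in the thread: once the lock is released, anything that runs during the wait may change or remove the tenant, so code after the wait cannot assume the state it saw earlier.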