QQ grow to a target quorum cluster size #13873
…it_quorum_queue api
Hi, what happens when an add member call errors in the middle of the loop?
@Ayanda-D from the early days of Ra and QQs, we have decided to not add N members at a time, by design. Some other Raft implementations explicitly prohibit such parallel membership changes, or the logic of the spec becomes really hard to follow at times.
So at best we can do this after a pause of, say, 10 seconds minimum, and ideally only after learning that the previous membership change has completed.
Hi folks, thanks for taking a look. @michaelklishin I agree "parallel membership changes" would be a bit tricky and hard to follow here. Everything is still sequential. We are still retaining the exact same behaviour of @deadtrickster if grow fails in the middle of the loop, still the same behaviour of current Nothing much has changed functionality-wise, we're just running the same grow command,
@Ayanda-D per discussion with @kjnilsson, we suggest doing the following before adding more members (replicas). We need to make sure that every QQ member is a voter, which means that the members have caught up enough for it to be safe to proceed. If that's not the case yet, the algorithm should skip the queue in question. Then we can retry the process for such queues. Worst case, the operator will have to re-run the command. We should also do that check before we start the process (if we don't do so already). That variation of this change we are willing to accept.
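The suggestion above (check that every member is a voter, skip the queue otherwise, retry later) can be sketched roughly as follows. This is a minimal Python model for illustration only, not RabbitMQ or Ra code: the `Queue` class and the functions `all_voters`, `grow_one`, and `grow_pass` are hypothetical names, and the sketch assumes that a newly added member starts as a non-voter until it has caught up.

```python
from dataclasses import dataclass, field


@dataclass
class Queue:
    """Hypothetical stand-in for a quorum queue (not a RabbitMQ type)."""
    name: str
    # Maps node name -> True if that member is a voter,
    # False if it is still catching up.
    members: dict = field(default_factory=dict)


def all_voters(queue):
    """Return True only when every current member is a voter."""
    return all(queue.members.values())


def grow_one(queue, node):
    """Add a single member. Assumption: new members start as non-voters."""
    queue.members[node] = False


def grow_pass(queues, node):
    """Run one sequential pass that adds `node` to each eligible queue.

    Queues that still have non-voter members are skipped and reported,
    so the caller (or the operator) can retry them later.
    """
    grown, skipped = [], []
    for q in queues:
        if node in q.members:
            continue  # this queue already has a replica on `node`
        if not all_voters(q):
            skipped.append(q.name)
            continue
        grow_one(q, node)
        grown.append(q.name)
    return grown, skipped
```

In this model a queue that has just gained a member is skipped on the next pass until that member is marked as a voter, which mirrors the "one membership change at a time" constraint discussed in this thread.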
Proposed Changes
Hello team 👋
The following changes are a proposed extension to the rabbitmq-queues grow command. Currently this command only allows growing queues to a single target node per execution, for a set of queues matching the --queue-pattern. This means the command needs to be executed a number of times to fully restore a quorum queue (its replicas) back to N nodes, which can also be expensive because of the multiple RPCs from the temporary CLI execution node. As an extension to the rabbitmq-queues grow command, we'd like to allow growing queues to a specified number of nodes N, that is, a target quorum cluster size, so that the broker does all the grow processing to multiple nodes in one CLI execution. The command signature becomes:

When quorum queues get into a bad state, e.g. become leaderless (#12701, #13101), and shrink operations (#12427) to a single node are applied to attempt a rescue, fast recovery (growing back to N nodes, i.e. to a target quorum cluster size) for the set of queues becomes a critical requirement.
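To illustrate the idea of growing to a target quorum cluster size, here is a minimal Python sketch of how the set of nodes to add could be derived for one queue. It is not the code from this PR: `nodes_to_add` and its parameters are hypothetical, and rejecting a target larger than the cluster is an assumption of the sketch, not documented behaviour.

```python
def nodes_to_add(current_members, cluster_nodes, target_size):
    """Pick the nodes needed to bring a queue up to `target_size` replicas.

    Assumption of this sketch: candidates are taken in the order they
    appear in `cluster_nodes`, and a target larger than the cluster is
    treated as an error.
    """
    if target_size > len(cluster_nodes):
        raise ValueError("target size exceeds the number of cluster nodes")
    missing = target_size - len(current_members)
    if missing <= 0:
        return []  # already at or above the target size
    candidates = [n for n in cluster_nodes if n not in current_members]
    return candidates[:missing]
```

For a queue that was shrunk to a single member on node "a" in a five-node cluster, a target size of 3 yields two nodes to add. These would still be added one at a time rather than in parallel.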
Please take a look. Thanks!