Backup Restore rabbit cluster managed by operator #1491
I had a brief look at Kanister; bear in mind that I'm no expert on that technology 🐻 I have a long-shot idea that may solve the backup issue. You can use virtual host limits and set the maximum number of connections to 0 in all vhosts. This effectively rejects any connection to RabbitMQ. I don't recall if applying this limit also closes current connections, but there's a

You are right that the Operator does not allow scaling down, mainly because it's not safe to do so and data loss is highly likely if you scale down RabbitMQ. However, that concern does not apply in the restore scenario, because, well, you are restoring; you don't care about the current data. One idea to stop the Operator from meddling in your restore sequence is to pause the reconciliation and let Kanister scale down the StatefulSet directly. Luckily, the name of the STS is derived from the RabbitmqCluster name.

Do you have any specific ask in terms of functionality that you would like to change in the Operator as part of this issue?
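For reference, a minimal sketch of the vhost-limit idea, assuming the default `/` vhost and a server pod named `my-rabbit-server-0` with a container named `rabbitmq` (all hypothetical names):

```bash
# Block new client connections by capping the connection count at 0 on the vhost
kubectl exec my-rabbit-server-0 -c rabbitmq -- \
  rabbitmqctl set_vhost_limits -p / '{"max-connections": 0}'

# After the backup completes, remove the limit again
kubectl exec my-rabbit-server-0 -c rabbitmq -- \
  rabbitmqctl clear_vhost_limits -p /
```

This only rejects new connections; whether existing connections are closed would need to be verified separately.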
Thanks for the insight.
So I was able to get the pre-hook configs working and tested.
So I finally got something working. TL;DR, the blueprint ended up with three actions (a sketch of the command sequence behind each one follows below):

- quiesceRabbitCluster:
- activateRabbitCluster:
- revertRabbitCluster: (in case of error)
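This is not the actual blueprint YAML, just a sketch of what each action boils down to as plain commands, assuming a RabbitmqCluster named `my-rabbit` with three replicas (names and replica count are hypothetical):

```bash
# quiesceRabbitCluster: stop the Operator from reconciling, then drain every node
kubectl label rabbitmqclusters.rabbitmq.com my-rabbit rabbitmq.com/pauseReconciliation=true
for pod in my-rabbit-server-0 my-rabbit-server-1 my-rabbit-server-2; do
  kubectl exec "$pod" -c rabbitmq -- rabbitmq-upgrade --timeout 10 drain
done

# activateRabbitCluster: bring the nodes back, then resume reconciliation
for pod in my-rabbit-server-0 my-rabbit-server-1 my-rabbit-server-2; do
  kubectl exec "$pod" -c rabbitmq -- rabbitmq-upgrade revive
done
kubectl label rabbitmqclusters.rabbitmq.com my-rabbit rabbitmq.com/pauseReconciliation-

# revertRabbitCluster: on error, run the same steps as activate
# so the cluster is not left drained with reconciliation paused
```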
Even with pausing the Operator reconciliation, Kasten was not able to successfully remove all the components. I did find this note that if you delete the RabbitmqCluster object, the Operator will not interfere (by design). So in my case, I need to destroy the entire cluster before I can restore it.
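In other words, roughly the following sequence, with a hypothetical cluster name (the exact restore step depends on how Kasten replays the restore point):

```bash
# Delete the RabbitmqCluster so the Operator stops managing (and recreating) its children
kubectl delete rabbitmqclusters.rabbitmq.com my-rabbit

# Kasten then restores the RabbitmqCluster object and its PVCs from the backup;
# the Operator rebuilds the StatefulSet on top of the restored volumes
```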
Thank you for reporting back and for your effort figuring out the right sequence of steps. We can automate the steps to quiesce the RabbitMQ cluster from the Operator, and the steps to "activate" the RabbitMQ cluster.

I have one question about the quiesce procedure. The step

My understanding was that the sequence would be something like:
Did I misunderstand something?
Your summary of the steps is correct. If something like that were built into the Operator, it would be great.
This issue has been marked as stale due to 60 days of inactivity. Stale issues will be closed after a further 30 days of inactivity; please remove the stale label in order to prevent this occurring.

Closing stale issue due to further inactivity.
Is your feature request related to a problem? Please describe.
The existing issue is the ability to quiesce a cluster in order to perform backups. The second part is performing a scale-down operation to replace the cluster during a restore.
Backup issue: (using Kasten v6.0.11 and a custom Kanister blueprint)
Backup Idea:
This idea will cause the entire cluster to stop responding to client requests. That is acceptable at this point (it is really the responsibility of the client to retry anyway).
I also want to accomplish this without the Operator trying to re-deploy the entire cluster in response to the changes (that would make the backup procedure too long).
rabbitmq-upgrade --timeout 10 drain
Idea Issue:
Restore Issue:
As mentioned, I currently use Kasten to back up my K8s workloads. Inherently, when Kasten is performing a restore to an existing workload with a PV attached, it will scale down the workload to remove/replace the PV with the backed-up data.
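A rough sketch of what that scale-down would need to look like with the Operator kept out of the way, assuming reconciliation is already paused as in the earlier sketch (cluster name and replica count are hypothetical):

```bash
# With reconciliation paused, scale the StatefulSet down so the PVs can be detached/replaced
kubectl scale statefulset my-rabbit-server --replicas=0

# ... Kasten restores the PV contents here ...

# Scale back up, then remove the pauseReconciliation label as in the earlier sketch
kubectl scale statefulset my-rabbit-server --replicas=3
```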
Ideas??
So, any ideas on how this logic to quiesce a Rabbit cluster for backup might be accomplished?