Revise the rabbitmq controller to prevent unexpected downscale (#211)
The current version of the RabbitMQ controller doesn't support scaling down; directly reducing the replicas of the stateful set can lead to quorum loss and even data loss. More details in [this issue](rabbitmq/cluster-operator#223). Thus, we decided to disallow downscale for the stateful set in our implementation. Using the [validation rule](https://github.com/vmware-research/verifiable-controllers/blob/f5236647bf4fb26daa1359fde3c61a282a886735/src/controller_examples/rabbitmq_controller/spec/rabbitmqcluster.rs#L108) guarantees that updating the deployment won't decrease the replicas.

But there is a corner case: a workaround for the downscale operation is to delete the current deployment and create a new one with fewer replicas, which doesn't violate the validation rule. If execution happens this way, the stateful set created from the old cr may not yet have been deleted by the garbage collector when the controller tries to update the stateful set with the new cr, whose `replicas` field is smaller. The controller implementation would therefore still need to compare the old and new replicas before updating the stateful set, to make sure scaling down doesn't happen. That comparison makes the proof a lot more difficult, because we would have to show that the replicas of the old stateful set is no larger than that of the current cr, which in turn requires showing that if a stateful set has an owner reference pointing to some cr, its replicas is no larger than the cr's.

Therefore, we instead let the controller wait for the garbage collector to delete the old stateful set. This avoids the corner case and does not add much complexity to the proof, since comparing the stateful sets' replicas is no longer needed. In this design, if the old stateful set doesn't have an owner reference pointing to the current cr, reconcile simply goes to the error state and waits for the next round of reconcile.
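The wait-for-GC design above can be sketched as follows. This is an illustrative model, not the actual controller's code: the struct fields, `owner_ref`, and `ReconcileStep` names are all stand-ins for the real implementation.

```rust
// Illustrative sketch (not the actual controller API): before updating the
// stateful set, require that it is owned by the current custom resource.
// Otherwise go to the error state so reconcile retries after the garbage
// collector has deleted the stale stateful set.

#[derive(Clone, Debug, PartialEq)]
struct OwnerRef {
    uid: String,
    name: String,
}

struct StatefulSet {
    owner_references: Vec<OwnerRef>,
    replicas: i32,
}

struct RabbitmqCluster {
    uid: String,
    name: String,
    replicas: i32,
}

impl RabbitmqCluster {
    fn owner_ref(&self) -> OwnerRef {
        OwnerRef { uid: self.uid.clone(), name: self.name.clone() }
    }
}

#[derive(Debug, PartialEq)]
enum ReconcileStep {
    UpdateStatefulSet,
    Error,
}

fn next_step(existing: &StatefulSet, cr: &RabbitmqCluster) -> ReconcileStep {
    if existing.owner_references.contains(&cr.owner_ref()) {
        // The stateful set is owned by the current cr, so the validation
        // rule already rules out a replica decrease: safe to update.
        ReconcileStep::UpdateStatefulSet
    } else {
        // The stateful set belongs to a deleted cr; updating it could
        // scale down. Wait for the garbage collector instead.
        ReconcileStep::Error
    }
}

fn main() {
    // New cr with 2 replicas; stale stateful set from the old cr with 3.
    let cr = RabbitmqCluster { uid: "uid-2".into(), name: "rmq".into(), replicas: 2 };
    let stale = StatefulSet {
        owner_references: vec![OwnerRef { uid: "uid-1".into(), name: "rmq".into() }],
        replicas: 3,
    };
    assert_eq!(next_step(&stale, &cr), ReconcileStep::Error);
    println!("stale stateful set -> {:?}", next_step(&stale, &cr));
}
```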
To make the liveness proof work, I change `the_object_in_reconcile_has_spec_as` into `the_object_in_reconcile_has_spec_and_uid_as` so that the owner_references can also be derived from the desired custom resource. The remaining work is as follows:

1. Add an eventual safety property showing that `spec /\ []desired_state_is(cr) |= true ~> (sts_key_exists => sts.owner_references.contains(cr.owner_ref()))`.
2. Add `[](sts_key_exists => sts.owner_references.contains(cr.owner_ref()))` to the assumption of the spec.
3. Add reasoning about the steps after creating/updating the server config map; the stateful set part should be similar to ZooKeeper's as long as 1 and 2 are done.

---------

Signed-off-by: Wenjie Ma <[email protected]>
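The implication that steps 1 and 2 state, `sts_key_exists => sts.owner_references.contains(cr.owner_ref())`, can be illustrated as a plain boolean predicate. This is a toy model, not the Verus temporal-logic spec: `None` stands in for `sts_key_exists` being false, and owner references are modeled as bare uid strings.

```rust
// Toy model of `sts_key_exists => sts.owner_references.contains(cr.owner_ref())`.
// `None` models the case where the stateful set key does not exist yet,
// in which case the implication holds vacuously.
fn invariant_holds(sts_owner_uids: Option<Vec<&str>>, cr_uid: &str) -> bool {
    match sts_owner_uids {
        None => true, // sts_key_exists is false: implication is vacuously true
        Some(uids) => uids.contains(&cr_uid),
    }
}

fn main() {
    assert!(invariant_holds(None, "uid-2")); // no stateful set yet
    assert!(invariant_holds(Some(vec!["uid-2"]), "uid-2")); // owned by current cr
    assert!(!invariant_holds(Some(vec!["uid-1"]), "uid-2")); // stale ownership
    println!("invariant checks passed");
}
```

The eventual safety property in step 1 then says this predicate eventually holds forever once the desired state is in place, which is exactly what waiting for the garbage collector provides.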
1 parent d92059c · commit 8503f64 · 19 changed files with 1,468 additions and 1,101 deletions