Quorum queue setup on a datacenter with exactly two availability zones (racks) #11877
Hi, we would like to set up a RabbitMQ cluster using quorum queues in a single datacenter consisting of exactly two availability zones (= racks). Unfortunately, it is not clear to us from the RabbitMQ documentation whether this is a supported scenario. For example, RackAwareness mentions three racks, i.e. an odd number of racks. quorum-requirements mentions that the cluster will not work if the majority of nodes fails, which could happen with only two racks and an odd number of RabbitMQ nodes. At the moment it seems that it is not possible to set up a reliable quorum cluster spanning two racks. Could you therefore please give us some insight into what we could do? Some ideas we were thinking about, but we are not sure if they will work:
Edit: To clarify my question: my point is about failure handling of a whole rack. All nodes on that rack would immediately become unavailable. If that rack hosted 2 of 3 RabbitMQ nodes, only one node would be left, which, as far as I understand, is not a working condition for QQ. The examples in the RabbitMQ blog (see link above) use an odd number of racks, which would avoid that issue, but at the moment we only have two racks. Best
Replies: 3 comments 8 replies
@dev4342345235 it is perfectly possible to set up a reliable quorum queue that has three replicas, which means three cluster nodes. They can use two racks if that's your limit for any reason. Two-replica QQs are not supported. A clear majority is very important for any practical Raft-based system, and a two-replica quorum queue cannot have one once a replica is lost: a majority of two is two, so losing either replica leaves the queue without a quorum.
Quorum queues will not be aware of racks. You need to make sure every rack has a RabbitMQ node deployed to it, and that there are three nodes total. Nothing beyond that. Exactly the same limitation and solution apply to streams.
Thank you for your fast response! Good to hear, but I still don't get how the cluster is able to survive if the rack that hosts the majority of the nodes goes down. Please take a look at the diagram from the RabbitMQ website: With only two racks we would have a scenario as depicted in the first picture: The documentation recommends splitting the three nodes across three racks: So my question is, how could we achieve availability with two racks when the rack hosting the majority of the nodes fails? Do we need to go up to 5 nodes and put 3 nodes on Rack-A and 2 on Rack-B? But then we would still lose the majority of the nodes in case of a Rack-A failure. From my understanding, only 2 node failures are tolerable if 5 nodes are used in total. Best
With two racks, assuming an entire rack can fail, you cannot. You need to use three racks, or assume that the risk of a rack failure is not important enough compared to host/node failure.
Two-replica QQs and streams are an explicitly unsupported configuration (of course, you can extend a QQ or stream to just two replicas, but it won't offer much in terms of availability).
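To make the majority arithmetic in this thread concrete, here is a minimal Python sketch. It is not RabbitMQ code and does not use any RabbitMQ API; the function names (`quorum`, `survives_any_rack_failure`, `two_rack_splits`) are hypothetical and exist only for illustration. It assumes one replica per node, that a rack failure takes down every node on that rack at once, and that a queue stays available only while a strict majority of its replicas is still running.

```python
from typing import List, Tuple


def quorum(total_nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return total_nodes // 2 + 1


def survives_any_rack_failure(nodes_per_rack: List[int]) -> bool:
    """True if a majority is still up after losing any single rack.

    Assumption: a rack failure removes all nodes on that rack at once.
    """
    total = sum(nodes_per_rack)
    needed = quorum(total)
    return all(total - lost >= needed for lost in nodes_per_rack)


def two_rack_splits(total_nodes: int) -> List[Tuple[int, int]]:
    """Every way to place total_nodes on two racks, at least one node each."""
    return [(a, total_nodes - a) for a in range(1, total_nodes)]


if __name__ == "__main__":
    # The layouts discussed in this thread, plus one three-rack layout.
    for layout in ([2, 1], [3, 2], [1, 1, 1]):
        print(layout, "survives any single rack failure:",
              survives_any_rack_failure(layout))

    # No two-rack placement of 2..9 nodes tolerates the loss of either rack.
    for n in range(2, 10):
        assert not any(survives_any_rack_failure(list(s))
                       for s in two_rack_splits(n))
```

Under these assumptions, the larger of two racks always holds at least half of the nodes, so losing it leaves at most half, which is never a strict majority. Adding nodes (3+2, 4+3, and so on) does not change this; only a third failure domain does.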