Add permdown to topologies #14
@michalmuskala Will we be using similar logic to PHX pubsub? A timer keeping track of how long a node is down to determine if it's …
We've discussed this today. The way the permdown detection should work in the Erlang topology is that:
…
I'm a little surprised by the first point - that a node with the same name but different version would be considered permanently down. In many consistency models, Raft for example, a configuration is permanent until the cluster is instructed to change that configuration, and so nodes with the same name/address (one's choice) can go down for long periods of time, come back up, and still be considered a member of the cluster. In the model you've expressed though, an implementation of Raft would not be able to use the permdown event for anything useful, or rather, it would just ignore it.

Perhaps it is not important for it to be generally applicable to higher-level consistency models, but I do wonder if that implies that the concept of a permdown event is not particularly useful, since it doesn't always mean what one thinks it means. Or put another way, the concept of being permanently down is in many cases determined by application rules, not the network topology. I could see the concept of permdown based on a timeout being useful, as it would allow a Raft implementation to generate automatic configuration changes based on that event (whereas nodedown is just not reliable for that), but I don't think versioning is as useful.

I'm also curious how this potentially impacts systems being upgraded, i.e. why would a cluster want to treat a node as permanently down if it can potentially come up with the same state as the old version (such as in cases where the node commits to a persistent log). Perhaps I'm thinking at the wrong level here, in which case definitely let me know, but since I'm working on stuff that potentially would leverage Firenest, I figured I would chime in with my thoughts on what you've proposed.
I may be interpreting this incorrectly, but I believe @michalmuskala is making the same point as @bitwalker. If a new node comes up with a different version, the consistency model should mark the node with the previous version as permdown.
I believe this is precisely why this feature should be implemented. By setting a timeout for what one considers permdown, you avoid waiting indefinitely for a node to come back. All of this is assuming the same model as phoenix pubsub, however.
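The timer-based approach described above could be sketched roughly like this. This is a minimal illustration in Python (Firenest itself is Elixir), and all names here are made up for the example - it is not Firenest's API:

```python
import time


class PermdownDetector:
    """Tracks nodedown events and promotes them to permdown after a timeout."""

    def __init__(self, permdown_timeout, clock=time.monotonic):
        self.permdown_timeout = permdown_timeout
        self.clock = clock
        self.down_since = {}  # node -> timestamp of its nodedown event

    def node_down(self, node):
        # Record when the node was first seen down.
        self.down_since.setdefault(node, self.clock())

    def node_up(self, node):
        # The node came back before the timeout; forget its down timestamp.
        self.down_since.pop(node, None)

    def check_permdown(self):
        """Return nodes whose downtime exceeded the timeout, and stop tracking them."""
        now = self.clock()
        expired = [
            node
            for node, downed_at in self.down_since.items()
            if now - downed_at >= self.permdown_timeout
        ]
        for node in expired:
            del self.down_since[node]
        return expired
```

The key property is that `node_up` cancels a pending permdown, so only nodes that stay down past the timeout are ever reported - a node that flaps within the window generates no permdown at all.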
To clarify - by version I mean that the version of a node is different each time a node starts. So if a node comes down and back up again it has a different version. The only reason for a node to go through a down/up cycle is network issues - a node can't resurrect with the same version. It's entirely reasonable for something that implements on top of …
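To illustrate the versioning idea: if a node's identity is the pair (name, version) and the version is freshly generated on every boot, then seeing the same name with a new version proves the old incarnation can never return. A hypothetical Python sketch - the counter stands in for whatever unique per-boot value a real topology would use (e.g. a timestamp or random value):

```python
import itertools

# Stand-in for a per-boot unique value generated at node startup.
_boot_counter = itertools.count(1)


def start_node(name):
    """Each start produces a fresh (name, version) identity."""
    return (name, next(_boot_counter))


def same_name_new_version(old, new):
    """Same name but a different version: the old incarnation went
    through a down/up cycle and can never come back -> permdown."""
    (old_name, old_version), (new_name, new_version) = old, new
    return old_name == new_name and old_version != new_version
```

Under this scheme "the same node came back" and "a new incarnation of that node came up" are distinguishable events, which is exactly what a plain nodedown/nodeup pair cannot tell you.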
@ArThien The reason why it doesn't matter how long a node is down in Raft (for example) is because when a node comes back, it is caught up by the leader regardless of how long it was gone, but while it is down it does not participate in elections (but is still counted in the cluster for purposes of determining quorum sizes). In many consistency models it is explicitly bad to automatically retire nodes, since you can't control how long a partition lasts, and you must be able to ensure that a split brain situation cannot occur - if you automatically retire a node based on a timeout, you potentially can have a quorum on both sides of a partition based on the new cluster size as seen by both partitions.
I think that's reasonable - I guess my point was more about whether that really belongs in Firenest versus the application layer, because that is where the rules around what constitutes permdown really are defined. That said, if it is a feature that consumers of Firenest opt into, rather than having to opt out of, then I think it is much more useful (since you can explicitly decide whether Firenest's rules around permdown are useful for your application, or decide to provide your own, but be able to surface your own permdown using the same message).
A permdown will only be delivered after a down and it is just a notification - the meaning is always added at the application layer. If a system does not care about the topology's definition of a permdown, it can simply ignore it. The alternative is indeed to keep this definition at the application level. One benefit is that this does not need to be implemented for every new topology. The downside is that we may need to make the node name a bit less opaque (not sure if this is possible today though). Maybe this is indeed best defined as a feature of the SyncedServer.
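One way to picture "the meaning is added at the application layer": the topology only emits notifications, and each application folds them into its own state however it likes. A hypothetical Python sketch - the message shapes and state fields are invented for illustration, not taken from Firenest:

```python
def handle_topology_message(msg, state):
    """Fold a topology notification into application state.

    The topology only tells us *that* something happened; whether a
    permdown means "retire the node" or nothing at all is this
    function's (i.e. the application's) decision.
    """
    kind, node = msg
    if kind == "nodedown":
        state["down"].add(node)
    elif kind == "permdown":
        # A permdown is always preceded by a down. An application with
        # its own membership rules could simply drop this branch and
        # ignore the notification entirely.
        state["down"].discard(node)
        state["retired"].add(node)
    return state
```

A Raft-style application would delete the `permdown` branch (or use it only for alerting), while a CRDT-style presence system might use it to garbage-collect the departed node's state.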
Just wanted to add some clarity around the Raft use case. Here's a single use case that I think will help me explain my POV.

Let's say we have a cluster of 5 nodes: A, B, C, D, E. Due to operator error or some sort of egregious fault the cluster is partitioned into a group of 3 (A, B, C) and a group of 2 (D, E). During this time the group of 3 maintains a majority so they can continue servicing requests. The error is not transient, so after the timeout, nodes on either side of the split receive permdown events for the nodes on the other side. At this point there isn't a safe operation that we can take to try to automatically heal the cluster. For instance, we could issue a configuration change message between D and E so that they can start receiving traffic. But doing that effectively splits the operator's cluster forever, with no way for them to reconcile their state. From that perspective our Raft library won't ever try to use these events to take actions, because they are inherently dangerous and could cause lost writes.

In the above scenario the correct decision is to empower the operator to choose how they want to heal their cluster. They might choose to issue cluster change commands directly, attempt to repair the partition, etc. For them to take these actions we need to send alerts to operators when we see issues like this. So while we probably couldn't ever safely use permdown to do any cluster configuration, we could definitely use it for triggering alarms to operators. I'm not sure that's a use case that makes sense or how this would affect the underlying design decisions, but I thought I would provide some clarity here.
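The danger described here falls straight out of the quorum arithmetic. With fixed membership, only one side of the A,B,C / D,E split can hold a majority; but if each side retired the permdown nodes and recomputed quorum against its own shrunken view, both sides would claim a majority. A small Python sketch of that arithmetic:

```python
def has_quorum(votes, cluster_size):
    # A majority is strictly more than half of the cluster size.
    return votes > cluster_size // 2


cluster = {"A", "B", "C", "D", "E"}
majority_side = {"A", "B", "C"}
minority_side = {"D", "E"}

# With fixed membership of 5, only one side of the partition has quorum:
assert has_quorum(len(majority_side), len(cluster))      # 3 of 5
assert not has_quorum(len(minority_side), len(cluster))  # 2 of 5

# If each side retires the unreachable nodes after a permdown timeout
# and recomputes quorum against its shrunken view, both sides "win":
assert has_quorum(len(majority_side), len(majority_side))  # 3 of 3
assert has_quorum(len(minority_side), len(minority_side))  # 2 of 2 -> split brain
```

This is why membership changes in Raft must go through the replicated log itself rather than being triggered by failure detection: the log serializes the change so at most one side can ever commit it.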
This should send a `named_permdown` message to the `sync_named` subscribers once the topology detects a permdown of a node.