
=cluster handle aggressively rejoining/replacing nodes from same host/port pair #1083

Merged

Conversation


@ktoso ktoso commented Nov 1, 2022

Resolves: #1082


Short version: there is an edge case that was not handled well when a node very aggressively rejoins the cluster from the same host/port.

This can happen in k8s when a pod gets aggressively restarted, or in command-line apps when someone joins a cluster, sends just one request, and kills the app. In both cases the old incarnation may still (correctly) be present in SWIM gossip as dead, and when the new node joins it may misinterpret that information as being about "itself".
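For intuition: a cluster node's identity is effectively its endpoint (host/port) plus a unique id minted fresh on every boot, so a rejoined node reuses the endpoint but never the id. A minimal sketch of that idea (toy types for illustration, not the library's actual API):

struct Endpoint: Hashable {
    let host: String
    let port: Int
}

struct NodeID: Hashable {
    let endpoint: Endpoint
    let uid: UInt64 // randomized fresh on every boot

    // `other` is a *replacement* of `self`: same host/port, but from a
    // different boot. Acting on stale gossip about the replaced node as if
    // it were about ourselves is exactly the bug described here.
    func isReplacement(of other: NodeID) -> Bool {
        self.endpoint == other.endpoint && self.uid != other.uid
    }
}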


More analysis:

steps:

  • 7 joins 8
  • 7 is leader
    • minimum of 2 nodes before we elect one
    • lower address wins
  • 7 dies
  • 8 cannot declare 7 as down
    • only leader can do this
    • this is OK, as designed // such systems expect to get back their node count and then recover
  • 7 reboots -- let's call it 77
    • same host/port
    • new UID
  • handshake with 8
    • 8 accepts
    • 77 gets the accept
    • 8 declares "previous 7" as down, since 77 is the replacement
  • 8 declaring 7 down is correct
    • but it means we have a down 7 in membership
    • this is also correct; other nodes may not yet know about this, so we want to spread this down information that 8 first noticed
  • gossip includes old node 7 (okay)
    • Node 77 receives gossip through SWIM and that includes 7:
    • 2022-11-01T13:08:56+0900 trace Client : actor/id=/user/swim actor/path=/user/swim cluster/node=sact://[email protected]:7337 swim/incarnation=0 swim/members/all=["SWIM.Member(SWIMActor(id:sact://RemoteCluster:[email protected]:8337/user/swim, node:sact://RemoteCluster:[email protected]:8337, alive(incarnation: 0), protocolPeriod: 1)", "SWIM.Member(SWIMActor(id:/user/swim, node:sact://[email protected]:7337, alive(incarnation: 0), protocolPeriod: 0)"] swim/members/count=2 swim/ping/origin=sact://RemoteCluster:[email protected]:8337/user/swim swim/ping/payload=membership([SWIM.Member(SWIMActor(id:sact://[email protected]:7337/user/swim, node:sact://[email protected]:7337, suspect(incarnation: 0, suspectedBy: Set([sact://[email protected]:8337#7602583950674506995])), protocolPeriod: 56), SWIM.Member(SWIMActor(id:sact://RemoteCluster:[email protected]:8337/user/swim, node:sact://RemoteCluster:[email protected]:8337, alive(incarnation: 0), protocolPeriod: 0), SWIM.Member(SWIMActor(id:/user/swim, node:sact://[email protected]:7337, alive(incarnation: 0), protocolPeriod: 56)]) swim/ping/seqNr=4 swim/protocolPeriod=1 swim/suspects/count=0 swim/timeoutSuspectsBeforePeriodMax=11 swim/timeoutSuspectsBeforePeriodMin=4 [DistributedCluster] Received ping@4
    • Note this: swim/ping/payload=membership([
    • In other words, SWIM spreads information about both nodes since 7 is not confirmDead yet -- THIS IS OK. But continuing to act on the removed node's information is NOT ok.

Long story short: SWIM tells us that a node on our address was dead, but we know we are not dead -- this should only happen via high-level gossip, when we see a .down somewhere about us. So we can ignore this information at the SWIM level.
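Sketched as code, the guard amounts to the following (toy types continuing the sketch above; the actual fix compares the change's node against self.id.node, shown in a review comment further down):

enum MemberStatus: Int, Comparable {
    case joining, up, leaving, down, removed
    static func < (lhs: Self, rhs: Self) -> Bool { lhs.rawValue < rhs.rawValue }
}

struct ClusterShell {
    let selfNode: NodeID // NodeID as sketched earlier

    // A SWIM-level change declaring a node on *our own* host/port dead can
    // only be about a previous incarnation of us (or about us); neither may
    // cause us to down ourselves. A real .down about us must arrive via
    // high-level cluster gossip instead.
    func shouldIgnore(swimChangeAbout node: NodeID, becoming status: MemberStatus) -> Bool {
        status >= .down && node.endpoint == selfNode.endpoint
    }
}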

This should also get fixed in SWIM itself though; I'll follow up there.

@ktoso ktoso changed the title =pretty nicer log formatting in pretty formatter (in tests) =cluster handle aggressively rejoining/replacing nodes from same host/port pair Nov 1, 2022
@ktoso ktoso requested a review from yim-lee November 1, 2022 09:15
// prefer the actor id when present, and drop the now-redundant path key
actorIdentifier = "[\(id)]"
_ = metadata.removeValue(forKey: "actor/path") // discard it
} else if let path = metadata.removeValue(forKey: "actor/path") {
// fall back to the path for actors that still have one
actorIdentifier = "[\(path)]"

revamping this now that we have more distributed actor types and fewer paths 👍

@@ -106,7 +106,7 @@ struct SamplePrettyLogHandler: LogHandler {

let file = file.split(separator: "/").last ?? ""
let line = line
print("\(self.timestamp()) [\(file):\(line)] [\(nodeInfo)\(Self.CONSOLE_BOLD)\(label)\(Self.CONSOLE_RESET)] [\(level)] \(message)\(metadataString)")
print("\(self.timestamp()) \(level) [\(nodeInfo)\(Self.CONSOLE_BOLD)\(label)\(Self.CONSOLE_RESET)][\(file):\(line)] \(message)\(metadataString)")

same formatting as our example formatters in samples
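For illustration, the reordered format prints roughly like this (values made up, spacing approximate):

2022-11-01T13:08:56+0900 trace [sact://[email protected]:7337 Client][ClusterShell.swift:123] Received ping@4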

.lazy
.filter { member in member.status <= status } // members "at most" the given status
.count
}

used in tests, but generally useful
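A self-contained toy of the same counting pattern; the helper's real name and signature aren't visible in the hunk, so all names here are made up:

enum Status: Int, Comparable {
    case joining, up, leaving, down, removed
    static func < (lhs: Self, rhs: Self) -> Bool { lhs.rawValue < rhs.rawValue }
}

struct Member { let node: String; let status: Status }

// Count members whose status is "at most" the given one; `.lazy` avoids
// materializing the intermediate filtered array.
func count(_ members: [Member], atMost status: Status) -> Int {
    members.lazy.filter { $0.status <= status }.count
}

let members = [Member(node: "7337", status: .down), Member(node: "8337", status: .up)]
assert(count(members, atMost: .up) == 1) // only 8337 is still joining/up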

@@ -223,6 +238,21 @@ extension Cluster {
}
}

extension Cluster.Membership: Sequence {

generally useful to be able to write for member in membership 👍
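A minimal stand-in for what the conformance buys (toy type; the real conformance yields Cluster.Member values):

struct Membership: Sequence {
    var members: [String]

    // Sequence only requires makeIterator(); delegating to the underlying
    // array's iterator is enough.
    func makeIterator() -> IndexingIterator<[String]> {
        self.members.makeIterator()
    }
}

let membership = Membership(members: ["sact://[email protected]:7337", "sact://[email protected]:8337"])
for member in membership {
    print("member:", member) // and map/filter/contains(where:) come for free
}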

// a SWIM-level change about a node with our own host/port can only refer to
// a previous incarnation of this node -- never down ourselves based on it
change.member.node.asClusterNode == self.id.node
{
return
}

The fix™

Though this deserves a deeper dig into SWIM as well.

Also, SwiftFormat's formatting of multi-line if is annoying omg!


ktoso commented Nov 1, 2022

SWIM follow-up: apple/swift-cluster-membership#91


ktoso commented Nov 1, 2022

How I love a trailing space being the only failure :P

@ktoso ktoso enabled auto-merge (squash) November 1, 2022 23:04
@ktoso ktoso merged commit 4934208 into apple:main Nov 1, 2022
@ktoso ktoso deleted the wip-join-over-dont-down-myself-based-on-old-node-id branch November 2, 2022 00:16