
=cluster handle aggressively rejoining/replacing nodes from same host/port pair #1083

Merged

Conversation


@ktoso ktoso commented Nov 1, 2022

Resolves: #1082


Short version: there is an edge case that was not handled well when a node very aggressively rejoins the cluster from the same host/port.

This can happen in k8s when a pod gets aggressively restarted, or in command-line apps when someone joins a cluster, sends just one request, and kills the app. In both cases the old incarnation may still (correctly) be present in SWIM gossip as dead, and when the new node joins it may misinterpret that information as being about "itself".
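For intuition: a cluster node's identity is effectively its endpoint (host/port) plus a unique id minted fresh on every boot, so a rejoined node reuses the endpoint but never the id. A minimal sketch of that idea (toy types for illustration, not the library's actual API):

struct Endpoint: Hashable {
    let host: String
    let port: Int
}

struct NodeID: Hashable {
    let endpoint: Endpoint
    let uid: UInt64 // randomized fresh on every boot

    // `other` is a *replacement* of `self`: same host/port, but from a
    // different boot. Acting on stale gossip about the replaced node as if
    // it were about ourselves is exactly the bug described here.
    func isReplacement(of other: NodeID) -> Bool {
        self.endpoint == other.endpoint && self.uid != other.uid
    }
}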


More analysis:

steps:

  • 7 joins 8
  • 7 is leader
    • minimum of 2 nodes before we elect one
    • lower address wins
  • 7 dies
  • 8 cannot declare 7 as down
    • only leader can do this
    • this is OK, as designed // such systems expect to get back their node count and then recover
  • 7 reboots -- let's call it 77
    • same host/port
    • new UID
  • handshake with 8
    • 8 accepts
    • 77 gets the accept
    • 8 declares "previous 7" as down, since 77 is the replacement
  • 8 declaring 7 down is correct
    • but it means we have a down 7 in membership
    • this is also correct; other nodes may not yet know about this, so we want to spread this down information that 8 first noticed
  • gossip includes old node 7 (okay)
    • Node 77 receives gossip through SWIM and that includes 7:
    • 2022-11-01T13:08:56+0900 trace Client : actor/id=/user/swim actor/path=/user/swim cluster/node=sact://[email protected]:7337 swim/incarnation=0 swim/members/all=["SWIM.Member(SWIMActor(id:sact://RemoteCluster:[email protected]:8337/user/swim, node:sact://RemoteCluster:[email protected]:8337, alive(incarnation: 0), protocolPeriod: 1)", "SWIM.Member(SWIMActor(id:/user/swim, node:sact://[email protected]:7337, alive(incarnation: 0), protocolPeriod: 0)"] swim/members/count=2 swim/ping/origin=sact://RemoteCluster:[email protected]:8337/user/swim swim/ping/payload=membership([SWIM.Member(SWIMActor(id:sact://[email protected]:7337/user/swim, node:sact://[email protected]:7337, suspect(incarnation: 0, suspectedBy: Set([sact://[email protected]:8337#7602583950674506995])), protocolPeriod: 56), SWIM.Member(SWIMActor(id:sact://RemoteCluster:[email protected]:8337/user/swim, node:sact://RemoteCluster:[email protected]:8337, alive(incarnation: 0), protocolPeriod: 0), SWIM.Member(SWIMActor(id:/user/swim, node:sact://[email protected]:7337, alive(incarnation: 0), protocolPeriod: 56)]) swim/ping/seqNr=4 swim/protocolPeriod=1 swim/suspects/count=0 swim/timeoutSuspectsBeforePeriodMax=11 swim/timeoutSuspectsBeforePeriodMin=4 [DistributedCluster] Received ping@4
    • Note this: swim/ping/payload=membership([
    • In other words, SWIM spreads information about both nodes since 7 is not confirmDead yet -- THIS IS OK. But continuing to act on the removed node's information is NOT ok.

Long story short: SWIM tells us that a node on our address was dead, but we know we are not dead -- this should only happen via high-level gossip, when we see a .down somewhere about us. So we can ignore this information at the SWIM level.
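Sketched as code, the guard amounts to the following (toy types continuing the sketch above; the actual fix compares the change's node against self.id.node, shown in a review comment further down):

enum MemberStatus: Int, Comparable {
    case joining, up, leaving, down, removed
    static func < (lhs: Self, rhs: Self) -> Bool { lhs.rawValue < rhs.rawValue }
}

struct ClusterShell {
    let selfNode: NodeID // NodeID as sketched earlier

    // A SWIM-level change declaring a node on *our own* host/port dead can
    // only be about a previous incarnation of us (or about us); neither may
    // cause us to down ourselves. A real .down about us must arrive via
    // high-level cluster gossip instead.
    func shouldIgnore(swimChangeAbout node: NodeID, becoming status: MemberStatus) -> Bool {
        status >= .down && node.endpoint == selfNode.endpoint
    }
}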

This should also get fixed in SWIM itself though; I'll follow up there.

@ktoso ktoso changed the title =pretty nicer log formatting in pretty formatter (in tests) =cluster handle aggressively rejoining/replacing nodes from same host/port pair Nov 1, 2022
@ktoso ktoso requested a review from yim-lee November 1, 2022 09:15
// prefer the actor id when present, and drop the now-redundant path key
actorIdentifier = "[\(id)]"
_ = metadata.removeValue(forKey: "actor/path") // discard it
} else if let path = metadata.removeValue(forKey: "actor/path") {
// fall back to the path for actors that still have one
actorIdentifier = "[\(path)]"

revamping this now that we have more distributed actor types and fewer paths 👍

@@ -106,7 +106,7 @@ struct SamplePrettyLogHandler: LogHandler {

let file = file.split(separator: "/").last ?? ""
let line = line
print("\(self.timestamp()) [\(file):\(line)] [\(nodeInfo)\(Self.CONSOLE_BOLD)\(label)\(Self.CONSOLE_RESET)] [\(level)] \(message)\(metadataString)")
print("\(self.timestamp()) \(level) [\(nodeInfo)\(Self.CONSOLE_BOLD)\(label)\(Self.CONSOLE_RESET)][\(file):\(line)] \(message)\(metadataString)")

same formatting as our example formatters in samples
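For illustration, the reordered format prints roughly like this (values made up, spacing approximate):

2022-11-01T13:08:56+0900 trace [sact://[email protected]:7337 Client][ClusterShell.swift:123] Received ping@4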

.lazy
.filter { member in member.status <= status } // members "at most" the given status
.count
}

used in tests, but generally useful
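A self-contained toy of the same counting pattern; the helper's real name and signature aren't visible in the hunk, so all names here are made up:

enum Status: Int, Comparable {
    case joining, up, leaving, down, removed
    static func < (lhs: Self, rhs: Self) -> Bool { lhs.rawValue < rhs.rawValue }
}

struct Member { let node: String; let status: Status }

// Count members whose status is "at most" the given one; `.lazy` avoids
// materializing the intermediate filtered array.
func count(_ members: [Member], atMost status: Status) -> Int {
    members.lazy.filter { $0.status <= status }.count
}

let members = [Member(node: "7337", status: .down), Member(node: "8337", status: .up)]
assert(count(members, atMost: .up) == 1) // only 8337 is still joining/up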

@@ -223,6 +238,21 @@ extension Cluster {
}
}

extension Cluster.Membership: Sequence {

generally useful to be able to write for member in membership 👍
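A minimal stand-in for what the conformance buys (toy type; the real conformance yields Cluster.Member values):

struct Membership: Sequence {
    var members: [String]

    // Sequence only requires makeIterator(); delegating to the underlying
    // array's iterator is enough.
    func makeIterator() -> IndexingIterator<[String]> {
        self.members.makeIterator()
    }
}

let membership = Membership(members: ["sact://[email protected]:7337", "sact://[email protected]:8337"])
for member in membership {
    print("member:", member) // and map/filter/contains(where:) come for free
}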

// a SWIM-level change about a node with our own host/port can only refer to
// a previous incarnation of this node -- never down ourselves based on it
change.member.node.asClusterNode == self.id.node
{
return
}

The fix™

Though this deserves a deeper dig into SWIM as well.

Also, SwiftFormat's formatting of multi-line if is annoying omg!


ktoso commented Nov 1, 2022

SWIM follow-up: apple/swift-cluster-membership#91


ktoso commented Nov 1, 2022

How I love a trailing space being the only failure :P

@ktoso ktoso enabled auto-merge (squash) November 1, 2022 23:04
@ktoso ktoso merged commit 4934208 into apple:main Nov 1, 2022
@ktoso ktoso deleted the wip-join-over-dont-down-myself-based-on-old-node-id branch November 2, 2022 00:16