Open
I know that the code doesn't mention fault tolerance and isn't really designed for it. Still, I think there's a problem if it's going to be used on a cluster where fault tolerance might be a requirement.
Consider a sequence of calls like this:
node 1 (count 1) --> check for count --> not equal --> exit
node 2 (count 2) --> check for count --> not equal --> exit
node 3 (count 3) --> (crash)
Note: time is denoted by the amount of horizontal space; moving to the right means time is advancing.
In this case, node 3 crashes after incrementing the count but before checking it, so no node ever calls the callback and the process is left stuck.
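To make the scenario concrete, here is a minimal Python sketch of the count-then-check pattern as I understand it. It is not the project's actual code: `SharedCounter`, `run_node`, `simulate`, and `EXPECTED` are names made up for illustration, the expected count of 3 is an assumption, and the crash is modeled as an early return.

```python
# Hypothetical model of the count-then-check pattern; not the project's real code.
EXPECTED = 3  # assumed number of participating nodes


class SharedCounter:
    """Stand-in for the shared counter that every node increments."""

    def __init__(self):
        self.value = 0

    def incr(self):
        self.value += 1
        return self.value


def run_node(counter, on_complete, crash_before_check=False):
    """Increment the counter; the node that sees EXPECTED fires the callback."""
    count = counter.incr()
    if crash_before_check:
        return "crash"  # the node dies after incrementing, before checking
    if count == EXPECTED:
        on_complete()
        return "callback"
    return "exit"


def simulate(crash_last):
    """Run three nodes in order and report their outcomes and whether the callback fired."""
    counter = SharedCounter()
    fired = []
    outcomes = [
        run_node(counter, lambda: fired.append(True)),
        run_node(counter, lambda: fired.append(True)),
        run_node(counter, lambda: fired.append(True), crash_before_check=crash_last),
    ]
    return outcomes, bool(fired)


if __name__ == "__main__":
    print(simulate(crash_last=False))
    print(simulate(crash_last=True))
```

With `crash_last=True` the counter still reaches the expected value, but the only node that could have observed it is gone, so the callback never fires.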
I'm still thinking about a solution to this problem. Do you have any ideas on how to make it more fault tolerant?