Description
Feature Request
Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Example: "I have an issue when (...)"
From leader/leader.go, leader re-election works after the default timeout 5-min since the condition
Pod.status.phase == "Failed" && Pod.Status.Reason == "Evicted" when a worker node is failed.
I have an opinion that leader re-election can work almost immediately when the condition contains checking the status of the node where the leader pod is running.
Describe the solution you'd like
A clear and concise description of what you want to happen. Add any considered drawbacks.
Check the condition of the node where the leader pod is running with [Node.Type == "NodeReady" && Node.Status != "ConditionTrue"] and when the node has been failed, delete the leaderPod (only mark the pod with 'terminating' because the node where the pod is running has been failed) and the configmap lock whose OwnerReference is leaderPod)
Pictures below are the test of node-check for leader re-election.(test-operator-xxx-xxxqb was the leaderPod)
Making --pod-eviction-timeout to be short can be another approach. However, I sure that above approach can bring more reliability since we don't know appropriate time out.
And Is there any drawbacks when making --pod-eviction-timeout to be very very short?