Skip to content

Check the node status when leader re-election. #24

Closed
@kasonglee

Description

@kasonglee

Feature Request

Is your feature request related to a problem? Please describe.
A clear and concise description of what the problem is. Example: "I have an issue when (...)"

From leader/leader.go, leader re-election works after the default timeout 5-min since the condition
Pod.status.phase == "Failed" && Pod.Status.Reason == "Evicted" when a worker node is failed.
I have an opinion that leader re-election can work almost immediately when the condition contains checking the status of the node where the leader pod is running.

Describe the solution you'd like
A clear and concise description of what you want to happen. Add any considered drawbacks.

Check the condition of the node where the leader pod is running with [Node.Type == "NodeReady" && Node.Status != "ConditionTrue"] and when the node has been failed, delete the leaderPod (only mark the pod with 'terminating' because the node where the pod is running has been failed) and the configmap lock whose OwnerReference is leaderPod)

Pictures below are the test of node-check for leader re-election.(test-operator-xxx-xxxqb was the leaderPod)

3
4
5

Making --pod-eviction-timeout to be short can be another approach. However, I sure that above approach can bring more reliability since we don't know appropriate time out.

And Is there any drawbacks when making --pod-eviction-timeout to be very very short?

@HyungJune

Metadata

Metadata

Labels

lifecycle/rottenDenotes an issue or PR that has aged beyond stale and will be auto-closed.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions