-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add pre-stop hook to unregister a compute node before its pod's termination #71
Conversation
Tested locally. |
Unregistering workers will deregister the worker from meta. The corresponding worker will receive a notification that it has expired and then terminate itself. It is unclear whether early unregistering of the worker may cause the worker to quickly restart as a new one, rejoin, and then get killed by k8s 🤔 |
Actually it's well defined in the doc. It should be triggered when and only when k8s thinks the container needs to be terminated. It means the case could exist when a Pod fails to pass probe and gets restarted, and will quickly rejoin the cluster. IMO, it should be fine since the cluster will never be stable in any circumstances whenever a pod's killed. And also a cluster should converge toward a stable point when the pods are all running.
https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/#container-hooks What I really concern about is the cost of recovery, as unregistration will definitely trigger one. There's no guarantee of how many times unregistration and recovery will happen before the it arrives the stable point. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but I believe this needs @shanicky's consent
… termination Signed-off-by: arkbriar <[email protected]>
Signed-off-by: arkbriar <[email protected]>
6337cfb
to
78a5f50
Compare
Added an option |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
In addition,
terminationGracePeriodSeconds
of compute pods to 10s to give the hook some time,