Cilium sometimes ends up in a failed state unable to contact K8s API Server #592
Comments
Hello @playworker,
I'm experiencing this issue as well on my EKS cluster. I configured cilium
The important part is the k8sServiceHost, which I pointed at my EKS cluster API endpoint. In my case this results in nodes being continuously destroyed and created by Karpenter, which is our autoscaler.
This is resolved in the latest version of the snap.
Summary
Sometimes, when things go wrong with the k8s cluster, Cilium ends up in a failed state and I don't know how to recover it.
The Cilium failure looks a lot like this: cilium/cilium#20679
The Cilium Operator and the DaemonSet pods are trying to contact the K8s API Server but can't. I don't believe the IP address they're trying is correct; it's a 10.x.x.x address.
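One way to narrow this down is to check whether the 10.x.x.x address falls inside the cluster's service CIDR (which would suggest the pods are using the in-cluster service address rather than a node address) and whether the API server port accepts TCP connections from the affected node. The following is a minimal Python sketch of those two checks. The function names are hypothetical, and the address, CIDR, and port in the usage note are placeholders that are not taken from this report.

```python
import ipaddress
import socket


def in_cidr(address: str, cidr: str) -> bool:
    """Return True if `address` falls inside the network `cidr`."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

For example, with a placeholder address `10.0.0.1`, a placeholder service CIDR `10.0.0.0/24`, and port 443 assumed for the API server service, `in_cidr("10.0.0.1", "10.0.0.0/24")` reports whether the address belongs to the service range, and `can_connect("10.0.0.1", 443)` reports whether anything answers on that port. Substitute the address from the Cilium logs and the cluster's actual service CIDR.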
I've not been able to recover from this error, but I haven't really tried much. Disabling the network using the k8s CLI tool doesn't have any effect.
What Should Happen Instead?
I'm not sure what the underlying issue is. If it is an issue with the API Server IP address being wrong, then I guess that needs to be set correctly somehow.
Reproduction Steps
The most recent time this happened I set the containerd_custom_registries setting to a bad value, it included a semi-colon in the middle of the string:
I corrected the setting, but the k8s cluster in Juju ended up in an errored state, and the Cilium Operator and pods ended up in the situation described above. I managed to recover the k8s units in Juju by downgrading the release and then bumping it back up again, but I am unable to recover the Cilium installation to a working state.
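A pre-flight check on the value before applying it might catch this kind of typo. The sketch below assumes the setting is a JSON list of objects, each with a "url" key; that format, the function name, and the rule that a URL should not contain a semicolon are assumptions for illustration, not something confirmed by this report.

```python
import json
from urllib.parse import urlsplit


def check_registries(raw: str) -> list[str]:
    """Return the problems found in `raw`; an empty list means none were found."""
    try:
        entries = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg} at position {exc.pos}"]
    if not isinstance(entries, list):
        return ["top-level value is not a list"]

    problems = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: not an object")
            continue
        url = entry.get("url")
        if not isinstance(url, str):
            problems.append(f"entry {i}: missing or non-string 'url'")
            continue
        try:
            parts = urlsplit(url)
            valid = parts.scheme in ("http", "https") and bool(parts.hostname)
        except ValueError:
            # urlsplit rejects some malformed hosts, e.g. unbalanced brackets.
            valid = False
        if not valid:
            problems.append(f"entry {i}: 'url' is not an http(s) URL with a host")
        if ";" in url:
            # Assumed rule: a semicolon in a registry URL is treated as a typo.
            problems.append(f"entry {i}: 'url' contains a semicolon")
    return problems
```

Calling `check_registries` on the candidate value and refusing to apply it when the returned list is non-empty would flag a stray semicolon both when it breaks the JSON syntax and when it sits inside a URL string.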
System information
inspection-report-20240808_102202.tar.gz
Can you suggest a fix?
No response
Are you interested in contributing with a fix?
No response