Pods stuck in unknown state after reboot #1520
Experiencing similar issues with the dashboard here.
@zar3bski could you attach the inspection report?
@davigar15 the attached inspection report seems corrupted.
Here it is. Since then, I also tried to remove the pods manually, but they remained stuck.
inspection-report-20200917_165856.tar.gz
Tried both fixes, but it did not change much.
Facing the same issue after a restart of the VM: the pods are in an Unknown state. Tried restarting the service, and stopping and starting the microk8s service as well.
Thank you for your patience, and apologies for the inconvenience this issue may have caused. When the node starts it needs to invalidate the old IPs and update the pods with new IPs. In the kubelet logs you can see this call failing:
On the API server side we see the failed call to the "admission.juju.is" webhook. This webhook is supposed to intercept the REST API call and authorize it. However, the webhook's pod is hosted in the cluster itself, so its recorded IP is no longer correct and the webhook cannot be reached.
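One way to confirm this, not shown in the thread, is to list the registered admission webhook configurations and look for the admission.juju.is entry:

microk8s kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations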
We have a bug opened for this at https://bugs.launchpad.net/juju/+bug/1898718. As a temporary workaround you could use Juju 2.7 until this gets addressed.
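Assuming Juju was installed from the snap store, pinning it to the 2.7 track would look something like this (the exact channel name is an assumption):

sudo snap refresh juju --channel=2.7/stable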
@ktsakalozos Currently working on a fix for Juju to resolve this. Have updated the LP bug.
Fix committed in Juju. It will be available in 2.8.6.
@ktsakalozos still facing the same issue; what might be the cause?
The issue still exists in 1.22.2. Any ideas on how to fix this besides resetting/reinstalling?
@arnitkun could you please attach an inspection report?
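For reference, the report being requested here is the tarball produced by MicroK8s' bundled inspection script:

microk8s.inspect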
@ktsakalozos I shall do it the next time it happens; apparently after another restart everything was good again.
Hi all, I also got the same issue, but while checking the inspection report I found
Not sure whether related: #3293
I don't have juju installed on my machine. After a reboot, all the pods' status is Unknown.
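A quick way to see those states after a reboot (standard kubectl usage, not taken from this comment):

microk8s kubectl get pods --all-namespaces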
My pods also went to an unknown state after rebooting. This happened on both of my servers, which run microk8s on CentOS 7. I also posted the issue here: #3545
Is it possible that the reboot caused the system to start from another kernel? |
Thanks for your reply. I ran the following (output elided). The environment is the same, so if a container gets created with docker on the same machine, with a 3.xx kernel and the docker-ce and containerd versions below, then we can eliminate the proposed solution that says "the problem will get solved with an upgrade of the kernel version or a downgrade of the docker & containerd versions."

# docker run -itd busybox:latest
# uname -sr
# rpm -qa kernel
# rpm -qa | grep -i kernel

Server: Docker Engine - Community
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
Hi, we are using microk8s 1.19 with a single-node cluster. When the time is automatically adjusted backward by an hour or more by the network time protocol and we then reboot this single-node cluster PC, all cluster pods and application pods go into an Unknown state; the cluster pods hang and are corrupted/unusable. The same happens if we move the time back one hour manually. Once the time moves forward again, either by waiting or by moving it manually, everything works. The production servers are unusable due to this Unknown-state issue; we hit it whenever the network time protocol is installed at a customer location. Could you please investigate and resolve this? Attached are the microk8s inspect logs; please let me know if you need any further info. NOTE: FYI, the certificates are valid and not expired.

ERROR:
inspection-report/snap.microk8s.daemon-kubelet/systemctl.log:Sep 18 09:21:39 host-pc microk8s.daemon-kubelet[8291]: E0918 09:21:39.493660 8291 pod_workers.go:191] Error syncing pod 1785b49a-4dc2-4b99-b75d-6a181d22322e ("service-0_default(1785b49a-4dc2-4b99-b75d-6a181d22322e)"), skipping: failed to "CreatePodSandbox" for "service-0_default(1785b49a-4dc2-4b99-b75d-6a181d22322e)" with CreatePodSandboxError: "CreatePodSandbox for pod "service-0_default(1785b49a-4dc2-4b99-b75d-6a181d22322e)" failed: rpc error: code = Unknown desc = failed to reserve sandbox name "service-0_default_1785b49a-4dc2-4b99-b75d-6a181d22322e_5": name "service-0_default_1785b49a-4dc2-4b99-b75d-6a181d22322e_5" is reserved for "95beab4145623a3cc0d86c814da1e5cca4593997990245f8c626f0fc87c6c788""
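The "name ... is reserved for ..." error suggests containerd kept a stale sandbox reservation across the backward time jump. A hedged workaround, not confirmed in this thread, is to restart the MicroK8s containerd service so kubelet and containerd re-sync their sandbox state:

sudo snap restart microk8s.daemon-containerd   # assumed snap service name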
I'm using latest/edge (with calico cni) and after rebooting the machine I'm getting all pods in Unknown state.
Logs of the calico node:
An interesting thing is that calico is detecting the network used for LXD.
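The calico-node log output itself is not reproduced above; a typical way to pull it, assuming the standard k8s-app=calico-node label used by the bundled manifest, is:

microk8s kubectl logs -n kube-system -l k8s-app=calico-node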
Following @ktsakalozos's suggestions, I added this to /var/snap/microk8s/current/args/cni-network/cni.yaml and applied that spec.
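The exact snippet added to cni.yaml is not quoted above. A plausible guess, given that calico was picking up the LXD bridge, is pinning Calico's IP autodetection method in the calico-node container's environment; the interface name below is purely illustrative:

            - name: IP_AUTODETECTION_METHOD   # assumed addition; use the real host interface
              value: "interface=eth0"

Calico documents IP_AUTODETECTION_METHOD as the knob for choosing which host interface the node IP is taken from.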
The calico node did not restart, so I killed it to force a restart. But it did not come back up even with
microk8s.stop && microk8s.start
This is the tarball generated by microk8s.inspect
inspection-report.zip