What happened:
We are running an EKS cluster on Kubernetes 1.28 that uses Karpenter to scale dynamically. There is constant movement in the cluster, so new nodes are appearing all the time.
We've found that in some cases, after a new node appears, the aws-node pod on that node is stuck and new pods cannot start: aws-node is not reachable, so container networking is never configured. See the pod event:
Warning FailedCreatePodSandBox 3m53s (x268 over 62m) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "94ee9c38f7a821bf5abf88a511f2ec99b13c3743e3dfc6327e82ad84833a9e69": plugin type="aws-cni" name="aws-cni" failed (add): add cmd: Error received from AddNetwork gRPC call: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:50051: connect: connection refused"
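The CNI plugin is failing to dial the local ipamd gRPC server on 127.0.0.1:50051. As a rough check on the affected node (these exact commands are my own sketch, not part of the original report, and assume ss/nc are installed), one can confirm whether anything is listening on that port:

# run from a shell on the affected node (hypothetical diagnostic, adjust as needed)
ss -lntp | grep 50051      # expected to show the ipamd process listening on 127.0.0.1:50051
nc -zv 127.0.0.1 50051     # "connection refused" here matches the sandbox error above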
Checking the aws-node pod, I see two things.
The pod is reported as not fully ready:
NAME READY STATUS RESTARTS AGE
aws-node-4xbrr 1/2 Running 0 4h5m
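To see which of the two containers is the one stuck not ready, and what its readiness probe reports, something like the following can be used (pod name and container names are from this example and from the CNI version we run; they may differ in other setups):

kubectl describe pod aws-node-4xbrr -n kube-system                # container statuses and readiness probe failures
kubectl logs aws-node-4xbrr -n kube-system -c aws-node            # logs from the aws-node (ipamd) container
kubectl logs aws-node-4xbrr -n kube-system -c aws-eks-nodeagent   # second container in recent VPC CNI releases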
I did not have debug logs enabled.
After restarting the pod, it works again.
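For reference, the restart that clears the issue is simply deleting the stuck pod so the DaemonSet recreates it on the same node (pod name from the example above):

kubectl delete pod aws-node-4xbrr -n kube-system   # DaemonSet controller schedules a fresh aws-node pod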
Node AMI:
v1.28.11-eks-1552ad0
AWS CNI:
v1.18.2-eksbuild.1
Thanks!