
Pod add/delete operations result in "error getting ClusterInformation: connection is unauthorized: Unauthorized" #9428

Open
steven-webster opened this issue Nov 1, 2024 · 3 comments

Comments

@steven-webster

OS: Linux kernel v6.6.0-1
Kubernetes version: v1.29.2
Calico version: v3.28.0

Hello, I understand this issue has been raised and closed before, with a potential solution in #5910. That commit should be included in our Calico 3.28.0, but we have recently seen the "connection is unauthorized: Unauthorized" error on a couple of systems. As others have mentioned in similar bug reports, the workaround seems to be simply restarting the calico-node pod. These systems have since been re-installed, with no recurrence of the message, but we retain logs from the affected systems, which may help.

Looking at the calico-node logs, there doesn't seem to be any issue reported by the 'token watch' facility; updates/writes of the kubeconfig file always appear to succeed, e.g.:

2024-10-19T20:50:00.035673627Z stdout F 2024-10-19 20:50:00.035 [INFO][81] cni-config-monitor/token_watch.go 226: Update of CNI kubeconfig triggered based on elapsed time.
2024-10-19T20:50:00.035909968Z stdout F 2024-10-19 20:50:00.035 [INFO][81] cni-config-monitor/token_watch.go 281: Wrote updated CNI kubeconfig file. path="/host/etc/cni/net.d/calico-kubeconfig"
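One quick sanity check here is to decode the exp claim of the token that ends up in that kubeconfig, to confirm whether it had actually expired at the time of a failure. A minimal sketch, assuming the token is a standard service account JWT and is passed on the command line (the file and usage are illustrative, not from this report):

```go
// Decode the `exp` claim of a Kubernetes service account token (JWT),
// e.g. the token field copied out of /etc/cni/net.d/calico-kubeconfig.
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
	"os"
	"strings"
	"time"
)

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: checktoken <jwt>")
		os.Exit(1)
	}
	parts := strings.Split(os.Args[1], ".")
	if len(parts) != 3 {
		fmt.Fprintln(os.Stderr, "not a JWT (expected 3 dot-separated segments)")
		os.Exit(1)
	}
	// JWT payloads are base64url-encoded without padding.
	payload, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, "decode payload:", err)
		os.Exit(1)
	}
	var claims struct {
		Iat int64 `json:"iat"`
		Exp int64 `json:"exp"`
	}
	if err := json.Unmarshal(payload, &claims); err != nil {
		fmt.Fprintln(os.Stderr, "parse claims:", err)
		os.Exit(1)
	}
	exp := time.Unix(claims.Exp, 0)
	fmt.Printf("issued:  %s\nexpires: %s\nexpired now: %v\n",
		time.Unix(claims.Iat, 0).Format(time.RFC3339),
		exp.Format(time.RFC3339),
		time.Now().After(exp))
}
```

Given that the token-watch logs show the file being refreshed on schedule, an expired token seems less likely here, but decoding exp directly would confirm it.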

Expected Behavior

Ability to start/stop pods.

Current Behavior

Here's an example of what one of the error messages looks like when trying to start a pod:

2024-10-22T22:30:01.648 controller-0 kubelet[13782]: info I1022 22:30:01.648705 13782 util.go:30] "No sandbox for pod can be found. Need to start a new one" pod="kube-system/ceph-pools-audit-28827270-65l9h"
2024-10-22T22:30:02.145 controller-0 kubelet[13782]: info E1022 22:30:02.145259 13782 remote_runtime.go:193] "RunPodSandbox from runtime service failed" err="rpc error: code = Unknown desc = failed to setup network for sandbox "50242a9d9cc28594d0ef9190046f0fef6ab17a977ec922c25d28fc59f2a352b1": plugin type="multus" name="multus-cni-network" failed (add): [kube-system/ceph-pools-audit-28827270-65l9h/21fbd5d4-046f-46fe-8d40-4b7b52f5daab:chain]: error adding container to network "chain": error getting ClusterInformation: connection is unauthorized: Unauthorized"

Possible Solution

Killing/restarting the calico-node pod resolves things.

Steps to Reproduce (for bugs)

Unfortunately I don't have a reliable way to reproduce this.

@caseydavenport
Member

: error adding container to network "chain": error getting ClusterInformation: connection is unauthorized: Unauthorized"

I've seen other instances of this caused by:

  • Date / time issues with the Nodes not being in sync with the apiserver.
  • Stale data in etcd - for example, after restoring Kubernetes etcd from a backup.
  • Bad TLS certificates being given to pods by Kubernetes

I think the generic "connection is unauthorized: Unauthorized" error message typically means something more fundamental is wrong with the system, rather than, say, an expired token or inadequate RBAC, both of which would result in a more specific error message.
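One way to narrow these possibilities down is to ask the apiserver directly whether it still accepts the token from the node's CNI kubeconfig, via a TokenReview. A minimal client-go sketch (the kubeconfig and token inputs are placeholders, not taken from this report):

```go
// Submit a TokenReview for a suspect token. If the apiserver reports it as
// not authenticated, the problem is with the token, certificates, or clocks;
// if it authenticates fine, the failure lies elsewhere (e.g. RBAC).
package main

import (
	"context"
	"fmt"
	"os"

	authv1 "k8s.io/api/authentication/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: tokenreview <token-from-calico-kubeconfig>")
		os.Exit(1)
	}
	// Hypothetical input: an admin kubeconfig path in $KUBECONFIG.
	cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	review := &authv1.TokenReview{
		Spec: authv1.TokenReviewSpec{Token: os.Args[1]},
	}
	res, err := cs.AuthenticationV1().TokenReviews().Create(
		context.TODO(), review, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("authenticated:", res.Status.Authenticated)
	fmt.Println("user:", res.Status.User.Username)
	fmt.Println("error:", res.Status.Error)
}
```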

@steven-webster
Author

Hey @caseydavenport, thanks for the response. This system uses NTP. Looking around the time the system started reporting "connection is unauthorized: Unauthorized", I can see the following ntpq info:

     remote           refid         st t  when poll reach   delay   offset  jitter
===================================================================================
+2606:82c0:21::e 216.239.35.12       2 u   108  128  377   185.897  -0.994   0.243
+64:ff9b::2d3d:b 194.0.5.123         3 u    78  128  377    64.570  -4.262   0.988
*2620:10a:a001:a 206.108.0.132       2 u    57  128  377     0.257  +2.627   0.087
+2620:10a:a001:d 90.139.163.33       3 s   101  128  376    50.174  +0.640   0.157
+2606:82c0:21::e 216.239.35.12       2 u    11  128  377   185.704  -1.257   0.127
+64:ff9b::2d3d:b 194.0.5.123         3 u   111  128  377    64.570  -4.262   0.921

Sorry if this formatting is bad, but we do see a spike of around -4.262 milliseconds offset around the time the problem happens. Looking at the rest of the logs, though, it doesn't seem like this is an aberration from the rest of the data. Also, this particular peer is an external NTP server reached via 6to4.
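For what it's worth, one rough way to check node-vs-apiserver clock skew independently of ntpq is to compare the node's local time against the Date header returned by the apiserver. A minimal sketch (the apiserver address is a placeholder, and the Date header only has one-second resolution, so this only catches large skews):

```go
// Compare local node time against the Date header returned by the apiserver.
// A skew of more than a few seconds could explain token validation failures.
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Placeholder apiserver URL; a 401 response is fine here since
	// only the Date response header matters for this check.
	url := "https://192.168.0.1:6443/version"
	client := &http.Client{
		Transport: &http.Transport{
			// Certificate verification skipped because only the header is used.
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		},
		Timeout: 5 * time.Second,
	}
	local := time.Now()
	resp, err := client.Get(url)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	remote, err := http.ParseTime(resp.Header.Get("Date"))
	if err != nil {
		panic(err)
	}
	fmt.Println("local :", local.Format(time.RFC3339))
	fmt.Println("server:", remote.Format(time.RFC3339))
	fmt.Println("skew  :", local.Sub(remote))
}
```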

I guess it's just a mystery to me why restarting the calico-node pod (as in the workarounds reported for similar issues) resolves things.

@caseydavenport
Member

Sounds unlikely to be time mismatch then.

Hard to say what else it might be. I'd recommend that next time this happens, you take a look at the Kubernetes apiserver logs to see whether they explain why the requests are being rejected as Unauthorized; sometimes there is additional context in those logs.
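Along the same lines, it might help to replay the request that actually fails, the GET for the Calico ClusterInformation resource, using the token from the node's calico-kubeconfig, and capture the raw apiserver response; the body sometimes carries more detail than the CNI error string. A rough sketch (the resource path, environment variables, and TLS handling here are assumptions for illustration, not taken from this report):

```go
// Replay the ClusterInformation GET that the Calico CNI plugin performs,
// using a bearer token, and print the raw status and body from the apiserver.
package main

import (
	"crypto/tls"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

func main() {
	// Hypothetical inputs: apiserver URL and the token from calico-kubeconfig.
	apiserver := os.Getenv("APISERVER") // e.g. https://10.0.0.1:6443
	token := os.Getenv("TOKEN")

	// Assumed path of the Calico ClusterInformation custom resource.
	url := apiserver + "/apis/crd.projectcalico.org/v1/clusterinformations/default"

	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer "+token)

	client := &http.Client{
		Transport: &http.Transport{
			// The real kubeconfig pins the cluster CA; skipped here for brevity.
			TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
		},
		Timeout: 10 * time.Second,
	}
	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println("status:", resp.Status)
	fmt.Println(string(body))
}
```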
