Enable cilium addon fails in multinode cluster #4446

Open
slapcat opened this issue Feb 29, 2024 · 2 comments

Comments


slapcat commented Feb 29, 2024

Summary

Enabling the cilium addon on an existing multi-node cluster only takes effect on the node where the command is run. The change fails to propagate to the other nodes, leading to two issues:

  • New pods cannot be scheduled on the other nodes
  • If the other node is rebooted, all pods fail to start on it

The root cause seems to be that the cilium addon depends on the community addon being enabled, but this is not done automatically on the other nodes when cilium is enabled. As a result, the other nodes remain configured for the calico CNI, even though calico no longer exists.

What Should Happen Instead?

Cilium should be correctly configured on all nodes after running microk8s enable cilium.

Reproduction Steps

I used Juju when testing this issue:

  1. juju deploy microk8s --channel=1.28/stable -n 3 --series jammy
  2. juju ssh microk8s/leader sudo microk8s enable community
  3. juju ssh microk8s/leader sudo microk8s enable cilium
  4. juju ssh microk8s/leader sudo microk8s.kubectl create deploy --replicas=3 --image=nginx test-deploy

You should now see a pod running on the microk8s/leader node, but Pending on all the others. You can also see that the contents of /var/snap/microk8s/current/args/cni-network differ between the nodes.
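
A quick way to confirm the divergence is to list the CNI config directory on every unit; a rough sketch (the unit names microk8s/0–2 are assumed from the 3-unit deploy above):

  for unit in microk8s/0 microk8s/1 microk8s/2; do
    echo "== $unit =="
    juju ssh $unit sudo ls /var/snap/microk8s/current/args/cni-network
  done

In this report, only the unit that ran microk8s enable cilium shows 05-cilium-cni.conf and cni.yaml.disabled (see the listings further down).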

Introspection Report

N/A

Can you suggest a fix?

There is currently a workaround: copy the contents of /var/snap/microk8s/current/args/cni-network from the working node to the other nodes, then run snap restart microk8s on each of them.
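
A rough sketch of that workaround using juju scp (the unit numbers and the /tmp staging path are assumptions; adjust for your cluster):

  # Pack up the CNI config on the working node
  juju ssh microk8s/0 'sudo tar -C /var/snap/microk8s/current/args -czf /tmp/cni-network.tar.gz cni-network'
  juju scp microk8s/0:/tmp/cni-network.tar.gz .

  # Push it to a broken node, unpack it over the stale config, and restart the snap
  juju scp cni-network.tar.gz microk8s/1:/tmp/
  juju ssh microk8s/1 'sudo tar -C /var/snap/microk8s/current/args -xzf /tmp/cni-network.tar.gz && sudo snap restart microk8s'

Note that unpacking does not remove files that only exist on the broken node (e.g. cni.yaml in the listings further down), so the directory may still need tidying to match the working node.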

If you are building the cluster from scratch, or moving from single-node to multi-node, you can also prepare new nodes by enabling the community and cilium addons before running add-node, as sketched below.
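
For that from-scratch path, the ordering would look roughly like this (run on the first node; the join command and token printed by add-node will differ on your cluster):

  # On the first node, enable the addons before adding any peers
  microk8s enable community
  microk8s enable cilium

  # Generate a join token; this prints a `microk8s join <ip>:25000/<token>` command
  microk8s add-node

  # Run the printed join command on each new node before scheduling workloads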

Are you interested in contributing with a fix?

@ktsakalozos This regards an issue I asked you about earlier this week.


slapcat commented Mar 1, 2024

Additional Details

Pod errors on other nodes after enabling cilium:

  Normal   Scheduled               3m2s                default-scheduler  Successfully assigned default/web-cilium-5f668dd859-mm5d8 to juju-9c0265-microk8s-1
  Warning  FailedCreatePodSandBox  3m2s                kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "5b702f9572b97ec1f34b3e43691f1f0c5422326a3bfe5a799a96d70f0f913ea9": plugin type="calico" failed (add): error getting ClusterInformation: connection is unauthorized: Unauthorized
  Normal   SandboxChanged          3s (x15 over 3m1s)  kubelet            Pod sandbox changed, it will be killed and re-created.

Working node /var/snap/microk8s/current/args/cni-network/ contents:

05-cilium-cni.conf  10-calico.conflist  calico-kubeconfig  cni.yaml.disabled

Broken node /var/snap/microk8s/current/args/cni-network/ contents:

10-calico.conflist  calico-kubeconfig  cni.yaml  cni.yaml.backup


mcosti commented Sep 5, 2024

I am getting the exact same errors as you, but weirdly not always. I have a daemonset that can be launched onto the other node, but regular deployments cannot.

I am also confused as to why calico is mentioned when it should have been removed from the system (I guess?).

Did you find any solution? I am connecting my nodes via Tailscale.

With calico it worked, but it was a bit flaky, which is why I'm trying out cilium.

Thanks
