Two AddressBlocks are created when coil-controller is temporarily down #271

Open
1 task
masa213f opened this issue Jan 23, 2024 · 5 comments · May be fixed by #309
Labels
bug Something isn't working

Comments

@masa213f
Contributor

masa213f commented Jan 23, 2024

Describe the bug

Coil creates two AddressBlocks when coil-controller is temporarily down.
One of the two AddressBlocks may then leak when the Pod that uses them is deleted.

Environments

  • Version: Coil v2.5.1

To Reproduce

  1. Set up a kind cluster
$ cd ~/go/src/github.com/cybozu-go/coil/v2/e2e
$ make start install-coil
$ kubectl apply -f manifests/default_pool.yaml
  2. Create an AddressPool and a Namespace
$ kubectl apply -f - << EOF
apiVersion: coil.cybozu.com/v2
kind: AddressPool
metadata:
  name: test-pool
spec:
  blockSizeBits: 0
  subnets:
  - ipv4: 10.0.0.0/30
EOF

$ kubectl apply -f - << EOF
apiVersion: v1
kind: Namespace
metadata:
  annotations:
    coil.cybozu.com/pool: test-pool
  name: test-ns
EOF
  3. Stop the coil-controller Pods
$ kubectl patch deployment -n kube-system coil-controller -p '{"spec":{"replicas":0}}'
$ kubectl get pod -n kube-system -l app.kubernetes.io/component=coil-controller
  4. Create a Pod
$ kubectl apply -f - << EOF
apiVersion: v1
kind: Pod
metadata:
  name: test-pod
  namespace: test-ns
spec:
  containers:
  - name: ubuntu
    image: ghcr.io/cybozu/ubuntu:22.04
    command: ["pause"]
EOF
  5. Wait 1 minute

  6. Start the coil-controller Pods again

$ kubectl patch deployment -n kube-system coil-controller -p '{"spec":{"replicas":2}}'
$ kubectl get pod -n kube-system -l app.kubernetes.io/component=coil-controller

After this, Coil sometimes creates two AddressBlocks for the test Pod.

$ kubectl get pod -n test-ns -o wide
NAME       READY   STATUS    RESTARTS   AGE     IP         NODE           NOMINATED NODE   READINESS GATES
test-pod   1/1     Running   0          2m56s   10.0.0.1   coil-worker3   <none>           <none>

$ kubectl get addresspool,addressblock 
NAME                                    BLOCKSIZE BITS
addresspool.coil.cybozu.com/default     0
addresspool.coil.cybozu.com/test-pool   0

NAME                                       NODE                 POOL        IPV4            IPV6
addressblock.coil.cybozu.com/default-0     coil-control-plane   default     10.244.0.0/32   
addressblock.coil.cybozu.com/default-1     coil-control-plane   default     10.244.0.1/32   
addressblock.coil.cybozu.com/default-2     coil-control-plane   default     10.244.0.2/32   
addressblock.coil.cybozu.com/test-pool-0   coil-worker3         test-pool   10.0.0.0/32    ★ Two address blocks exist.
addressblock.coil.cybozu.com/test-pool-1   coil-worker3         test-pool   10.0.0.1/32    ★
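
For reference, the /32 blocks listed above follow from the pool definition. The sketch below is a minimal illustration in Python, assuming each block spans 2^blockSizeBits addresses and that blocks are carved out of the subnet in order. The function name `blocks_in_subnet` is hypothetical and not part of Coil.

```python
import ipaddress
from typing import List


def blocks_in_subnet(subnet: str, block_size_bits: int) -> List[str]:
    """List the address blocks a subnet can be divided into.

    Assumption: each block holds 2**block_size_bits addresses.
    """
    net = ipaddress.ip_network(subnet)
    # A block of 2**n addresses has a prefix length of (max_prefixlen - n).
    block_prefix = net.max_prefixlen - block_size_bits
    return [str(block) for block in net.subnets(new_prefix=block_prefix)]


# The test-pool from the reproduction steps: 10.0.0.0/30 with blockSizeBits 0.
print(blocks_in_subnet("10.0.0.0/30", 0))
```

Under these assumptions test-pool can supply only four /32 blocks, so the two blocks held by coil-worker3 for a single Pod take up half of the pool.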

After this, when the test Pod is deleted, one AddressBlock remains.

$ kubectl delete pod -n test-ns test-pod
pod "test-pod" deleted

$ kubectl get pod -n test-ns -o wide
No resources found in test-ns namespace.

$ kubectl get addresspool,addressblock 
NAME                                    BLOCKSIZE BITS
addresspool.coil.cybozu.com/default     0
addresspool.coil.cybozu.com/test-pool   0

NAME                                       NODE                 POOL        IPV4            IPV6
addressblock.coil.cybozu.com/default-0     coil-control-plane   default     10.244.0.0/32   
addressblock.coil.cybozu.com/default-1     coil-control-plane   default     10.244.0.1/32   
addressblock.coil.cybozu.com/default-2     coil-control-plane   default     10.244.0.2/32   
addressblock.coil.cybozu.com/test-pool-0   coil-worker3         test-pool   10.0.0.0/32    ★ This doesn't go away.
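
A leaked block like this can be spotted by checking which AddressBlocks contain no Pod IP. The following is a minimal sketch of that check in Python, assuming the block CIDRs and Pod IPs have already been collected (for example, from the two `kubectl get` outputs above). `unused_blocks` is a hypothetical helper, not part of Coil.

```python
import ipaddress
from typing import Dict, List


def unused_blocks(blocks: Dict[str, str], pod_ips: List[str]) -> List[str]:
    """Return the names of blocks whose CIDR contains none of the Pod IPs."""
    ips = [ipaddress.ip_address(ip) for ip in pod_ips]
    unused = []
    for name, cidr in blocks.items():
        net = ipaddress.ip_network(cidr)
        # A block counts as "in use" if at least one Pod IP falls inside it.
        if not any(ip in net for ip in ips):
            unused.append(name)
    return sorted(unused)


# Blocks on coil-worker3 while test-pod (10.0.0.1) was still running.
blocks = {"test-pool-0": "10.0.0.0/32", "test-pool-1": "10.0.0.1/32"}
print(unused_blocks(blocks, ["10.0.0.1"]))
```

With the data from this report, the check flags test-pool-0, which is the block that remains after the Pod is deleted.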

@masa213f masa213f added the bug Something isn't working label Jan 23, 2024
@ymmt2005
Member

ymmt2005 commented Jan 23, 2024

I believe coild on the assigned node will eventually collect unused AddressBlocks.
https://github.com/cybozu-go/coil/blob/main/docs/design.md#addressblock

At startup, coild also checks each AddressBlock for the Node, and if no Pod is using the addresses in the block, it deletes the AddressBlock.

Please reopen if I'm wrong.

@masa213f
Contributor Author

@ymmt2005
In an actual cluster, an unused AddressBlock had been left behind for nearly half a year...

@ymmt2005
Member

If it is that serious, I'd like to suggest calling the GC logic periodically,
not only at process startup.
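
As a rough illustration of that suggestion, the same GC function could run once at startup and then again on a timer. This is a language-neutral sketch written in Python, not Coil's actual code (Coil is a Go project). `gc_once` and the interval are hypothetical placeholders.

```python
import threading
from typing import Callable


def run_gc_periodically(gc_once: Callable[[], None],
                        interval_sec: float,
                        stop: threading.Event) -> int:
    """Call gc_once at startup and then every interval_sec until stop is set.

    Returns the number of GC runs performed.
    """
    runs = 0
    while True:
        gc_once()
        runs += 1
        # Event.wait returns True as soon as stop is set, False on timeout.
        if stop.wait(interval_sec):
            return runs
```

Waiting on an event instead of sleeping lets the loop exit promptly on shutdown rather than after a full interval.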

@masa213f
Contributor Author

I'd like to suggest calling the GC logic periodically,

That sounds good.

But this is not a serious problem, except when deleting AddressPools.
I don't think it is worth the effort to change.

@ymmt2005
Member

ymmt2005 commented Jan 31, 2024

A workaround is to manually delete the coild Pod of the node.
That will trigger a GC.

Deleting coild is a safe operation. It does not interrupt networking.

@terassyi terassyi reopened this Apr 8, 2024
@terassyi terassyi linked a pull request Jan 15, 2025 that will close this issue