Cilium Cluster Scope IPAM melts DRBD storage fabric #705
Comments
I guess my first question here is this: does LINSTOR/DRBD support operating two separate instances in the same subnet? (In my case it seems like two separate clusters' pod subnets accidentally ended up able to talk to each other through some Cilium config quirk.)
Sure, we don't do anything fancy with the network. For LINSTOR/DRBD there are just the other cluster nodes, identified by their IP addresses. What you seem to be experiencing is that multiple Pods are assigned the same IP address. My guess is that a Pod in cluster A is assigned the same IP address as a Pod in cluster B. That would explain the "unexpected connection from..." messages, and it would then put DRBD into some weird states, I guess...
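One way to check that hypothesis is to dump the pod IPs of both clusters and look for duplicates. A minimal sketch, assuming kubeconfig contexts named `cluster-a` and `cluster-b` (hypothetical names):

```sh
# List every pod IP in both clusters and print any address that appears in both.
# The context names cluster-a / cluster-b are placeholders for your own kubeconfigs.
kubectl --context cluster-a get pods -A -o jsonpath='{range .items[*]}{.status.podIP}{"\n"}{end}' | sort -u > /tmp/ips-a
kubectl --context cluster-b get pods -A -o jsonpath='{range .items[*]}{.status.podIP}{"\n"}{end}' | sort -u > /tmp/ips-b
comm -12 /tmp/ips-a /tmp/ips-b   # any line printed here is an IP assigned in both clusters
```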
Update:
Note: these problems don't appear in a virtual-environment deployment of Cilium (in my case Proxmox/QEMU machines), only on bare metal. Anyone reading this and trying to deploy it in prod: you've been warned. Re: conflicting IPs: I don't think two nodes had the same IP, though... I think they were just assigned IPs from the wrong pool, but that's it. I can test again if I re-create another cluster on bare metal, but for now I have to move on.
Is there an existing issue for this?
Version
Cilium equal or higher than v1.16.0 and lower than v1.17.0
Piraeus Operator: v2.6.0
What happened?
Cross-posting cilium/cilium#34745 here, because maybe it is a DRBD config issue more than a Cilium issue.
Anyway, when I create a cluster with `ipam.mode` left at its default (not set), the interfaces on my nodes get assigned seemingly random IPs in 10.0.0.0/8. Another cluster exists on the same subnet (which is fine on its own), but when that other cluster also has DRBD nodes, DRBD somehow goes off the rails with a lot of errors and the nodes get tainted with quorum lost. In dmesg I see errors like "unexpected connection from...".
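For reference, this is roughly how the cluster-pool (cluster-scope) IPAM mode is selected when installing Cilium through Helm. A minimal sketch, assuming the current chart keys `ipam.mode` and `ipam.operator.clusterPoolIPv4PodCIDRList`; the CIDR below is only an example, not the values from this cluster:

```sh
# Install Cilium with cluster-scope (cluster-pool) IPAM and an explicit,
# non-overlapping pod CIDR instead of relying on the 10.0.0.0/8 default pool.
helm install cilium cilium/cilium \
  --namespace kube-system \
  --set ipam.mode=cluster-pool \
  --set ipam.operator.clusterPoolIPv4PodCIDRList='{10.42.0.0/16}' \
  --set ipam.operator.clusterPoolIPv4MaskSize=24
```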
When I check the node pool in DRBD I see weird things: the nodes have picked up Cilium clusterpool IPs (?) instead of an IP from the services CIDR (?).
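One way to see which addresses LINSTOR/DRBD actually registered for the nodes is via the standard CLIs; a sketch, assuming the `linstor` client is available on the controller and `drbdadm` on the satellites ("my-node" is a placeholder):

```sh
# Show each LINSTOR node with the address it was registered under.
linstor node list

# Show the network interfaces LINSTOR knows for a given node.
linstor node interface list my-node

# On a node itself, show the DRBD view of peers and their connection state.
drbdadm status
```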
FYI my cluster is configured with
and the other cluster has completely different IPAM CIDRs. The two clusters can simply talk to each other (two switches apart, no firewall in between).
Logs on the storage controller of the melted nodes are an infinite loop of:
If I use `ipam.mode=kubernetes` instead, I see the correct values from the pod CIDR, and no quorum is lost.
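For comparison, switching an existing installation to Kubernetes host-scope IPAM looks roughly like this; a sketch, assuming the chart value `ipam.mode=kubernetes` and an existing release named `cilium`. The CiliumNode check is one way to confirm which pod CIDRs each node actually got:

```sh
# Switch the IPAM mode of an existing Cilium release to kubernetes host-scope.
helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set ipam.mode=kubernetes

# Verify which pod CIDR ranges each node is actually using.
kubectl get ciliumnodes -o custom-columns=NAME:.metadata.name,PODCIDRS:.spec.ipam.podCIDRs
```

Note that changing IPAM modes on a running cluster is usually disruptive (agents need to restart and pods may be renumbered), so this is not a casual toggle.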
I actually started noticing this when I turned on encryption strict mode in the prod cluster, but after digging a bit I think there is an issue with `ipam.mode=clusterpool` somehow. Apparently this severely breaks storage across ALL clusters in 10.0.0.0/8. You have been warned. Right now I am trying to figure out what is going on, so this bug report will be very vague; I am posting here to see if anyone has any idea what is up with this.
How can we reproduce the issue?
Deploy with `ipam.mode=clusterpool`, roughly as sketched below.
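A rough reproduction sketch based on the report above; the exact Helm values and the Piraeus install command are assumptions on my part, not verified repro steps:

```sh
# Two bare-metal clusters on the same routable network segment, both left on
# Cilium's default cluster-pool IPAM (pod IPs allocated out of 10.0.0.0/8).
helm install cilium cilium/cilium --namespace kube-system --set ipam.mode=cluster-pool

# Install the Piraeus Operator (v2.6.0 in this report) and create DRBD-backed
# storage in each cluster, then watch dmesg / LINSTOR for
# "unexpected connection from ..." and quorum-lost errors.
kubectl apply --server-side -k "https://github.com/piraeusdatastore/piraeus-operator//config/default?ref=v2.6.0"
```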
Kernel Version
6.6.43-talos
Kubernetes Version
1.30.3