Node Annotations are Only Evaluated on Startup and not Actively Watched During Runtime #676

Feder1co5oave · 2019-02-21T16:41:19Z

On an existing Kubernetes cluster (created with kubeadm) with kube-router successfully running in route reflector mode (rr.server annotation on the 3 master nodes, rr.client annotation on all the workers), my workflow for joining new worker nodes is as follows:

For each new node:

run kubeadm join on the node
kubectl annotate node <node-name> kube-router.io/rr.client=23
restart kube-router on the new node to put it in RR mode

It turns out that the previously joined nodes receive routes for all the new nodes, but the new nodes only receive routes for the "old" nodes. Restarting kube-router on the route reflector nodes solved this issue.

My troubleshooting suggested that whenever a new node joins, the kube-router daemons on the RR servers peer with it right away, even before the rr.client annotation is added. The new peer is therefore assumed to be in full-mesh mode, even though it will actually run in RR mode and peer only with the RR servers. BGP route reflection allows BGP speakers in the same AS to peer in either full-mesh or RR mode. The export policies are such that RR servers reflect advertisements:

from RR peers to RR peers
from full-mesh peers to RR peers
from RR peers to full-mesh peers
but NOT from full-mesh peers to other full-mesh peers (they're supposed to peer with each other in full-mesh, duh)

So the new nodes, assumed to be part of the full mesh, don't get advertisements about other new nodes.
Restarting the RR servers forces them to reload the node list and annotations and to correctly peer with the new nodes in RR mode.
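The export rules above, and why they starve the new nodes, can be modeled in a few lines of Go. This is a simplified sketch, not kube-router's code: the PeerMode type and the classifyPeer and shouldReflect functions are hypothetical, and the model assumes the RR server fixes a peer's mode once, from the node's annotations at the moment the session is set up.

```go
package main

// PeerMode is how an RR server views one of its peers (illustrative type,
// not kube-router's actual data model).
type PeerMode int

const (
	FullMesh PeerMode = iota // non-client peer: expected to mesh with other non-clients
	RRClient                 // route-reflector client: peers only with RR servers
)

// Annotation key as used in this issue.
const rrClientAnnotation = "kube-router.io/rr.client"

// classifyPeer decides the peer mode from a node's annotations at the moment
// the RR server sets up the session. A node that has no rr.client annotation
// yet (e.g. right after kubeadm join) is treated as a full-mesh peer.
func classifyPeer(annotations map[string]string) PeerMode {
	if _, ok := annotations[rrClientAnnotation]; ok {
		return RRClient
	}
	return FullMesh
}

// shouldReflect reports whether an RR server re-advertises a route learned
// from a peer in mode `from` to a peer in mode `to`, following the rules
// listed above.
func shouldReflect(from, to PeerMode) bool {
	// Everything is reflected except full-mesh -> full-mesh, because
	// full-mesh peers are expected to exchange those routes directly.
	return !(from == FullMesh && to == FullMesh)
}
```

With this model, two nodes classified before their rr.client annotation existed are both FullMesh, so shouldReflect(FullMesh, FullMesh) is false and neither learns the other's routes through the RR server; since the new nodes only peer with the RR servers, they never learn them at all. Old (RRClient) and new (FullMesh) nodes still exchange routes in both directions, which matches the symptom described above.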

To fix this, kube-router should watch for annotation changes on nodes and update its internal information about which nodes form the full mesh and which are joined to an RR cluster.
Until a fix is implemented, I suggest documenting the workaround (restarting the RR servers) for newcomers!
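A minimal sketch of the watching half of this fix, assuming the RR servers keep a copy of each node's annotations: on every node update, compare the peering-related annotations of the old and new objects and re-evaluate the peer only when they differ. The function and variable names are hypothetical and the code is stdlib-only; in a real controller this check would presumably run inside the UpdateFunc of a client-go node informer.

```go
package main

// Annotation keys discussed in this issue that affect the BGP peering mode.
var peeringAnnotations = []string{
	"kube-router.io/rr.client",
	"kube-router.io/rr.server",
}

// peeringAnnotationsChanged reports whether any peering-related annotation was
// added, removed, or had its value changed between two versions of a node.
// Unrelated annotations are ignored. Nil maps are treated as empty.
func peeringAnnotationsChanged(oldAnn, newAnn map[string]string) bool {
	for _, key := range peeringAnnotations {
		oldVal, oldOK := oldAnn[key]
		newVal, newOK := newAnn[key]
		if oldOK != newOK || oldVal != newVal {
			return true
		}
	}
	return false
}
```

Filtering like this matters because Node objects are updated regularly (for example by kubelet status reports), and re-peering on every update would flap BGP sessions needlessly.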


aauren · 2020-04-25T06:32:44Z

Closed via #677

ofen · 2020-04-29T13:00:39Z

Is it possible to make this happen automatically (without manually restarting the pod)?

murali-reddy · 2020-04-29T14:49:11Z

@ofen Reopening, as the following still makes sense to fix:

> kube-router should watch for annotation changes on nodes, and update its internal information about which nodes are forming full-mesh, and which are joined to a RR cluster.
> While waiting for a fix to be implemented, I suggest the workaround to restart RR servers be documented for newcomers!

zhaixigui · 2020-08-19T16:51:00Z

Why can't it automatically recognize the newly added node and propagate its routes? Why do we have to do something as dangerous as killing the RR?

zhaixigui · 2020-08-19T16:54:27Z

Do we have any choice but to kill the RR?

Feder1co5oave · 2020-08-19T16:55:31Z

@zhaixigui the new nodes are recognized, but they are assumed to be in full mesh mode, and there's no clean way to do the switch to RR mode other than restart kube-router on both RR server and client to reload the nodes' information.

Killing RR servers should not bring any disruption as long as you enabled soft-restart. I've done it several times without any repercussions.

zhaixigui · 2020-08-19T18:00:54Z

@Feder1co5oave Would this be a bad idea? When the RR server sees a "kube-router.io/rr.client=42" annotation on the most recently joined node, it first calls DeleteNeighbor to remove that node, then calls AddNeighbor to re-add it as an RR client, and the RR then automatically propagates its routes?
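The delete-then-add sequence proposed here could look roughly like the following. Speaker, repeerNode and recordingSpeaker are hypothetical: the method names only mirror the DeleteNeighbor/AddNeighbor calls in the comment and are not a real BGP library API, and recordingSpeaker is a fake used to show the call order.

```go
package main

import "fmt"

// Speaker is a hypothetical abstraction over the BGP library; the method
// names mirror the DeleteNeighbor/AddNeighbor calls in the proposal.
type Speaker interface {
	DeleteNeighbor(addr string) error
	AddNeighbor(addr string, rrClient bool) error
}

// repeerNode tears down the existing session with a node and re-creates it in
// the mode implied by the node's current annotations, so the RR server starts
// treating it as an RR client (or full-mesh peer) without a restart.
func repeerNode(s Speaker, addr string, annotations map[string]string) error {
	_, isClient := annotations["kube-router.io/rr.client"]
	if err := s.DeleteNeighbor(addr); err != nil {
		return fmt.Errorf("delete neighbor %s: %w", addr, err)
	}
	if err := s.AddNeighbor(addr, isClient); err != nil {
		return fmt.Errorf("add neighbor %s: %w", addr, err)
	}
	return nil
}

// recordingSpeaker is a fake Speaker that records calls, for illustration only.
type recordingSpeaker struct {
	calls []string
}

func (r *recordingSpeaker) DeleteNeighbor(addr string) error {
	r.calls = append(r.calls, "delete "+addr)
	return nil
}

func (r *recordingSpeaker) AddNeighbor(addr string, rrClient bool) error {
	r.calls = append(r.calls, fmt.Sprintf("add %s rrClient=%t", addr, rrClient))
	return nil
}
```

Note that deleting the neighbor drops the BGP session, so routes learned from that node are briefly withdrawn; a real implementation might prefer an in-place peer update where the BGP library supports one.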

zhaixigui · 2020-08-20T02:31:11Z

> @zhaixigui the new nodes are recognized, but they are assumed to be in full mesh mode, and there's no clean way to do the switch to RR mode other than restart kube-router on both RR server and client to reload the nodes' information.
>
> Killing RR servers should not bring any disruption as long as you enabled soft-restart. I've done it several times without any repercussions.

Is soft-restart the same as GracefulRestart?

zhaixigui · 2020-08-22T06:58:01Z

Does calico have the same problem?

Feder1co5oave · 2020-08-22T12:48:47Z

I don't know. Calico has a great reputation as a network plugin and has great features. When I considered it for my clusters, I found it more complicated, with more moving parts than kube-router, so I ended up choosing the latter because of its "simplicity".

zhaixigui · 2020-08-25T13:42:26Z

> @Feder1co5oave Would this be a bad idea? When the RR server sees a "kube-router.io/rr.client=42" annotation on the most recently joined node, it first calls DeleteNeighbor to remove that node, then calls AddNeighbor to re-add it as an RR client, and the RR then automatically propagates its routes?

@Feder1co5oave Is this a bad idea? Is there a better idea or solution?

Feder1co5oave · 2020-08-25T13:47:37Z

@zhaixigui Not at all; in fact, it is exactly the solution I proposed in my first post. But I'm not a gopher, and this would involve some modifications to the control flow, i.e. watching nodes. I believe we'll have to wait for a volunteer to write a PR.

sebltm · 2024-08-22T14:45:23Z

I think it's as easy as doing something like this: #1723
I've tested it in my local environment, and it seems to work as expected.

Feder1co5oave added a commit to Feder1co5oave/kube-router that referenced this issue Feb 21, 2019

document workaround for cloudnativelabs#676

cfd51fd

Feder1co5oave mentioned this issue Feb 21, 2019

document workaround for #676 #677

Merged

murali-reddy pushed a commit that referenced this issue Mar 10, 2019

document workaround for #676 (#677)

7b20ae9

aauren closed this as completed Apr 25, 2020

murali-reddy reopened this Apr 29, 2020

aauren added the enhancement label Jul 10, 2020

zhaixigui mentioned this issue Aug 22, 2020

calico need to restart RR after added a new node projectcalico/calico#3915

Closed

Feder1co5oave closed this as completed Aug 22, 2020

Feder1co5oave reopened this Aug 22, 2020

aauren changed the title from "Newly joined nodes don't receive all routes when using RR mode" to "Node Annotations are Only Evaluated on Startup and not Actively Watched During Runtime" Aug 13, 2021

aauren mentioned this issue Nov 15, 2022

Mismatched local address when peering with external router on difference network #1371

Closed

aauren added the override-stale Don't allow automatic management of stale issues / PRs label Sep 4, 2023

aauren added this to the v2.1.0 milestone Jan 6, 2024

aauren self-assigned this Jan 6, 2024

aauren modified the milestones: v2.1.0, v2.2.0 Mar 2, 2024

aauren mentioned this issue May 15, 2024

Initial BGP sync during kube-router startup extremely slow in kubernetes v1.29 #1668

Closed

sebltm linked a pull request Aug 22, 2024 that will close this issue

fix: update peers on node update #1723

Open

aauren mentioned this issue Sep 22, 2024

Refactor: Abstract Node Info #1739

Merged
