MetalLB + FRR and Arista L3LS

Summary

This repo demonstrates MetalLB with FRR announcing service IPs over BGP to a single top-of-rack switch. It is a common ask: run FRR + MetalLB in either OpenShift or vanilla Kubernetes to advertise Kubernetes services, known as VIPs in the network world.

Explanation of MetalLB and FRR

For anyone unfamiliar with the concept, MetalLB is a load balancer implementation that manages VIP address space. It is used in private data centers so that Kubernetes can automatically request an external IP from an IP pool and assign it, either v4 or v6, to a Kubernetes service. This is done with Kubernetes operators and webhooks. MetalLB can also run in Layer 2 mode, where a Layer 2 segment is dedicated to VIPs and MetalLB answers ARP on that segment for each new service address. The issue with the Layer 2 style of load balancing is that, typically, a single node advertises each VIP and there is no ECMP.
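
For contrast with the BGP setup used in the rest of this repo, Layer 2 mode is enabled by pairing an address pool with an L2Advertisement. The following is a minimal sketch and is not part of this repo's manifests; the resource name l2-example is hypothetical, and first-pool is the pool defined in manifest/pools.yaml.

```yaml
# Sketch only: Layer 2 mode, which this lab does NOT use.
# With this resource, a single elected node answers ARP for each VIP.
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: l2-example            # hypothetical name
  namespace: metallb-system
spec:
  ipAddressPools:
  - first-pool                # pool from manifest/pools.yaml
```

Because only one node owns a given VIP at a time in this mode, all traffic for that VIP enters through that node, which is the ECMP limitation described above.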

The preferred method for most network people is to squash Layer 2 wherever possible. This is where MetalLB talks to FRR via the operator. FRR runs as a sidecar in each MetalLB speaker pod. Each time a Kubernetes service is added or removed, the operator reloads the FRR configuration and FRR advertises a host route. So if a new Kubernetes service is assigned the VIP 1.2.3.4, FRR advertises 1.2.3.4/32 into BGP. It does not have to be a host route; it can be a /24 covering the entire services network if possible. The MetalLB documentation has a lot of good examples of advanced configuration.
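
As an illustration of advertising an aggregate instead of host routes, MetalLB's BGPAdvertisement accepts an aggregationLength field (it defaults to 32). The sketch below is an assumption about how it could be applied to this lab, not something the repo ships; the name example-aggregate is hypothetical.

```yaml
# Sketch only: advertise one /24 for the whole pool instead of a /32 per VIP.
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: example-aggregate     # hypothetical name
  namespace: metallb-system
spec:
  ipAddressPools:
  - first-pool                # pool from manifest/pools.yaml
  aggregationLength: 24       # default is 32 (host routes)
```

The trade-off is that an aggregate hides which individual VIPs are actually in use, so the upstream switch may attract traffic for addresses that have no service behind them.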

All of this is possible because containerlab can use KIND clusters as node types, so Kubernetes nodes running as KIND Docker-in-Docker containers can be linked directly into the lab topology. Networking is not entirely fun in this environment because of complex Kubernetes NAT and Docker in Docker, but it drives home the point of demoing FRR + MetalLB with Kubernetes in private data centers.

Run containerlab

sudo containerlab -t topology.yaml deploy

While this is running, export the kubeconfig

kind export kubeconfig --name=k01

Once you have cluster access, label k01-worker with ingress=boarder1

kubectl label nodes k01-worker ingress=boarder1

We do this because we want k01-worker to be where our FRR router lives, not a random Kubernetes node. This way the node selector controls where the router runs and where the peering happens.

daemonset.yaml
      nodeSelector:
        kubernetes.io/os: linux
        ingress: boarder1

Apply the Calico manifest while waiting for containerlab to finish.

kubectl apply -f manifest/calico.yaml

This file was modified slightly to use VXLAN:

- name: CALICO_IPV4POOL_VXLAN
  value: "Always"
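
The rest of manifest/calico.yaml is not reproduced here. In the stock Calico manifest the default pool uses IP-in-IP encapsulation, so switching to VXLAN usually also means turning IP-in-IP off. The fragment below is a sketch of what the pair of calico-node environment entries typically looks like, assuming the repo's file follows the stock layout; whether it sets CALICO_IPV4POOL_IPIP this way is an assumption.

```yaml
# Sketch only: env entries on the calico-node container.
# Assumes the stock manifest layout; not copied from this repo.
- name: CALICO_IPV4POOL_IPIP
  value: "Never"              # assumption: IP-in-IP disabled when VXLAN is used
- name: CALICO_IPV4POOL_VXLAN
  value: "Always"             # from the change described above
```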

Run AVD playbooks to push configs

ansible-playbook playbooks/fabric-deploy-config.yaml -e avd_ignore_requirements=True

Apply the MetalLB FRR operator

kubectl apply -f manifest/metallb-frr.yaml

You might need to wait a bit before applying the pools; the request is rejected until the webhook is deployed.

kubectl apply -f manifest/pools.yaml

This may take a few tries.

The pool is where the controller looks for the next available IP address.

manifest/pools.yaml
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: first-pool
  namespace: metallb-system
spec:
  addresses:
  - 10.100.1.0/24

This means that all of our Kubernetes services of type LoadBalancer will draw their VIPs from the 10.100.1.0/24 pool.
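
Note that MetalLB hands out every address in the range, including 10.100.1.0 itself, which is the address the nginx service receives later in this walkthrough. If addresses ending in .0 or .255 cause trouble for clients in your network, the IPAddressPool spec has an avoidBuggyIPs field. The variant below is a sketch of that option, not the file shipped in this repo.

```yaml
# Sketch only: same pool, but skipping addresses that end in .0 and .255.
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: first-pool
  namespace: metallb-system
spec:
  addresses:
  - 10.100.1.0/24
  avoidBuggyIPs: true         # not set in the repo's pools.yaml shown above
```

With this set, the first service would be expected to receive 10.100.1.1 rather than 10.100.1.0.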

Apply the BGP configuration (peer information as well as the networks to advertise)

kubectl apply -f manifest/bgp.yaml 
manifest/bgp.yaml 
apiVersion: metallb.io/v1beta1
kind: BGPAdvertisement
metadata:
  name: example
  namespace: metallb-system
spec:
  ipAddressPools:
  - first-pool
  peers: 
  - example
---
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: example
  namespace: metallb-system
spec:
  myASN: 64513
  peerASN: 65103
  peerAddress: 10.1.11.1
  peerPort: 179
  nodeSelectors:
  - matchLabels:
      ingress: boarder1

The first YAML block says to advertise what we created within first-pool, so every service VIP from 10.100.1.0/24. The big takeaway is the matchLabels entry ingress: boarder1 in the second block. It means this peering configuration is placed only on nodes that have the label ingress: boarder1. There is a lot more flexibility available here, but this is a very static example.
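
As one example of that flexibility, nodeSelectors on a BGPPeer also accept matchExpressions, so a single peer definition can cover several labeled nodes. This is a sketch under assumptions: the peer name example-expr and the second label value boarder2 are hypothetical and do not exist in this lab.

```yaml
# Sketch only: peer from any node labeled ingress=boarder1 or ingress=boarder2.
apiVersion: metallb.io/v1beta2
kind: BGPPeer
metadata:
  name: example-expr          # hypothetical name
  namespace: metallb-system
spec:
  myASN: 64513
  peerASN: 65103
  peerAddress: 10.1.11.1
  nodeSelectors:
  - matchExpressions:
    - key: ingress
      operator: In
      values: [boarder1, boarder2]   # boarder2 is hypothetical
```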

Apply the daemonset for the FRR speakers

kubectl apply -f manifest/daemonset.yaml

As mentioned previously, the daemonset runs the single FRR router only on the node matching the nodeSelector ingress: boarder1

daemonset.yaml
      nodeSelector:
        kubernetes.io/os: linux
        ingress: boarder1

Apply a generic nginx service.

kubectl apply -f manifest/servicetest.yaml 
➜  cl-kind kubectl get services
NAME            TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
kubernetes      ClusterIP      10.96.0.1       <none>        443/TCP        19m
nginx-service   LoadBalancer   10.96.192.245   10.100.1.0    80:32460/TCP   15m

We can see that nginx-service has the external IP 10.100.1.0, which will be advertised via BGP to dc1-boarder1.
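
The contents of manifest/servicetest.yaml are not reproduced in this readme. A minimal manifest that would yield a LoadBalancer service named nginx-service on port 80 might look like the following. Everything except the service name, type, and port is an assumption, including the Deployment name, the app: nginx label, and the nginx image.

```yaml
# Sketch only: a hypothetical stand-in for manifest/servicetest.yaml.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx                 # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx              # hypothetical label
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service         # matches the service name in the output above
spec:
  type: LoadBalancer          # this type is what triggers MetalLB to assign a VIP
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
```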

Exec into the border leaf

docker exec -it clab-cl-kind-boarder1 Cli

DC1_BOARDER1#show ip bgp summary vrf Tenant_A_OP_Zon
VRF Tenant_A_OP_Zon does not exist
DC1_BOARDER1#show ip bgp summary vrf Tenant_A_OP_Zone
BGP summary information for VRF Tenant_A_OP_Zone
Router identifier 192.168.255.7, local AS number 65103
Neighbor Status Codes: m - Under maintenance
  Description              Neighbor     V AS           MsgRcvd   MsgSent  InQ OutQ  Up/Down State   PfxRcd PfxAcc
  frr                      10.1.11.5    4 64513              7        14    0    0 00:00:04 Estab   1      1
  DC1_BOARDER2_Vlan3009    10.255.251.9 4 65103             20        19    0    0 00:04:18 Estab   7      7

DC1_BOARDER1#show ip route vrf Tenant_A_OP_Zone bgp 

VRF: Tenant_A_OP_Zone
Codes: C - connected, S - static, K - kernel, 
       O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
       E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
       N2 - OSPF NSSA external type2, B - Other BGP Routes,
       B I - iBGP, B E - eBGP, R - RIP, I L1 - IS-IS level 1,
       I L2 - IS-IS level 2, O3 - OSPFv3, A B - BGP Aggregate,
       A O - OSPF Summary, NG - Nexthop Group Static Route,
       V - VXLAN Control Service, M - Martian,
       DH - DHCP client installed default route,
       DP - Dynamic Policy Route, L - VRF Leaked,
       G  - gRIBI, RC - Route Cache Route,
       CL - CBF Leaked Route

 B E      10.100.1.0/32 [20/0] via 10.1.11.5, Vlan111

Checking the FRR pod for fun

kubectl exec -it speaker-zvx7h -n metallb-system -- sh

Defaulted container "frr" out of: frr, reloader, frr-metrics, speaker, cp-frr-files (init), cp-reloader (init), cp-metrics (init)
/ # 

vtysh
show ip bgp summary

k01-worker# show ip bgp summary

IPv4 Unicast Summary (VRF default):
BGP router identifier 192.168.32.4, local AS number 64513 vrf-id 0
BGP table version 1
RIB entries 1, using 96 bytes of memory
Peers 1, using 13 KiB of memory

Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
10.1.11.1       4      65103        41        23        1    0    0 00:16:05            0        1 N/A

Total number of neighbors 1

k01-worker# show running-config 
Building configuration...

Current configuration:
!
frr version 9.1_git
frr defaults traditional
hostname k01-worker
log file /etc/frr/frr.log informational
log timestamp precision 3
no ipv6 forwarding
service integrated-vtysh-config
!
router bgp 64513
 no bgp ebgp-requires-policy
 no bgp hard-administrative-reset
 no bgp default ipv4-unicast
 no bgp graceful-restart notification
 bgp graceful-restart preserve-fw-state
 no bgp network import-check
 neighbor 10.1.11.1 remote-as 65103
 !
 address-family ipv4 unicast
  network 10.100.1.0/32
  neighbor 10.1.11.1 activate
  neighbor 10.1.11.1 route-map 10.1.11.1-in in
  neighbor 10.1.11.1 route-map 10.1.11.1-out out
 exit-address-family
 !
 address-family ipv6 unicast
  neighbor 10.1.11.1 activate
  neighbor 10.1.11.1 route-map 10.1.11.1-in in
  neighbor 10.1.11.1 route-map 10.1.11.1-out out
 exit-address-family
exit
!
ip prefix-list 10.1.11.1-pl-ipv4 seq 1 permit 10.100.1.0/32
!
ipv6 prefix-list 10.1.11.1-pl-ipv4 seq 2 deny any
!
route-map 10.1.11.1-in deny 20
exit
!
route-map 10.1.11.1-out permit 1
 match ip address prefix-list 10.1.11.1-pl-ipv4
exit
!
route-map 10.1.11.1-out permit 2
 match ipv6 address prefix-list 10.1.11.1-pl-ipv4
exit
!
end
k01-worker# 

We can see that network 10.100.1.0/32 is advertised to the top-of-rack switch. Assuming another service comes in, it would then get 10.100.1.1, and FRR would add network 10.100.1.1/32 to its configuration.