From 9ada41c9a6877e42cdeb35bee54dbfce5d2aa5ae Mon Sep 17 00:00:00 2001 From: Aninda Chatterjee Date: Mon, 14 Oct 2024 20:45:59 +0530 Subject: [PATCH 01/11] aninchat added srlinux-asymmetric blog post --- .../posts/2024/srlinux-asymmetric-routing.md | 1687 +++++++++++++++++ 1 file changed, 1687 insertions(+) create mode 100644 docs/blog/posts/2024/srlinux-asymmetric-routing.md diff --git a/docs/blog/posts/2024/srlinux-asymmetric-routing.md b/docs/blog/posts/2024/srlinux-asymmetric-routing.md new file mode 100644 index 00000000..75746a1d --- /dev/null +++ b/docs/blog/posts/2024/srlinux-asymmetric-routing.md @@ -0,0 +1,1687 @@ +--- +date: 2024-10-14 +tags: + - bgp + - evpn +authors: + - aninda +--- +# Asymmetric routing with SR Linux in EVPN VXLAN fabrics +This post dives deeper into the asymmetric routing model on SR Linux. The topology in use is a 3-stage Clos fabric with BGP EVPN and VXLAN, with host h1 single-homed to leaf1, h2 dual-homed to leaf2 and leaf3 and h3 single-homed to leaf4. Hosts h1 and h2 are in the same subnet, 172.16.10.0/24 while h3 is in a different subnet, 172.16.20.0/24. Thus, this post demonstrates Layer 2 extension over a routed fabric as well as how Layer 3 services are deployed over the same fabric, with an asymmetric routing model. 
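The homing described above (h1 and h3 single-homed, h2 dual-homed, every leaf uplinked to both spines) can be sanity-checked programmatically against the link list of the Containerlab file that follows. The snippet below is an illustrative sketch, not part of the lab files; the endpoint strings are copied from the topology definition.

```python
# Link endpoints copied from the Containerlab `links` section; the helper is an
# illustrative sanity check of the intended homing, not part of the lab itself.
links = [
    ("leaf1:e1-1", "spine1:e1-1"), ("leaf1:e1-2", "spine2:e1-1"),
    ("leaf2:e1-1", "spine1:e1-2"), ("leaf2:e1-2", "spine2:e1-2"),
    ("leaf3:e1-1", "spine1:e1-3"), ("leaf3:e1-2", "spine2:e1-3"),
    ("leaf4:e1-1", "spine1:e1-4"), ("leaf4:e1-2", "spine2:e1-4"),
    ("leaf1:e1-3", "h1:eth1"), ("leaf2:e1-3", "h2:eth1"),
    ("leaf3:e1-3", "h2:eth2"), ("leaf4:e1-3", "h3:eth1"),
]

def neighbors(links, node):
    """Return the set of nodes directly cabled to `node`."""
    out = set()
    for a, b in links:
        na, nb = a.split(":")[0], b.split(":")[0]
        if na == node:
            out.add(nb)
        if nb == node:
            out.add(na)
    return out

# h1 and h3 are single-homed, h2 is dual-homed, and every leaf reaches both spines.
assert neighbors(links, "h1") == {"leaf1"}
assert neighbors(links, "h2") == {"leaf2", "leaf3"}
assert neighbors(links, "h3") == {"leaf4"}
for leaf in ("leaf1", "leaf2", "leaf3", "leaf4"):
    assert neighbors(links, leaf) >= {"spine1", "spine2"}
```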
+ +The physical topology is shown below: + + + + + + +The Containerlab file used for this is shown below: + +``` +name: srlinux-asymmetric-routing + +topology: + nodes: + spine1: + kind: nokia_srlinux + image: ghcr.io/nokia/srlinux + spine2: + kind: nokia_srlinux + image: ghcr.io/nokia/srlinux + leaf1: + kind: nokia_srlinux + image: ghcr.io/nokia/srlinux + leaf2: + kind: nokia_srlinux + image: ghcr.io/nokia/srlinux + leaf3: + kind: nokia_srlinux + image: ghcr.io/nokia/srlinux + leaf4: + kind: nokia_srlinux + image: ghcr.io/nokia/srlinux + h1: + kind: linux + image: ghcr.io/srl-labs/network-multitool + exec: + - ip addr add 172.16.10.1/24 dev eth1 + - ip route add 172.16.20.0/24 via 172.16.10.254 + h2: + kind: linux + image: ghcr.io/srl-labs/network-multitool + exec: + - ip link add bond0 type bond mode 802.3ad + - ip link set eth1 down + - ip link set eth2 down + - ip link set eth1 master bond0 + - ip link set eth2 master bond0 + - ip addr add 172.16.10.2/24 dev bond0 + - ip link set eth1 up + - ip link set eth2 up + - ip link set bond0 up + - ip route add 172.16.20.0/24 via 172.16.10.254 + h3: + kind: linux + image: ghcr.io/srl-labs/network-multitool + exec: + - ip addr add 172.16.20.3/24 dev eth1 + - ip route add 172.16.10.0/24 via 172.16.20.254 + links: + - endpoints: ["leaf1:e1-1", "spine1:e1-1"] + - endpoints: ["leaf1:e1-2", "spine2:e1-1"] + - endpoints: ["leaf2:e1-1", "spine1:e1-2"] + - endpoints: ["leaf2:e1-2", "spine2:e1-2"] + - endpoints: ["leaf3:e1-1", "spine1:e1-3"] + - endpoints: ["leaf3:e1-2", "spine2:e1-3"] + - endpoints: ["leaf4:e1-1", "spine1:e1-4"] + - endpoints: ["leaf4:e1-2", "spine2:e1-4"] + - endpoints: ["leaf1:e1-3", "h1:eth1"] + - endpoints: ["leaf2:e1-3", "h2:eth1"] + - endpoints: ["leaf3:e1-3", "h2:eth2"] + - endpoints: ["leaf4:e1-3", "h3:eth1"] +``` + +???+ note + The host (image used is `ghcr.io/srl-labs/network-multitool`) login credentials are user/multit00l. 
+ +The end goal of this post is to ensure that host h1 can communicate with both h2 (same subnet) and h3 (different subnet) using an asymmetric routing model. To that end, the following IPv4 addressing is used (with the IRB addressing following a distributed, anycast model): + +| Resource | IPv4 scope | +| :------------------------------------:|:---------------------------:| +| Underlay | 198.51.100.0/24 | +| `system0` interface | 192.0.2.0/24 | +| VNI 10010 | 172.16.10.0/24 | +| VNI 10020 | 172.16.20.0/24 | +| host h1 | 172.16.10.1/24 | +| host h2 | 172.16.10.2/24 | +| host h3 | 172.16.20.3/24 | +| `irb0.10` interface | 172.16.10.254/24 | +| `irb0.20` interface | 172.16.20.254/24 | + +## Reviewing the asymmetric routing model + +When routing between VNIs, in a VXLAN fabric, there are two major routing models that can be used - asymmetric and symmetric. Asymmetric routing, which is the focus of this post, uses a *`bridge-route-bridge`* model, implying that the ingress leaf bridges the packet into the Layer 2 domain, routes it from one VLAN/VNI to another and then bridges the packet across the VXLAN fabric to the destination. + +Such a design naturally implies that both the source and the destination IRBs (and the corresponding Layer 2 domains and bridge tables) must exist on all leafs hosting servers that need to communicate with each other. While this increases the operational state on the leafs themselves (ARP state and MAC address state is stored everywhere), it does offer configuration and operational simplicity. 
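The bridge-route-bridge behaviour described above can be summarised as a small decision function on the ingress leaf: same-subnet traffic is only bridged across the fabric, while inter-subnet traffic is routed between the two local IRBs and then bridged out in the destination VNI. The sketch below is an illustrative model using the subnets from the addressing table, not SR Linux code.

```python
import ipaddress

# Subnets taken from the addressing table above; the decision function is an
# illustrative model of the ingress-leaf behaviour, not SR Linux's implementation.
IRB_SUBNETS = {
    "irb0.10": ipaddress.ip_network("172.16.10.0/24"),  # VNI 10010
    "irb0.20": ipaddress.ip_network("172.16.20.0/24"),  # VNI 10020
}

def ingress_leaf_steps(src_ip, dst_ip):
    """Return the forwarding steps the ingress leaf applies to a flow."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    src_irb = next(i for i, n in IRB_SUBNETS.items() if src in n)
    dst_irb = next(i for i, n in IRB_SUBNETS.items() if dst in n)
    if src_irb == dst_irb:
        # Same subnet/VNI: the packet is only bridged across the fabric.
        return ["bridge"]
    # Different subnets: bridge in, route between the two local IRBs, then
    # bridge to the destination in its own VNI -- hence both IRBs (and both
    # bridge tables) must exist on the ingress leaf.
    return ["bridge", "route", "bridge"]

assert ingress_leaf_steps("172.16.10.1", "172.16.10.2") == ["bridge"]  # h1 -> h2
assert ingress_leaf_steps("172.16.10.1", "172.16.20.3") == ["bridge", "route", "bridge"]  # h1 -> h3
```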
+ + + ESI : 00:00:00:00:00:00:00:00:00:00 + Label : 10020 + Route source : neighbor 198.51.100.1 (last modified 4d18h49m3s ago) + Route preference : No MED, No LocalPref + Atomic Aggr : false + BGP next-hop : 192.0.2.14 + AS Path : i [65500, 65414] + Communities : [target:20:20, bgp-tunnel-encap:VXLAN] + RR Attributes : No Originator-ID, Cluster-List is [] + Aggregation : None + Unknown Attr : None + Invalid Reason : None + Tie Break Reason : none + Path 1 was advertised to (Modified Attributes): + [ 198.51.100.3 ] + Route preference : No MED, No LocalPref + Atomic Aggr : false + BGP next-hop : 192.0.2.14 + AS Path : i [65411, 65500, 65414] + Communities : [target:20:20, bgp-tunnel-encap:VXLAN] + RR Attributes : No Originator-ID, Cluster-List is [] + Aggregation : None + Unknown Attr : None +--------------------------------------------------------------------------------------------------------------------------- +Route Distinguisher: 192.0.2.14:2 +Tag-ID : 0 +MAC address : AA:C1:AB:9F:EF:E2 +IP Address : 172.16.20.3 +neighbor : 198.51.100.3 +Received paths : 1 + Path 1: + ESI : 00:00:00:00:00:00:00:00:00:00 + Label : 10020 + Route source : neighbor 198.51.100.3 (last modified 4d18h49m0s ago) + Route preference : No MED, No LocalPref + Atomic Aggr : false + BGP next-hop : 192.0.2.14 + AS Path : i [65500, 65414] + Communities : [target:20:20, bgp-tunnel-encap:VXLAN] + RR Attributes : No Originator-ID, Cluster-List is [] + Aggregation : None + Unknown Attr : None + Invalid Reason : None + Tie Break Reason : peer-router-id +--------------------------------------------------------------------------------------------------------------------------- +--{ + running }--[ ]-- +``` + +This is an important step for asymmetric routing. Consider a situation where host h1 wants to communicate with h3. 
When the IP packet hits leaf1, it will attempt to resolve the destination IP address via an ARP request, since the destination subnet is directly attached locally (via the `irb0.20` interface), as shown below.
+
+```
+--{ + running }--[ ]--
+A:leaf1# show network-instance default route-table ipv4-unicast prefix 172.16.20.0/24
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+IPv4 unicast route table of network instance default
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
++---------------------------+-------+------------+----------------------+----------+----------+---------+------------+-----------------+-----------------+-----------------+----------------------+
+| Prefix | ID | Route Type | Route Owner | Active | Origin | Metric | Pref | Next-hop (Type) | Next-hop | Backup Next-hop | Backup Next-hop |
+| | | | | | Network | | | | Interface | (Type) | Interface |
+| | | | | | Instance | | | | | | |
++===========================+=======+============+======================+==========+==========+=========+============+=================+=================+=================+======================+
+| 172.16.20.0/24 | 10 | local | net_inst_mgr | True | default | 0 | 0 | 172.16.20.254 | irb0.20 | | |
+| | | | | | | | | (direct) | | | |
++---------------------------+-------+------------+----------------------+----------+----------+---------+------------+-----------------+-----------------+-----------------+----------------------+
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+--{ + running }--[ ]--
+```
+
+Since this IRB interface exists on leaf4 as well, the ARP reply will be consumed there and never reach leaf1, thus breaking the ARP process. To circumvent this problem, inherent to an anycast, distributed IRB model, the EVPN Type-2 MAC+IP routes are used to populate the ARP cache. Optionally, this EVPN-learnt ARP entry can also be used to inject a host route (/32 for IPv4 and /128 for IPv6) into the routing table using the `arp host-route populate evpn` configuration option (as discussed earlier). Since this is enabled in our case, we can confirm that the route 172.16.20.3/32 exists in the routing table, inserted by the arp_nd_mgr process:
+
+```
+--{ + running }--[ ]--
+A:leaf1# show network-instance default route-table ipv4-unicast prefix 172.16.20.3/32
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+IPv4 unicast route table of network instance default
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
++---------------------------+-------+------------+----------------------+----------+----------+---------+------------+-----------------+-----------------+-----------------+----------------------+
+| Prefix | ID | Route Type | Route Owner | Active | Origin | Metric | Pref | Next-hop (Type) | Next-hop | Backup Next-hop | Backup Next-hop |
+| | | | | | Network | | | | Interface | (Type) | Interface |
+| | | | | | Instance | | | | | | |
++===========================+=======+============+======================+==========+==========+=========+============+=================+=================+=================+======================+ +| 172.16.20.3/32 | 10 | arp-nd | arp_nd_mgr | True | default | 0 | 1 | 172.16.20.3 | irb0.20 | | | +| | | | | | | | | (direct) | | | | ++---------------------------+-------+------------+----------------------+----------+----------+---------+------------+-----------------+-----------------+-----------------+----------------------+ +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- +--{ + running }--[ ]-- +``` + +???+ note + + The `arp host-route populate evpn` configuration option is purely a design choice. Since a routing lookup is based on the longest-prefix-match logic (where the longest prefix wins), the existence of the host routes ensure that when there is a routing lookup for the destination, the host route is selected instead of falling back to the subnet route, which relies on ARP resolution, making the forwarding process more efficient. However, this also implies that a host route is created for every EVPN-learnt ARP entry, which can lead to a large routing table, potentially creating an issue in large-scale fabrics. + +Let's consider two flows to understand the data plane forwarding in such a design - host h1 communicating with h2 (same subnet) and h1 communicating with h3 (different subnet). + +Since h1 is in the same subnet as h2, when communicating with h2, h1 will try to resolve its IP address directly via an ARP request. This is received on leaf1 and leaked to the CPU via `irb0.10`. 
Since L2 proxy-arp is not enabled, the `arp_nd_mgr` process picks up the ARP request and responds back using its own anycast gateway MAC address while suppressing the ARP request from being flooded in the fabric. A packet capture of this ARP reply is shown below. + + Date: Mon, 14 Oct 2024 18:49:37 +0300 Subject: [PATCH 02/11] style fixup --- .../posts/2024/srlinux-asymmetric-routing.md | 2204 +++++++++-------- 1 file changed, 1166 insertions(+), 1038 deletions(-) diff --git a/docs/blog/posts/2024/srlinux-asymmetric-routing.md b/docs/blog/posts/2024/srlinux-asymmetric-routing.md index 75746a1d..a4877cec 100644 --- a/docs/blog/posts/2024/srlinux-asymmetric-routing.md +++ b/docs/blog/posts/2024/srlinux-asymmetric-routing.md @@ -6,41 +6,42 @@ tags: authors: - aninda --- -# Asymmetric routing with SR Linux in EVPN VXLAN fabrics -This post dives deeper into the asymmetric routing model on SR Linux. The topology in use is a 3-stage Clos fabric with BGP EVPN and VXLAN, with host h1 single-homed to leaf1, h2 dual-homed to leaf2 and leaf3 and h3 single-homed to leaf4. Hosts h1 and h2 are in the same subnet, 172.16.10.0/24 while h3 is in a different subnet, 172.16.20.0/24. Thus, this post demonstrates Layer 2 extension over a routed fabric as well as how Layer 3 services are deployed over the same fabric, with an asymmetric routing model. -The physical topology is shown below: +# Asymmetric routing with SR Linux in EVPN VXLAN fabrics + +This post dives deeper into the asymmetric routing model on SR Linux. +The topology in use is a 3-stage Clos fabric with BGP EVPN and VXLAN, with + +* host `h1` single-homed to `leaf1` +* `h2` dual-homed to leaf2 and `leaf3` +* and `h3` single-homed to `leaf4`. + +Hosts h1 and h2 are in the same subnet, 172.16.10.0/24 while h3 is in a different subnet, 172.16.20.0/24. Thus, this post demonstrates Layer 2 extension over a routed fabric as well as how Layer 3 services are deployed over the same fabric, with an asymmetric routing model. 
- +The physical topology is shown below: +![](https://gitlab.com/aninchat1/images/-/wikis/uploads/1d3750d935d534973fc913e3a3a68c49/srlinux-asymmetric-1.png){.img-shadow} -The Containerlab file used for this is shown below: +The Containerlab topology file used for this is shown below: -``` +```{.yaml .code-scroll-lg} name: srlinux-asymmetric-routing +prefix: "" topology: + defaults: + kind: nokia_srlinux + image: ghcr.io/nokia/srlinux:24.7.1 nodes: spine1: - kind: nokia_srlinux - image: ghcr.io/nokia/srlinux spine2: - kind: nokia_srlinux - image: ghcr.io/nokia/srlinux leaf1: - kind: nokia_srlinux - image: ghcr.io/nokia/srlinux leaf2: - kind: nokia_srlinux - image: ghcr.io/nokia/srlinux leaf3: - kind: nokia_srlinux - image: ghcr.io/nokia/srlinux leaf4: - kind: nokia_srlinux - image: ghcr.io/nokia/srlinux + h1: kind: linux image: ghcr.io/srl-labs/network-multitool @@ -50,7 +51,7 @@ topology: h2: kind: linux image: ghcr.io/srl-labs/network-multitool - exec: + exec: - ip link add bond0 type bond mode 802.3ad - ip link set eth1 down - ip link set eth2 down @@ -82,705 +83,770 @@ topology: - endpoints: ["leaf4:e1-3", "h3:eth1"] ``` -???+ note - The host (image used is `ghcr.io/srl-labs/network-multitool`) login credentials are user/multit00l. +/// admonition | Credentials + type: subtle-note +As usual, Nokia SR Linux nodes can be accessed with `admin:NokiaSrl1!` credentials and the host nodes use `user:multit00l` creds. +/// The end goal of this post is to ensure that host h1 can communicate with both h2 (same subnet) and h3 (different subnet) using an asymmetric routing model. 
To that end, the following IPv4 addressing is used (with the IRB addressing following a distributed, anycast model): -| Resource | IPv4 scope | -| :------------------------------------:|:---------------------------:| -| Underlay | 198.51.100.0/24 | -| `system0` interface | 192.0.2.0/24 | -| VNI 10010 | 172.16.10.0/24 | -| VNI 10020 | 172.16.20.0/24 | -| host h1 | 172.16.10.1/24 | -| host h2 | 172.16.10.2/24 | -| host h3 | 172.16.20.3/24 | -| `irb0.10` interface | 172.16.10.254/24 | -| `irb0.20` interface | 172.16.20.254/24 | +| Resource | IPv4 scope | +| :-----------------: | :--------------: | +| Underlay | 198.51.100.0/24 | +| `system0` interface | 192.0.2.0/24 | +| VNI 10010 | 172.16.10.0/24 | +| VNI 10020 | 172.16.20.0/24 | +| host h1 | 172.16.10.1/24 | +| host h2 | 172.16.10.2/24 | +| host h3 | 172.16.20.3/24 | +| `irb0.10` interface | 172.16.10.254/24 | +| `irb0.20` interface | 172.16.20.254/24 | ## Reviewing the asymmetric routing model -When routing between VNIs, in a VXLAN fabric, there are two major routing models that can be used - asymmetric and symmetric. Asymmetric routing, which is the focus of this post, uses a *`bridge-route-bridge`* model, implying that the ingress leaf bridges the packet into the Layer 2 domain, routes it from one VLAN/VNI to another and then bridges the packet across the VXLAN fabric to the destination. +When routing between VNIs, in a VXLAN fabric, there are two major routing models that can be used - asymmetric and symmetric. Asymmetric routing, which is the focus of this post, uses a *`bridge-route-bridge`* model, implying that the ingress leaf bridges the packet into the Layer 2 domain, routes it from one VLAN/VNI to another and then bridges the packet across the VXLAN fabric to the destination. Such a design naturally implies that both the source and the destination IRBs (and the corresponding Layer 2 domains and bridge tables) must exist on all leafs hosting servers that need to communicate with each other. 
While this increases the operational state on the leafs themselves (ARP state and MAC address state is stored everywhere), it does offer configuration and operational simplicity. - Date: Mon, 14 Oct 2024 22:37:41 +0530 Subject: [PATCH 03/11] fix server naming --- .../posts/2024/srlinux-asymmetric-routing.md | 50 +++++++++---------- 1 file changed, 25 insertions(+), 25 deletions(-) diff --git a/docs/blog/posts/2024/srlinux-asymmetric-routing.md b/docs/blog/posts/2024/srlinux-asymmetric-routing.md index a4877cec..92d03a01 100644 --- a/docs/blog/posts/2024/srlinux-asymmetric-routing.md +++ b/docs/blog/posts/2024/srlinux-asymmetric-routing.md @@ -12,11 +12,11 @@ authors: This post dives deeper into the asymmetric routing model on SR Linux. The topology in use is a 3-stage Clos fabric with BGP EVPN and VXLAN, with -* host `h1` single-homed to `leaf1` -* `h2` dual-homed to leaf2 and `leaf3` -* and `h3` single-homed to `leaf4`. +* server `s1` single-homed to `leaf1` +* `s2` dual-homed to leaf2 and `leaf3` +* and `s3` single-homed to `leaf4`. -Hosts h1 and h2 are in the same subnet, 172.16.10.0/24 while h3 is in a different subnet, 172.16.20.0/24. Thus, this post demonstrates Layer 2 extension over a routed fabric as well as how Layer 3 services are deployed over the same fabric, with an asymmetric routing model. +Servers s1 and s2 are in the same subnet, 172.16.10.0/24 while s3 is in a different subnet, 172.16.20.0/24. Thus, this post demonstrates Layer 2 extension over a routed fabric as well as how Layer 3 services are deployed over the same fabric, with an asymmetric routing model. 
The physical topology is shown below: @@ -42,13 +42,13 @@ topology: leaf3: leaf4: - h1: + s1: kind: linux image: ghcr.io/srl-labs/network-multitool exec: - ip addr add 172.16.10.1/24 dev eth1 - ip route add 172.16.20.0/24 via 172.16.10.254 - h2: + s2: kind: linux image: ghcr.io/srl-labs/network-multitool exec: @@ -62,7 +62,7 @@ topology: - ip link set eth2 up - ip link set bond0 up - ip route add 172.16.20.0/24 via 172.16.10.254 - h3: + s3: kind: linux image: ghcr.io/srl-labs/network-multitool exec: @@ -77,18 +77,18 @@ topology: - endpoints: ["leaf3:e1-2", "spine2:e1-3"] - endpoints: ["leaf4:e1-1", "spine1:e1-4"] - endpoints: ["leaf4:e1-2", "spine2:e1-4"] - - endpoints: ["leaf1:e1-3", "h1:eth1"] - - endpoints: ["leaf2:e1-3", "h2:eth1"] - - endpoints: ["leaf3:e1-3", "h2:eth2"] - - endpoints: ["leaf4:e1-3", "h3:eth1"] + - endpoints: ["leaf1:e1-3", "s1:eth1"] + - endpoints: ["leaf2:e1-3", "s2:eth1"] + - endpoints: ["leaf3:e1-3", "s2:eth2"] + - endpoints: ["leaf4:e1-3", "s3:eth1"] ``` /// admonition | Credentials type: subtle-note -As usual, Nokia SR Linux nodes can be accessed with `admin:NokiaSrl1!` credentials and the host nodes use `user:multit00l` creds. +As usual, Nokia SR Linux nodes can be accessed with `admin:NokiaSrl1!` credentials and the host nodes use `user:multit00l`. /// -The end goal of this post is to ensure that host h1 can communicate with both h2 (same subnet) and h3 (different subnet) using an asymmetric routing model. To that end, the following IPv4 addressing is used (with the IRB addressing following a distributed, anycast model): +The end goal of this post is to ensure that server s1 can communicate with both s2 (same subnet) and s3 (different subnet) using an asymmetric routing model. 
To that end, the following IPv4 addressing is used (with the IRB addressing following a distributed, anycast model): | Resource | IPv4 scope | | :-----------------: | :--------------: | @@ -96,9 +96,9 @@ The end goal of this post is to ensure that host h1 can communicate with both h2 | `system0` interface | 192.0.2.0/24 | | VNI 10010 | 172.16.10.0/24 | | VNI 10020 | 172.16.20.0/24 | -| host h1 | 172.16.10.1/24 | -| host h2 | 172.16.10.2/24 | -| host h3 | 172.16.20.3/24 | +| server s1 | 172.16.10.1/24 | +| server s2 | 172.16.10.2/24 | +| server s3 | 172.16.20.3/24 | | `irb0.10` interface | 172.16.10.254/24 | | `irb0.20` interface | 172.16.20.254/24 | @@ -850,7 +850,7 @@ Similar to how ranges can be used to pull configuration state from multiple inte ### Host connectivity and ESI LAG -With BGP configured, we can start to deploy the connectivity to the servers and configure the necessary VXLAN constructs for end-to-end connectivity. The interfaces, to the servers, are configured as untagged interfaces. Since host h2 is multi-homed to leaf2 and leaf3, this segment is configured as an ESI LAG. This includes: +With BGP configured, we can start to deploy the connectivity to the servers and configure the necessary VXLAN constructs for end-to-end connectivity. The interfaces, to the servers, are configured as untagged interfaces. Since host s2 is multi-homed to leaf2 and leaf3, this segment is configured as an ESI LAG. This includes: 1. Mapping the physical interface to a LAG interface (`lag1`, in this case). 2. The LAG interface configured with the required LACP properties - mode `active` and a system-mac of `00:00:00:00:23:23`. This LAG interface is also configured with a subinterface of type `bridged`. @@ -1017,7 +1017,7 @@ A:leaf4# info interface ethernet-1/3 ### VXLAN tunnel interfaces -On each leaf, VXLAN tunnel-interfaces are created next. 
In this case, two logical interfaces are created, one for VNI 10010 and another for VNI 10020 (since this is asymmetric routing, all VNIs must exist on all leafs that want to route between the respective VNIs). Since the end-goal is to have host h1 communicate with h2 and h3, only leaf1 and leaf4 are configured with VNI 10020 as well, while leaf2 and leaf3 are only configured with VNI 10010. +On each leaf, VXLAN tunnel-interfaces are created next. In this case, two logical interfaces are created, one for VNI 10010 and another for VNI 10020 (since this is asymmetric routing, all VNIs must exist on all leafs that want to route between the respective VNIs). Since the end-goal is to have server s1 communicate with s2 and s3, only leaf1 and leaf4 are configured with VNI 10020 as well, while leaf2 and leaf3 are only configured with VNI 10010. /// tab | leaf1 @@ -1563,7 +1563,7 @@ This completes the configuration walkthrough section of this post. Next, we'll c When the hosts come online, they typically send a GARP to ensure there is no duplicate IP address in their broadcast domain. This enables the locally attached leafs to learn the IP-to-MAC binding and build an ARP entry in the ARP cache table (since the `arp learn-unsolicited` configuration option is set to `true`). This, in turn, is advertised as an EVPN Type-2 MAC+IP route for remote leafs to learn this as well and eventually insert the IP-to-MAC binding as an entry in their ARP caches. -On leaf1, we can confirm that it has learnt the IP-to-MAC binding for host h1 (locally attached) and h3 (attached to remote leaf, leaf4). +On leaf1, we can confirm that it has learnt the IP-to-MAC binding for server s1 (locally attached) and s3 (attached to remote leaf, leaf4). 
```srl A:leaf1# show arpnd arp-entries interface irb0 @@ -1579,7 +1579,7 @@ A:leaf1# show arpnd arp-entries interface irb0 ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ``` -The ARP entry for host h3 (172.16.20.3) is learnt via the EVPN Type-2 MAC+IP route received from leaf4, as shown below. +The ARP entry for host s3 (172.16.20.3) is learnt via the EVPN Type-2 MAC+IP route received from leaf4, as shown below. ```srl --{ + running }--[ ]-- @@ -1641,7 +1641,7 @@ Received paths : 1 --------------------------------------------------------------------------------------------------------------------------- ``` -This is an important step for asymmetric routing. Consider a situation where host h1 wants to communicate with h3. When the IP packet hits leaf1, it will attempt to resolve the destination IP address via an ARP request, as it is directly attached locally (via the `irb.20` interface), as shown below. +This is an important step for asymmetric routing. Consider a situation where server s1 wants to communicate with s3. When the IP packet hits leaf1, it will attempt to resolve the destination IP address via an ARP request, as it is directly attached locally (via the `irb.20` interface), as shown below. ```srl --{ + running }--[ ]-- @@ -1688,13 +1688,13 @@ IPv4 unicast route table of network instance default The `arp host-route populate evpn` configuration option is purely a design choice. Since a routing lookup is based on the longest-prefix-match logic (where the longest prefix wins), the existence of the host routes ensure that when there is a routing lookup for the destination, the host route is selected instead of falling back to the subnet route, which relies on ARP resolution, making the forwarding process more efficient. 
However, this also implies that a host route is created for every EVPN-learnt ARP entry, which can lead to a large routing table, potentially creating an issue in large-scale fabrics. /// -Let's consider two flows to understand the data plane forwarding in such a design - host h1 communicating with h2 (same subnet) and h1 communicating with h3 (different subnet). +Let's consider two flows to understand the data plane forwarding in such a design - server s1 communicating with s2 (same subnet) and s1 communicating with s3 (different subnet). -Since h1 is in the same subnet as h2, when communicating with h2, h1 will try to resolve its IP address directly via an ARP request. This is received on leaf1 and leaked to the CPU via `irb0.10`. Since L2 proxy-arp is not enabled, the `arp_nd_mgr` process picks up the ARP request and responds back using its own anycast gateway MAC address while suppressing the ARP request from being flooded in the fabric. A packet capture of this ARP reply is shown below. +Since s1 is in the same subnet as s2, when communicating with s2, s1 will try to resolve its IP address directly via an ARP request. This is received on leaf1 and leaked to the CPU via `irb0.10`. Since L2 proxy-arp is not enabled, the `arp_nd_mgr` process picks up the ARP request and responds back using its own anycast gateway MAC address while suppressing the ARP request from being flooded in the fabric. A packet capture of this ARP reply is shown below. ![](https://gitlab.com/aninchat1/images/-/wikis/uploads/bc7ebec1d9e45487dead1d77849f09c2/srlinux-asymmetric-4.png){.img-shadow} -Once this ARP process completes, host h1 generates an ICMP request (since we are testing communication between hosts using the `ping` tool). 
When this IP packet arrives on leaf1, it does a routing lookup (since the destination MAC address is owned by itself) and this routing lookup will either hit the 172.16.10.0/24 prefix or the more-specific 172.16.10.2/32 prefix (installed from the ARP entry via the EVPN Type-2 MAC+IP route), as shown below. Since this is a directly attached route, it is further resolved into a MAC address via the ARP table and then the packet is bridged towards the destination. This MAC address points to an Ethernet Segment, which in turn resolves into VTEPs 192.0.2.12 and 192.0.2.13.
+Once this ARP process completes, server s1 generates an ICMP request (since we are testing communication between the servers using the `ping` tool). When this IP packet arrives on leaf1, it does a routing lookup (since the destination MAC address is owned by leaf1 itself), and this lookup will either hit the 172.16.10.0/24 prefix or the more-specific 172.16.10.2/32 prefix (installed from the ARP entry via the EVPN Type-2 MAC+IP route), as shown below. Since this is a directly attached route, it is further resolved into a MAC address via the ARP table and then the packet is bridged towards the destination. This MAC address points to an Ethernet Segment, which in turn resolves into VTEPs 192.0.2.12 and 192.0.2.13.
```srl A:leaf1# show network-instance default route-table ipv4-unicast route 172.16.10.2 @@ -1761,7 +1761,7 @@ A packet capture of the in-flight packet (as leaf1 sends it to spine1) is shown ![](https://gitlab.com/aninchat1/images/-/wikis/uploads/2aba126b6ddb1c4c37d4be11d125c1c6/srlinux-asymmetric-5.png){.img-shadow} -The communication between host h1 and h3 follows a similar pattern - the packet is received in macvrf1, mapped VNI 10010, and since the destination MAC address is the anycast MAC address owned by leaf1, it is then routed locally into VNI 10020 (since `irb0.20` is locally attached) and then bridged across to the destination, as confirmed below: +The communication between host s1 and s3 follows a similar pattern - the packet is received in macvrf1, mapped VNI 10010, and since the destination MAC address is the anycast MAC address owned by leaf1, it is then routed locally into VNI 10020 (since `irb0.20` is locally attached) and then bridged across to the destination, as confirmed below: ```srl --{ + running }--[ ]-- From 2223432bf3df556ad40d4c2fbab1cf8745d53b70 Mon Sep 17 00:00:00 2001 From: Roman Dodin Date: Mon, 14 Oct 2024 22:22:39 +0300 Subject: [PATCH 04/11] remove dangling prompts and force scroll size for large code blocks --- .../posts/2024/srlinux-asymmetric-routing.md | 102 ++++++++---------- 1 file changed, 44 insertions(+), 58 deletions(-) diff --git a/docs/blog/posts/2024/srlinux-asymmetric-routing.md b/docs/blog/posts/2024/srlinux-asymmetric-routing.md index 92d03a01..be144b19 100644 --- a/docs/blog/posts/2024/srlinux-asymmetric-routing.md +++ b/docs/blog/posts/2024/srlinux-asymmetric-routing.md @@ -104,7 +104,7 @@ The end goal of this post is to ensure that server s1 can communicate with both ## Reviewing the asymmetric routing model -When routing between VNIs, in a VXLAN fabric, there are two major routing models that can be used - asymmetric and symmetric. 
Asymmetric routing, which is the focus of this post, uses a *`bridge-route-bridge`* model, implying that the ingress leaf bridges the packet into the Layer 2 domain, routes it from one VLAN/VNI to another and then bridges the packet across the VXLAN fabric to the destination. +When routing between VNIs, in a VXLAN fabric, there are two major routing models that can be used - asymmetric and symmetric. Asymmetric routing, which is the focus of this post, uses a `bridge-route-bridge` model, implying that the ingress leaf bridges the packet into the Layer 2 domain, routes it from one VLAN/VNI to another and then bridges the packet across the VXLAN fabric to the destination. Such a design naturally implies that both the source and the destination IRBs (and the corresponding Layer 2 domains and bridge tables) must exist on all leafs hosting servers that need to communicate with each other. While this increases the operational state on the leafs themselves (ARP state and MAC address state is stored everywhere), it does offer configuration and operational simplicity. 
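Because forwarding on the ingress leaf hinges on a routing lookup, the `arp host-route populate evpn` note earlier in the post matters here: an EVPN-learnt /32 host route always beats the /24 subnet route by longest-prefix match. The sketch below illustrates that selection logic; it is a toy model, not SR Linux's actual FIB code.

```python
import ipaddress

def lpm(table, destination):
    """Longest-prefix match: return the most specific route covering `destination`."""
    dst = ipaddress.ip_address(destination)
    matches = [prefix for prefix in table if dst in prefix]
    return max(matches, key=lambda prefix: prefix.prefixlen, default=None)

table = [
    ipaddress.ip_network("172.16.20.0/24"),  # subnet route (local, via irb0.20)
    ipaddress.ip_network("172.16.20.3/32"),  # arp-nd host route for s3
]

# The EVPN-learnt host route wins for s3; other addresses fall back to the subnet route.
assert str(lpm(table, "172.16.20.3")) == "172.16.20.3/32"
assert str(lpm(table, "172.16.20.7")) == "172.16.20.0/24"
```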
@@ -120,7 +120,7 @@ The underlay of the fabric includes the physically connected point-to-point inte /// tab | leaf1 -```srl +```{.srl .code-scroll-lg} --{ + running }--[ ]-- A:leaf1# info interface ethernet-1/{1,2} interface ethernet-1/1 { @@ -153,7 +153,7 @@ A:leaf1# info interface ethernet-1/{1,2} /// tab | leaf2 -```srl +```{.srl .code-scroll-lg} --{ + running }--[ ]-- A:leaf2# info interface ethernet-1/{1,2} interface ethernet-1/1 { @@ -186,7 +186,7 @@ A:leaf2# info interface ethernet-1/{1,2} /// tab | leaf3 -```srl +```{.srl .code-scroll-lg} --{ + running }--[ ]-- A:leaf3# info interface ethernet-1/{1,2} interface ethernet-1/1 { @@ -219,7 +219,7 @@ A:leaf3# info interface ethernet-1/{1,2} /// tab | leaf4 -```srl +```{.srl .code-scroll-lg} --{ + running }--[ ]-- A:leaf4# info interface ethernet-1/{1,2} interface ethernet-1/1 { @@ -252,7 +252,7 @@ A:leaf4# info interface ethernet-1/{1,2} /// tab | spine1 -```srl +```{.srl .code-scroll-lg} A:spine1# info interface ethernet-1/{1..4} interface ethernet-1/1 { admin-state enable @@ -308,7 +308,7 @@ A:spine1# info interface ethernet-1/{1..4} /// tab | spine2 -```srl +```{.srl .code-scroll-lg} A:spine2# info interface ethernet-1/{1..4} interface ethernet-1/1 { admin-state enable @@ -382,7 +382,7 @@ The BGP configuration from all nodes is shown below: /// tab | leaf1 -```srl +```{.srl .code-scroll-lg} --{ + running }--[ ]-- A:leaf1# info network-instance default protocols bgp network-instance default { @@ -424,14 +424,13 @@ A:leaf1# info network-instance default protocols bgp } } } ---{ + running }--[ ]-- ``` /// /// tab | leaf2 -```srl +```{.srl .code-scroll-lg} --{ + running }--[ ]-- A:leaf2# info network-instance default protocols bgp network-instance default { @@ -473,14 +472,13 @@ A:leaf2# info network-instance default protocols bgp } } } ---{ + running }--[ ]-- ``` /// /// tab | leaf3 -```srl +```{.srl .code-scroll-lg} --{ + running }--[ ]-- A:leaf3# info network-instance default protocols bgp network-instance 
default { @@ -522,14 +520,13 @@ A:leaf3# info network-instance default protocols bgp } } } ---{ + running }--[ ]-- ``` /// /// tab | leaf4 -```srl +```{.srl .code-scroll-lg} --{ + running }--[ ]-- A:leaf4# info network-instance default protocols bgp network-instance default { @@ -571,14 +568,13 @@ A:leaf4# info network-instance default protocols bgp } } } ---{ + running }--[ ]-- ``` /// /// tab | spine1 -```srl +```{.srl .code-scroll-lg} --{ running }--[ ]-- A:spine1# info network-instance default protocols bgp network-instance default { @@ -629,14 +625,13 @@ A:spine1# info network-instance default protocols bgp } } } ---{ + running }--[ ]-- ``` /// /// tab | spine2 -```srl +```{.srl .code-scroll-lg} --{ + running }--[ ]-- A:spine2# info network-instance default protocols bgp network-instance default { @@ -687,7 +682,6 @@ A:spine2# info network-instance default protocols bgp } } } ---{ + running }--[ ]-- ``` /// @@ -709,7 +703,7 @@ The configuration of the routing policies used for export and import of BGP rout /// tab | leaf1 -```srl +```{.srl .code-scroll-lg} --{ + running }--[ ]-- A:leaf1# info routing-policy policy spine-* routing-policy { @@ -771,7 +765,7 @@ A:leaf1# info routing-policy policy spine-* /// tab | spine1 -```srl +```{.srl .code-scroll-lg} --{ running }--[ ]-- A:spine1# info routing-policy policy leaf-* routing-policy { @@ -858,7 +852,7 @@ With BGP configured, we can start to deploy the connectivity to the servers and /// tab | leaf1 -```srl +```{.srl .code-scroll-lg} --{ + running }--[ ]-- A:leaf1# info interface ethernet-1/3 interface ethernet-1/3 { @@ -877,7 +871,7 @@ A:leaf1# info interface ethernet-1/3 /// tab | leaf2 -```srl +```{.srl .code-scroll-lg} --{ + running }--[ ]-- A:leaf2# info interface ethernet-1/3 interface ethernet-1/3 { @@ -889,7 +883,7 @@ A:leaf2# info interface ethernet-1/3 --{ + running }--[ ]-- ``` -```srl +```{.srl .code-scroll-lg} --{ + running }--[ ]-- A:leaf2# info interface lag1 interface lag1 { @@ -910,7 +904,7 @@ 
A:leaf2# info interface lag1 --{ + running }--[ ]-- ``` -```srl +```{.srl .code-scroll-lg} --{ + running }--[ ]-- A:leaf2# info system network-instance protocols evpn system { @@ -939,7 +933,7 @@ A:leaf2# info system network-instance protocols evpn /// tab | leaf3 -```srl +```{.srl .code-scroll-lg} --{ + running }--[ ]-- A:leaf3# info interface ethernet-1/3 interface ethernet-1/3 { @@ -951,7 +945,7 @@ A:leaf3# info interface ethernet-1/3 --{ + running }--[ ]-- ``` -```srl +```{.srl .code-scroll-lg} A:leaf3# info interface lag1 interface lag1 { admin-state enable @@ -971,7 +965,7 @@ A:leaf3# info interface lag1 --{ + running }--[ ]-- ``` -```srl +```{.srl .code-scroll-lg} --{ + running }--[ ]-- A:leaf3# info system network-instance protocols evpn system { @@ -1000,7 +994,7 @@ A:leaf3# info system network-instance protocols evpn /// tab | leaf4 -```srl +```{.srl .code-scroll-lg} --{ + running }--[ ]-- A:leaf4# info interface ethernet-1/3 interface ethernet-1/3 { @@ -1021,7 +1015,7 @@ On each leaf, VXLAN tunnel-interfaces are created next. 
In this case, two logica /// tab | leaf1 -```srl +```{.srl .code-scroll-lg} A:leaf1# info tunnel-interface * tunnel-interface vxlan1 { vxlan-interface 1 { @@ -1043,7 +1037,7 @@ A:leaf1# info tunnel-interface * /// tab | leaf2 -```srl +```{.srl .code-scroll-lg} --{ + running }--[ ]-- A:leaf2# info tunnel-interface * tunnel-interface vxlan1 { @@ -1060,7 +1054,7 @@ A:leaf2# info tunnel-interface * /// tab | leaf3 -```srl +```{.srl .code-scroll-lg} --{ + running }--[ ]-- A:leaf3# info tunnel-interface * tunnel-interface vxlan1 { @@ -1077,7 +1071,7 @@ A:leaf3# info tunnel-interface * /// tab | leaf4 -```srl +```{.srl .code-scroll-lg} --{ + running }--[ ]-- A:leaf4# info tunnel-interface * tunnel-interface vxlan1 { @@ -1104,7 +1098,7 @@ IRBs are deployed using an anycast, distributed gateway model, implying that all /// tab | leaf1 -```srl +```{.srl .code-scroll-lg} --{ + running }--[ ]-- A:leaf1# info interface irb0 interface irb0 { @@ -1167,7 +1161,7 @@ A:leaf1# info interface irb0 /// tab | leaf2 -```srl +```{.srl .code-scroll-lg} --{ + running }--[ ]-- A:leaf2# info interface irb0 interface irb0 { @@ -1199,14 +1193,13 @@ A:leaf2# info interface irb0 } } } ---{ + running }--[ ]-- ``` /// /// tab | leaf3 -```srl +```{.srl .code-scroll-lg} --{ + running }--[ ]-- A:leaf2# info interface irb0 interface irb0 { @@ -1238,14 +1231,13 @@ A:leaf2# info interface irb0 } } } ---{ + running }--[ ]-- ``` /// /// tab | leaf4 -```srl +```{.srl .code-scroll-lg} --{ + running }--[ ]-- A:leaf2# info interface irb0 interface irb0 { @@ -1341,7 +1333,7 @@ Finally, MAC VRFs are created on the leafs to create a broadcast domain and corr /// tab | leaf1 -```srl +```{.srl .code-scroll-lg} --{ + running }--[ ]-- A:leaf1# info network-instance macvrf* network-instance macvrf1 { @@ -1410,7 +1402,7 @@ A:leaf1# info network-instance macvrf* /// tab | leaf2 -```srl +```{.srl .code-scroll-lg} --{ + running }--[ ]-- A:leaf2# info network-instance macvrf1 network-instance macvrf1 { @@ -1450,7 +1442,7 
@@ A:leaf2# info network-instance macvrf1 /// tab | leaf3 -```srl +```{.srl .code-scroll-lg} --{ + running }--[ ]-- A:leaf3# info network-instance macvrf1 network-instance macvrf1 { @@ -1490,7 +1482,7 @@ A:leaf3# info network-instance macvrf1 /// tab | leaf4 -```srl +```{.srl .code-scroll-lg} --{ + running }--[ ]-- A:leaf4# info network-instance macvrf* network-instance macvrf1 { @@ -1565,7 +1557,7 @@ When the hosts come online, they typically send a GARP to ensure there is no dup On leaf1, we can confirm that it has learnt the IP-to-MAC binding for server s1 (locally attached) and s3 (attached to remote leaf, leaf4). -```srl +```{.srl .code-scroll-lg} A:leaf1# show arpnd arp-entries interface irb0 +-------------------+-------------------+-----------------+-------------------+-------------------------------------+------------------------------------------------------------------------+ | Interface | Subinterface | Neighbor | Origin | Link layer address | Expiry | @@ -1581,7 +1573,7 @@ A:leaf1# show arpnd arp-entries interface irb0 The ARP entry for host s3 (172.16.20.3) is learnt via the EVPN Type-2 MAC+IP route received from leaf4, as shown below. -```srl +```{.srl .code-scroll-lg} --{ + running }--[ ]-- A:leaf1# show network-instance default protocols bgp routes evpn route-type 2 ip-address 172.16.20.3 detail --------------------------------------------------------------------------------------------------------------------------- @@ -1643,7 +1635,7 @@ Received paths : 1 This is an important step for asymmetric routing. Consider a situation where server s1 wants to communicate with s3. When the IP packet hits leaf1, it will attempt to resolve the destination IP address via an ARP request, as it is directly attached locally (via the `irb.20` interface), as shown below. 
-```srl +```{.srl .code-scroll-lg} --{ + running }--[ ]-- A:leaf1# show network-instance default route-table ipv4-unicast prefix 172.16.20.0/24 --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- @@ -1657,9 +1649,6 @@ IPv4 unicast route table of network instance default | 172.16.20.0/24 | 10 | local | net_inst_mgr | True | default | 0 | 0 | 172.16.20.254 | irb0.20 | | | | | | | | | | | | (direct) | | | | +---------------------------+-------+------------+----------------------+----------+----------+---------+------------+-----------------+-----------------+-----------------+----------------------+ ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---{ + running }--[ ]-- ``` Since this IRB interface exists on leaf4 as well, the ARP reply will be consumed by it, never reaching leaf1, and thus, creating a failure in the ARP process. To circumvent this problem associated with an anycast, distributed IRB model, the EVPN Type-2 MAC+IP routes are used to populate the ARP cache. In addition to this, optionally, this EVPN-learnt ARP entry can also be used to inject a host route (/32 for IPv4 and /128 for IPv6) into the routing table using the `arp host-route populate evpn` configuration option (as discussed earlier). 
Since this is enabled in our case, we can confirm that the route 172.16.20.3/32 exists in the routing table, inserted by the arp_nd_mgr process: @@ -1678,9 +1667,6 @@ IPv4 unicast route table of network instance default | 172.16.20.3/32 | 10 | arp-nd | arp_nd_mgr | True | default | 0 | 1 | 172.16.20.3 | irb0.20 | | | | | | | | | | | | (direct) | | | | +---------------------------+-------+------------+----------------------+----------+----------+---------+------------+-----------------+-----------------+-----------------+----------------------+ ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---{ + running }--[ ]-- ``` /// note @@ -1696,7 +1682,7 @@ Since s1 is in the same subnet as s2, when communicating with s2, s1 will try to Once this ARP process completes, host s1 generates an ICMP request (since we are testing communication between hosts using the `ping` tool). When this IP packet arrives on leaf1, it does a routing lookup (since the destination MAC address is owned by itself) and this routing lookup will either hit the 172.16.10.0/24 prefix or the more-specific 172.16.10.2/32 prefix (installed from the ARP entry via the EVPN Type-2 MAC+IP route), as shown below. Since this is a directly attached route, it is further resolved into a MAC address via the ARP table and then the packet is bridged towards the destination. This MAC address points to an Ethernet Segment, which in turn resolves into VTEPs 192.0.2.12 and 192.0.2.13. 
-```srl +```{.srl .code-scroll-lg} A:leaf1# show network-instance default route-table ipv4-unicast route 172.16.10.2 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- IPv4 unicast route table of network instance default @@ -1713,7 +1699,7 @@ IPv4 unicast route table of network instance default -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ``` -```srl +```{.srl .code-scroll-lg} --{ + candidate shared default }--[ ]-- A:leaf1# show arpnd arp-entries interface irb0 ipv4-address 172.16.10.2 +------------------+------------------+-----------------+------------------+-----------------------------------+--------------------------------------------------------------------+ @@ -1726,7 +1712,7 @@ A:leaf1# show arpnd arp-entries interface irb0 ipv4-address 172.16.10.2 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ``` -```srl +```{.srl .code-scroll-lg} --{ + candidate shared default }--[ ]-- A:leaf1# show network-instance macvrf1 bridge-table mac-table mac AA:C1:AB:11:BE:88 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- @@ -1744,7 +1730,7 @@ Hold down time remaining: N/A -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ``` -```srl +```{.srl .code-scroll-lg} --{ + candidate shared default }--[ ]-- A:leaf1# show tunnel-interface vxlan1 vxlan-interface 1 bridge-table unicast-destinations destination | grep -A 7 
"Ethernet Segment Destinations" Ethernet Segment Destinations @@ -1763,7 +1749,7 @@ A packet capture of the in-flight packet (as leaf1 sends it to spine1) is shown The communication between host s1 and s3 follows a similar pattern - the packet is received in macvrf1, mapped VNI 10010, and since the destination MAC address is the anycast MAC address owned by leaf1, it is then routed locally into VNI 10020 (since `irb0.20` is locally attached) and then bridged across to the destination, as confirmed below: -```srl +```{.srl .code-scroll-lg} --{ + running }--[ ]-- A:leaf1# show network-instance default route-table ipv4-unicast route 172.16.20.3 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- @@ -1781,7 +1767,7 @@ IPv4 unicast route table of network instance default -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ``` -```srl +```{.srl .code-scroll-lg} --{ + running }--[ ]-- A:leaf1# show network-instance * bridge-table mac-table mac AA:C1:AB:9F:EF:E2 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- From ee28c4f9451582f0273667e4598e944cb6262379 Mon Sep 17 00:00:00 2001 From: Roman Dodin Date: Mon, 14 Oct 2024 22:39:41 +0300 Subject: [PATCH 05/11] update ci steps --- .github/workflows/cicd.yml | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/.github/workflows/cicd.yml b/.github/workflows/cicd.yml index f3a59e5f..ef58feba 100644 --- a/.github/workflows/cicd.yml +++ b/.github/workflows/cicd.yml @@ -18,7 +18,7 @@ jobs: - uses: actions/checkout@v4 - name: Login to GitHub Container Registry - uses: docker/login-action@v1 + uses: docker/login-action@v3 with: 
registry: ghcr.io username: ${{ github.actor }} @@ -81,7 +81,7 @@ jobs: fetch-depth: 0 # needed for commit authors plugin - name: Login to GitHub Container Registry - uses: docker/login-action@v1 + uses: docker/login-action@v3 with: registry: ghcr.io username: ${{ github.actor }} @@ -92,7 +92,7 @@ jobs: docker run --env CI=true --env MKDOCS_GIT_COMMITTERS_APIKEY=${{ secrets.GITHUB_TOKEN }} -v $(pwd):/docs --user $(id -u):$(id -g) --entrypoint mkdocs ghcr.io/srl-labs/mkdocs-material-insiders:$MKDOCS_MATERIAL_VER build - name: Publish to Cloudflare Pages - uses: cloudflare/wrangler-action@v3 + uses: cloudflare/wrangler-action@v3.8.0 id: wrangler-deploy with: apiToken: ${{ secrets.CLOUDFLARE_API_TOKEN }} From 9b6b7ef472db18321fff8ccd526ef5ed1b1ff469 Mon Sep 17 00:00:00 2001 From: Roman Dodin Date: Mon, 14 Oct 2024 22:43:17 +0300 Subject: [PATCH 06/11] set cf api env var --- .github/workflows/cicd.yml | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/.github/workflows/cicd.yml b/.github/workflows/cicd.yml index ef58feba..15e9f0b3 100644 --- a/.github/workflows/cicd.yml +++ b/.github/workflows/cicd.yml @@ -92,8 +92,10 @@ jobs: docker run --env CI=true --env MKDOCS_GIT_COMMITTERS_APIKEY=${{ secrets.GITHUB_TOKEN }} -v $(pwd):/docs --user $(id -u):$(id -g) --entrypoint mkdocs ghcr.io/srl-labs/mkdocs-material-insiders:$MKDOCS_MATERIAL_VER build - name: Publish to Cloudflare Pages - uses: cloudflare/wrangler-action@v3.8.0 + uses: cloudflare/wrangler-action@v3 id: wrangler-deploy + env: + CLOUDFLARE_API_TOKEN: ${{ secrets.CLOUDFLARE_API_TOKEN }} with: apiToken: ${{ secrets.CLOUDFLARE_API_TOKEN }} accountId: 62e7f50db0ad5b34dbf1fb9b0ed2ef81 From 02eeaf0ea4ee7cd2e90163f51df11096ad2819cc Mon Sep 17 00:00:00 2001 From: Aninda Chatterjee Date: Wed, 16 Oct 2024 14:46:10 +0530 Subject: [PATCH 07/11] addressed changes suggested by roman/jorge --- .../posts/2024/srlinux-asymmetric-routing.md | 32 ++++++++----------- 1 file changed, 14 insertions(+), 18 deletions(-) 
diff --git a/docs/blog/posts/2024/srlinux-asymmetric-routing.md b/docs/blog/posts/2024/srlinux-asymmetric-routing.md
index be144b19..668202aa 100644
--- a/docs/blog/posts/2024/srlinux-asymmetric-routing.md
+++ b/docs/blog/posts/2024/srlinux-asymmetric-routing.md
@@ -104,7 +104,7 @@ The end goal of this post is to ensure that server s1 can communicate with both

 ## Reviewing the asymmetric routing model

-When routing between VNIs, in a VXLAN fabric, there are two major routing models that can be used - asymmetric and symmetric. Asymmetric routing, which is the focus of this post, uses a `bridge-route-bridge` model, implying that the ingress leaf bridges the packet into the Layer 2 domain, routes it from one VLAN/VNI to another and then bridges the packet across the VXLAN fabric to the destination.
+When routing between VNIs, in a VXLAN fabric, there are two major routing models that can be used - asymmetric and symmetric. Asymmetric routing, which is the focus of this post, uses a `bridge-route-bridge` model, implying that the ingress leaf bridges the packet into the Layer 2 domain, routes it from one VLAN/VNI to another and then bridges the packet across the VXLAN fabric to the destination. The *asymmetry* is in the number of lookups needed on the ingress and the egress leafs - on the ingress leaf, a MAC lookup, an IP lookup and then another MAC lookup is performed, while on the egress leaf, only a MAC lookup is performed.
+
+/// note
+Asymmetric and symmetric routing models are defined in RFC 9135.
+///

 Such a design naturally implies that both the source and the destination IRBs (and the corresponding Layer 2 domains and bridge tables) must exist on all leafs hosting servers that need to communicate with each other. While this increases the operational state on the leafs themselves (ARP state and MAC address state is stored everywhere), it does offer configuration and operational simplicity. 
@@ -842,7 +846,7 @@ A:spine1# info routing-policy policy leaf-* Similar to how ranges can be used to pull configuration state from multiple interfaces as an example, in this case a wildcard `*` is used to select multiple routing-policies. The wildcard `spine-*` matches both policies named `spine-import` and `spine-export`. /// -### Host connectivity and ESI LAG +### Host connectivity and LAG Ethernet Segment (ESI LAG) With BGP configured, we can start to deploy the connectivity to the servers and configure the necessary VXLAN constructs for end-to-end connectivity. The interfaces, to the servers, are configured as untagged interfaces. Since host s2 is multi-homed to leaf2 and leaf3, this segment is configured as an ESI LAG. This includes: @@ -1116,8 +1120,6 @@ A:leaf1# info interface irb0 host-route { populate dynamic { } - populate evpn { - } } evpn { advertise dynamic { @@ -1141,8 +1143,6 @@ A:leaf1# info interface irb0 host-route { populate dynamic { } - populate evpn { - } } evpn { advertise dynamic { @@ -1179,8 +1179,6 @@ A:leaf2# info interface irb0 host-route { populate dynamic { } - populate evpn { - } } evpn { advertise dynamic { @@ -1217,8 +1215,6 @@ A:leaf2# info interface irb0 host-route { populate dynamic { } - populate evpn { - } } evpn { advertise dynamic { @@ -1255,8 +1251,6 @@ A:leaf2# info interface irb0 host-route { populate dynamic { } - populate evpn { - } } evpn { advertise dynamic { @@ -1280,8 +1274,6 @@ A:leaf2# info interface irb0 host-route { populate dynamic { } - populate evpn { - } } evpn { advertise dynamic { @@ -1306,15 +1298,15 @@ There is a lot going on here, so let's breakdown some of the configuration optio `anycast-gw anycast-gw-mac [mac-address]` -: The MAC address configured with this option is the anycast gateway MAC address and is associated to the IP address for that subinterface. If this is ommitted, the anycast gateway MAC address is auto-derived from the VRRP MAC address group range. 
+: The MAC address configured with this option is the anycast gateway MAC address and is associated with the IP address for that subinterface. If this is omitted, the anycast gateway MAC address is auto-derived from the VRRP MAC address group range, as specified in RFC 9135.

 `arp learn-unsolicited [true|false]`

 : This enables the node to learn the IP-to-MAC binding from any ARP packet and not just ARP requests.

-`arp host-route populate [dynamic|static|evpn]`
+`arp host-route populate dynamic`

-: This enables the node to insert a host route (/32 for IPv4 and /128 for IPv6) in the routing table from dynaimc, static or EVPN-learnt ARP entries.
+: This enables the node to insert a host route (/32 for IPv4 and /128 for IPv6) in the routing table from dynamic ARP entries.

 `arp evpn advertise [dynamic|static]`

@@ -1328,7 +1320,7 @@ Finally, MAC VRFs are created on the leafs to create a broadcast domain and corr
 * The corresponding IRB subinterface is bound to the MAC VRF using the `interface` configuration option.
 * The VXLAN tunnel subinterface is bound to the MAC VRF using the `vxlan-interface` configuration option.
 * BGP EVPN learning is enabled for the MAC VRF using the `protocols bgp-evpn` hierarchy and the MAC VRF is bound to an EVI (EVPN virtual instance).
-* The `ecmp` configuration option determines how many VTEPs can be considered for load-balancing by the local VTEP (more on this in the validation section).
+* The `ecmp` configuration option determines how many VTEPs can be considered for load-balancing by the local VTEP (more on this in the validation section). This is for overlay ECMP in relation to remote multihomed hosts (for multihoming aliasing).
 * Route distinguishers and route targets are configured for the MAC VRF using the `protocols bgp-vpn` hierarchy. 
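The auto-derivation of the anycast gateway MAC mentioned above can be sketched from the IPv4 VRRP virtual-MAC format, `00:00:5e:00:01:XX`, where the last octet carries the group ID (RFC 5798). Which group ID SR Linux actually picks when the MAC is omitted is an assumption here; the sketch only shows the derivation mechanics:

```python
def vrrp_anycast_gw_mac(group_id: int) -> str:
    """Build a gateway MAC from the IPv4 VRRP MAC range 00:00:5e:00:01:XX.

    The group ID occupies the final octet; the exact ID a given platform
    chooses for auto-derivation is an assumption in this sketch.
    """
    if not 0 <= group_id <= 255:
        raise ValueError("group ID must fit in one octet")
    return f"00:00:5e:00:01:{group_id:02x}"

print(vrrp_anycast_gw_mac(1))  # → 00:00:5e:00:01:01
```

Explicitly configuring `anycast-gw-mac` sidesteps this derivation entirely, which is what the lab configuration above does.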
/// tab | leaf1 @@ -1549,6 +1541,10 @@ A:leaf4# info network-instance macvrf* /// +/// note +If needed, route distinguishers can be auto-derived as well by simply omitting the `bgp-vpn bgp-instance [instance-number] route-distinguisher` configuration option. +/// + This completes the configuration walkthrough section of this post. Next, we'll cover the control plane and data plane validation. ## Control plane & data plane validation From 4dd2f20b67d9cca20c28be6e76664654a8791b39 Mon Sep 17 00:00:00 2001 From: Aninda Chatterjee Date: Wed, 16 Oct 2024 15:37:55 +0530 Subject: [PATCH 08/11] minor changes to asymmetric post --- .../posts/2024/srlinux-asymmetric-routing.md | 34 +++++-------------- 1 file changed, 8 insertions(+), 26 deletions(-) diff --git a/docs/blog/posts/2024/srlinux-asymmetric-routing.md b/docs/blog/posts/2024/srlinux-asymmetric-routing.md index 668202aa..302efd80 100644 --- a/docs/blog/posts/2024/srlinux-asymmetric-routing.md +++ b/docs/blog/posts/2024/srlinux-asymmetric-routing.md @@ -110,7 +110,7 @@ When routing between VNIs, in a VXLAN fabric, there are two major routing models Asymmetric and symmetric routing models are defined in RFC 9135. /// -Such a design naturally implies that both the source and the destination IRBs (and the corresponding Layer 2 domains and bridge tables) must exist on all leafs hosting servers that need to communicate with each other. While this increases the operational state on the leafs themselves (ARP state and MAC address state is stored everywhere), it does offer configuration and operational simplicity. +Such a design naturally implies that both the source and the destination IRBs (and the corresponding Layer 2 domains and bridge tables) must exist on all leafs hosting servers that need to communicate with each other. While this increases the operational state on the leafs themselves (ARP state and MAC address state is stored everywhere), it does offer operational simplicity. 
This is because unlike symmetric routing, there is no concept of a `L3VNI` here, which keeps the routing complexity to a minimum, and analogous to traditional inter-VLAN routing, only with a VXLAN-encapsulation, in this case. No additional VLANs/VNIs need to be configured (for L3VNIs, which are typically mapped per IP VRF), making this a simpler solution to implement and operate. The obvious drawbacks of this approach, however, is that VLANs/VNIs cannot be scoped to specific leafs only - they must exist across all leafs that want to participate in inter-VNI routing. ![](https://gitlab.com/aninchat1/images/-/wikis/uploads/f93957318e62633db1c8603dbef57b69/srlinux-asymmetric-2.png){.img-shad} @@ -1647,28 +1647,7 @@ IPv4 unicast route table of network instance default +---------------------------+-------+------------+----------------------+----------+----------+---------+------------+-----------------+-----------------+-----------------+----------------------+ ``` -Since this IRB interface exists on leaf4 as well, the ARP reply will be consumed by it, never reaching leaf1, and thus, creating a failure in the ARP process. To circumvent this problem associated with an anycast, distributed IRB model, the EVPN Type-2 MAC+IP routes are used to populate the ARP cache. In addition to this, optionally, this EVPN-learnt ARP entry can also be used to inject a host route (/32 for IPv4 and /128 for IPv6) into the routing table using the `arp host-route populate evpn` configuration option (as discussed earlier). 
Since this is enabled in our case, we can confirm that the route 172.16.20.3/32 exists in the routing table, inserted by the arp_nd_mgr process: - -``` ---{ + running }--[ ]-- -A:leaf1# show network-instance default route-table ipv4-unicast prefix 172.16.20.3/32 ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -IPv4 unicast route table of network instance default ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -+---------------------------+-------+------------+----------------------+----------+----------+---------+------------+-----------------+-----------------+-----------------+----------------------+ -| Prefix | ID | Route Type | Route Owner | Active | Origin | Metric | Pref | Next-hop (Type) | Next-hop | Backup Next-hop | Backup Next-hop | -| | | | | | Network | | | | Interface | (Type) | Interface | -| | | | | | Instance | | | | | | | -+===========================+=======+============+======================+==========+==========+=========+============+=================+=================+=================+======================+ -| 172.16.20.3/32 | 10 | arp-nd | arp_nd_mgr | True | default | 0 | 1 | 172.16.20.3 | irb0.20 | | | -| | | | | | | | | (direct) | | | | -+---------------------------+-------+------------+----------------------+----------+----------+---------+------------+-----------------+-----------------+-----------------+----------------------+ -``` - -/// note - -The `arp host-route populate evpn` configuration option is purely a design choice. 
Since a routing lookup is based on the longest-prefix-match logic (where the longest prefix wins), the existence of the host routes ensure that when there is a routing lookup for the destination, the host route is selected instead of falling back to the subnet route, which relies on ARP resolution, making the forwarding process more efficient. However, this also implies that a host route is created for every EVPN-learnt ARP entry, which can lead to a large routing table, potentially creating an issue in large-scale fabrics. -/// +Since this IRB interface exists on leaf4 as well, the ARP reply will be consumed by it, never reaching leaf1, and thus, creating a failure in the ARP process. To circumvent this problem associated with an anycast, distributed IRB model, the EVPN Type-2 MAC+IP routes are used to populate the ARP cache. Let's consider two flows to understand the data plane forwarding in such a design - server s1 communicating with s2 (same subnet) and s1 communicating with s3 (different subnet). @@ -1676,9 +1655,10 @@ Since s1 is in the same subnet as s2, when communicating with s2, s1 will try to ![](https://gitlab.com/aninchat1/images/-/wikis/uploads/bc7ebec1d9e45487dead1d77849f09c2/srlinux-asymmetric-4.png){.img-shadow} -Once this ARP process completes, host s1 generates an ICMP request (since we are testing communication between hosts using the `ping` tool). When this IP packet arrives on leaf1, it does a routing lookup (since the destination MAC address is owned by itself) and this routing lookup will either hit the 172.16.10.0/24 prefix or the more-specific 172.16.10.2/32 prefix (installed from the ARP entry via the EVPN Type-2 MAC+IP route), as shown below. Since this is a directly attached route, it is further resolved into a MAC address via the ARP table and then the packet is bridged towards the destination. This MAC address points to an Ethernet Segment, which in turn resolves into VTEPs 192.0.2.12 and 192.0.2.13. 
+Once this ARP process completes, host s1 generates an ICMP request (since we are testing communication between hosts using the `ping` tool). When this IP packet arrives on leaf1, it does a routing lookup (since the destination MAC address is owned by itself) and this routing lookup hits the 172.16.10.0/24 entry, as shown below. Since this is a directly attached route, it is further resolved into a MAC address via the ARP table and then the packet is bridged towards the destination. This MAC address points to an Ethernet Segment, which in turn resolves into VTEPs 192.0.2.12 and 192.0.2.13. ```{.srl .code-scroll-lg} +--{ + running }--[ ]-- A:leaf1# show network-instance default route-table ipv4-unicast route 172.16.10.2 -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- IPv4 unicast route table of network instance default @@ -1688,11 +1668,12 @@ IPv4 unicast route table of network instance default | | | | | | Network | | | (Type) | Interface | hop (Type) | Interface | | | | | | | Instance | | | | | | | +========================+=======+============+======================+==========+==========+=========+============+===============+===============+===============+==================+ -| 172.16.10.2/32 | 8 | arp-nd | arp_nd_mgr | True | default | 0 | 1 | 172.16.10.2 | irb0.10 | | | +| 172.16.10.0/24 | 4 | local | net_inst_mgr | True | default | 0 | 0 | 172.16.10.254 | irb0.10 | | | | | | | | | | | | (direct) | | | | +------------------------+-------+------------+----------------------+----------+----------+---------+------------+---------------+---------------+---------------+------------------+ -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- +--{ + running }--[ ]-- ``` ```{.srl .code-scroll-lg} @@ -1756,11 +1737,12 @@ IPv4 unicast route table of network instance default | | | | | | Network | | | (Type) | Interface | hop (Type) | Interface | | | | | | | Instance | | | | | | | +========================+=======+============+======================+==========+==========+=========+============+===============+===============+===============+==================+ -| 172.16.20.3/32 | 10 | arp-nd | arp_nd_mgr | True | default | 0 | 1 | 172.16.20.3 | irb0.20 | | | +| 172.16.20.0/24 | 5 | local | net_inst_mgr | True | default | 0 | 0 | 172.16.20.254 | irb0.20 | | | | | | | | | | | | (direct) | | | | +------------------------+-------+------------+----------------------+----------+----------+---------+------------+---------------+---------------+---------------+------------------+ -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- +--{ + running }--[ ]-- ``` ```{.srl .code-scroll-lg} From ffdcf66d7dd71c3c4829fe3515abccb1856cfbe6 Mon Sep 17 00:00:00 2001 From: Roman Dodin Date: Wed, 16 Oct 2024 13:39:15 +0300 Subject: [PATCH 09/11] removed pic shadows, add link to the rfc and version notes --- .../posts/2024/srlinux-asymmetric-routing.md | 39 +++++++++---------- 1 file changed, 19 insertions(+), 20 deletions(-) diff --git a/docs/blog/posts/2024/srlinux-asymmetric-routing.md b/docs/blog/posts/2024/srlinux-asymmetric-routing.md index 302efd80..2703f86d 100644 --- a/docs/blog/posts/2024/srlinux-asymmetric-routing.md +++ 
b/docs/blog/posts/2024/srlinux-asymmetric-routing.md @@ -9,7 +9,7 @@ authors: # Asymmetric routing with SR Linux in EVPN VXLAN fabrics -This post dives deeper into the asymmetric routing model on SR Linux. +This post dives deeper into the asymmetric routing model[^1] for EVPN VXLAN fabrics on SR Linux. The topology in use is a 3-stage Clos fabric with BGP EVPN and VXLAN, with * server `s1` single-homed to `leaf1` @@ -20,11 +20,11 @@ Servers s1 and s2 are in the same subnet, 172.16.10.0/24 while s3 is in a differ The physical topology is shown below: -![](https://gitlab.com/aninchat1/images/-/wikis/uploads/1d3750d935d534973fc913e3a3a68c49/srlinux-asymmetric-1.png){.img-shadow} +![](https://gitlab.com/aninchat1/images/-/wikis/uploads/1d3750d935d534973fc913e3a3a68c49/srlinux-asymmetric-1.png) -The Containerlab topology file used for this is shown below: +The [Containerlab](https://containerlab.dev) topology file used for this is shown below: ```{.yaml .code-scroll-lg} name: srlinux-asymmetric-routing @@ -83,8 +83,11 @@ topology: - endpoints: ["leaf4:e1-3", "s3:eth1"] ``` -/// admonition | Credentials +/// admonition | Notes type: subtle-note +

SR Linux version

+Configuration snippets and outputs in this post are based on SR Linux 24.7.1. +

Credentials

+As usual, Nokia SR Linux nodes can be accessed with `admin:NokiaSrl1!` credentials and the host nodes use `user:multit00l`. /// @@ -107,12 +110,13 @@ The end goal of this post is to ensure that server s1 can communicate with both When routing between VNIs, in a VXLAN fabric, there are two major routing models that can be used - asymmetric and symmetric. Asymmetric routing, which is the focus of this post, uses a `bridge-route-bridge` model, implying that the ingress leaf bridges the packet into the Layer 2 domain, routes it from one VLAN/VNI to another and then bridges the packet across the VXLAN fabric to the destination. The *asymmetry* is in the number of lookups needed on the ingress and the egress leafs - on the ingress leaf, a MAC lookup, an IP lookup and then another MAC lookup is performed while on the egress leaf, only a MAC lookup is performed. /// note -Asymmetric and symmetric routing models are defined in RFC 9135. +Asymmetric and symmetric routing models are defined in [RFC 9135](https://datatracker.ietf.org/doc/html/rfc9135). /// -Such a design naturally implies that both the source and the destination IRBs (and the corresponding Layer 2 domains and bridge tables) must exist on all leafs hosting servers that need to communicate with each other. While this increases the operational state on the leafs themselves (ARP state and MAC address state is stored everywhere), it does offer operational simplicity. This is because unlike symmetric routing, there is no concept of a `L3VNI` here, which keeps the routing complexity to a minimum, and analogous to traditional inter-VLAN routing, only with a VXLAN-encapsulation, in this case. No additional VLANs/VNIs need to be configured (for L3VNIs, which are typically mapped per IP VRF), making this a simpler solution to implement and operate.
The obvious drawbacks of this approach, however, is that VLANs/VNIs cannot be scoped to specific leafs only - they must exist across all leafs that want to participate in inter-VNI routing. +Such a design naturally implies that both the source and the destination IRBs (and the corresponding Layer 2 domains and bridge tables) must exist on all leafs hosting servers that need to communicate with each other. While this increases the operational state on the leafs themselves (ARP state and MAC address state is stored everywhere), it does offer operational simplicity. +This is because unlike symmetric routing, there is no concept of an `L3VNI` here, which keeps the routing complexity to a minimum; it is analogous to traditional inter-VLAN routing, only with a VXLAN encapsulation in this case. No additional VLANs/VNIs need to be configured (for L3VNIs, which are typically mapped per IP VRF), making this a simpler solution to implement and operate. The obvious drawback of this approach, however, is that VLANs/VNIs cannot be scoped to specific leafs only - they must exist across all leafs that want to participate in inter-VNI routing, which contributes to the scalability considerations. 
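The lookup asymmetry that gives this model its name (MAC, then IP, then MAC on the ingress leaf; only MAC on the egress leaf) can be sketched as a small Python simulation. Everything here is hypothetical and purely illustrative - the table entries, the MAC addresses, and the VTEP address are made up, not SR Linux state:

```python
# Illustrative sketch of the asymmetric "bridge-route-bridge" model.
# All names, MACs, and VTEPs below are hypothetical, not taken from a device.

ANYCAST_GW_MAC = "00:00:5e:00:01:01"

arp_table = {"172.16.20.3": "aa:c1:ab:00:00:03"}              # EVPN-synced ARP entry
bridge_table = {"aa:c1:ab:00:00:03": ("vtep", "192.0.2.14", 10020)}

def ingress_leaf(dst_mac: str, dst_ip: str):
    lookups = ["MAC"]                    # 1. bridge into the local L2 domain
    if dst_mac == ANYCAST_GW_MAC:        # frame addressed to the anycast gateway
        lookups.append("IP")             # 2. route from the source VNI to the dest VNI
        dst_mac = arp_table[dst_ip]      #    rewrite to the real destination MAC
    lookups.append("MAC")                # 3. bridge towards the destination VTEP
    return lookups, bridge_table[dst_mac]

def egress_leaf():
    return ["MAC"]                       # the egress leaf only bridges

in_lookups, nexthop = ingress_leaf(ANYCAST_GW_MAC, "172.16.20.3")
print(in_lookups, "vs", egress_leaf())  # ['MAC', 'IP', 'MAC'] vs ['MAC']
```

The three-versus-one lookup count is exactly the asymmetry the model is named after; in the symmetric model both leafs would perform a routing lookup.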
-![](https://gitlab.com/aninchat1/images/-/wikis/uploads/f93957318e62633db1c8603dbef57b69/srlinux-asymmetric-2.png){.img-shad} +![](https://gitlab.com/aninchat1/images/-/wikis/uploads/f93957318e62633db1c8603dbef57b69/srlinux-asymmetric-2.png) ## Configuration walkthrough @@ -699,7 +703,7 @@ The BGP configuration defines a peer-group called `spine` on the leafs and `leaf The following packet capture also confirms the MP-BGP capabilities exchanged with the BGP OPEN messages, where both IPv4 unicast and L2VPN EVPN capabilities are advertised: -![](https://gitlab.com/aninchat1/images/-/wikis/uploads/a55a3e47da51d29386b372c2a1a790ee/srlinux-asymmetric-3.png){.img-shadow} +![](https://gitlab.com/aninchat1/images/-/wikis/uploads/a55a3e47da51d29386b372c2a1a790ee/srlinux-asymmetric-3.png) ### Routing policies for the underlay and overlay @@ -841,7 +845,7 @@ A:spine1# info routing-policy policy leaf-* /// -/// admonition | CLI Ranges +/// admonition | CLI Wildcards type: tip Similar to how ranges can be used to pull configuration state from multiple interfaces as an example, in this case a wildcard `*` is used to select multiple routing-policies. The wildcard `spine-*` matches both policies named `spine-import` and `spine-export`. /// @@ -1647,13 +1651,13 @@ IPv4 unicast route table of network instance default +---------------------------+-------+------------+----------------------+----------+----------+---------+------------+-----------------+-----------------+-----------------+----------------------+ ``` -Since this IRB interface exists on leaf4 as well, the ARP reply will be consumed by it, never reaching leaf1, and thus, creating a failure in the ARP process. To circumvent this problem associated with an anycast, distributed IRB model, the EVPN Type-2 MAC+IP routes are used to populate the ARP cache. +Since this IRB interface exists on leaf4 as well, the ARP reply will be consumed by it, never reaching leaf1, and thus, creating a failure in the ARP process. 
To circumvent this problem associated with an anycast, distributed IRB model, the EVPN Type-2 MAC+IP routes are used to populate the ARP cache. Let's consider two flows to understand the data plane forwarding in such a design - server s1 communicating with s2 (same subnet) and s1 communicating with s3 (different subnet). Since s1 is in the same subnet as s2, when communicating with s2, s1 will try to resolve s2's IP address directly via an ARP request. This is received on leaf1 and leaked to the CPU via `irb0.10`. Since L2 proxy-arp is not enabled, the `arp_nd_mgr` process picks up the ARP request and responds using its own anycast gateway MAC address while suppressing the ARP request from being flooded in the fabric. A packet capture of this ARP reply is shown below. -![](https://gitlab.com/aninchat1/images/-/wikis/uploads/bc7ebec1d9e45487dead1d77849f09c2/srlinux-asymmetric-4.png){.img-shadow} +![](https://gitlab.com/aninchat1/images/-/wikis/uploads/bc7ebec1d9e45487dead1d77849f09c2/srlinux-asymmetric-4.png) Once this ARP process completes, host s1 generates an ICMP request (since we are testing communication between hosts using the `ping` tool). When this IP packet arrives on leaf1, it does a routing lookup (since the destination MAC address is owned by the leaf itself) and this routing lookup hits the 172.16.10.0/24 entry, as shown below. Since this is a directly attached route, it is further resolved into a MAC address via the ARP table and then the packet is bridged towards the destination. This MAC address points to an Ethernet Segment, which in turn resolves into VTEPs 192.0.2.12 and 192.0.2.13.
@@ -1671,9 +1675,6 @@ IPv4 unicast route table of network instance default | 172.16.10.0/24 | 4 | local | net_inst_mgr | True | default | 0 | 0 | 172.16.10.254 | irb0.10 | | | | | | | | | | | | (direct) | | | | +------------------------+-------+------------+----------------------+----------+----------+---------+------------+---------------+---------------+---------------+------------------+ --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---{ + running }--[ ]-- ``` ```{.srl .code-scroll-lg} @@ -1717,12 +1718,11 @@ Ethernet Segment Destinations +===============================+===================+========================+=============================+ | 00:00:11:11:11:11:11:11:23:23 | 322085950259 | 192.0.2.12, 192.0.2.13 | 1(1/0) | +-------------------------------+-------------------+------------------------+-----------------------------+ --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ``` A packet capture of the in-flight packet (as leaf1 sends it to spine1) is shown below, which confirms that the ICMP request is VXLAN-encapsulated with a VNI of 10010. It also confirms that because of the L3 proxy-arp approach to suppressing ARPs in an EVPN VXLAN fabric, the source MAC address in the inner Ethernet header is the anycast gateway MAC address. 
-![](https://gitlab.com/aninchat1/images/-/wikis/uploads/2aba126b6ddb1c4c37d4be11d125c1c6/srlinux-asymmetric-5.png){.img-shadow} +![](https://gitlab.com/aninchat1/images/-/wikis/uploads/2aba126b6ddb1c4c37d4be11d125c1c6/srlinux-asymmetric-5.png) The communication between host s1 and s3 follows a similar pattern - the packet is received in macvrf1, mapped to VNI 10010, and since the destination MAC address is the anycast MAC address owned by leaf1, it is then routed locally into VNI 10020 (since `irb0.20` is locally attached) and then bridged across to the destination, as confirmed below: @@ -1740,9 +1740,6 @@ IPv4 unicast route table of network instance default | 172.16.20.0/24 | 5 | local | net_inst_mgr | True | default | 0 | 0 | 172.16.20.254 | irb0.20 | | | | | | | | | | | | (direct) | | | | +------------------------+-------+------------+----------------------+----------+----------+---------+------------+---------------+---------------+---------------+------------------+ --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---{ + running }--[ ]-- ``` ```{.srl .code-scroll-lg} @@ -1770,10 +1767,12 @@ Notice how the previous output used a wildcard for the network-instance name ins The following packet capture confirms that the in-flight packet has been routed on the ingress leaf itself (leaf1) and the VNI, in the VXLAN header, is 10020. 
-![](https://gitlab.com/aninchat1/images/-/wikis/uploads/4dad44354646d9f1c32a73d88c8f7da8/srlinux-asymmetric-6.png){.img-shadow} +![](https://gitlab.com/aninchat1/images/-/wikis/uploads/4dad44354646d9f1c32a73d88c8f7da8/srlinux-asymmetric-6.png) ## Summary Asymmetric routing uses a `bridge-route-bridge` model where the packet, from the source, is bridged into the ingress leaf's L2 domain, routed into the destination VLAN/VNI and then bridged across the VXLAN fabric to the destination. Such a model requires both the source and destination IRBs and L2 bridge domains (and L2 VNIs) to exist on all leafs that want to participate in routing between the VNIs. While this is operationally simpler, it does add additional state since all leafs will have to maintain all IP-to-MAC bindings (in the ARP table) and all MAC addresses in the bridge table. + +[^1]: Asymmetric and Symmetric routing models are covered in [RFC 9135](https://datatracker.ietf.org/doc/html/rfc9135) From 3238f4a0907b47b7c61e59bb5af636f2708825b2 Mon Sep 17 00:00:00 2001 From: Roman Dodin Date: Wed, 16 Oct 2024 13:51:19 +0300 Subject: [PATCH 10/11] squashed typos --- docs/blog/posts/2024/srlinux-asymmetric-routing.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/blog/posts/2024/srlinux-asymmetric-routing.md b/docs/blog/posts/2024/srlinux-asymmetric-routing.md index 2703f86d..67c01381 100644 --- a/docs/blog/posts/2024/srlinux-asymmetric-routing.md +++ b/docs/blog/posts/2024/srlinux-asymmetric-routing.md @@ -380,7 +380,7 @@ On the other hand, you can also provide a contiguous range of numbers by using ` Check out a separate post on [CLI Ranges and Wildcards](../2023/cli-ranges.md). /// -Remember, by default, there is no global routing instance/table in SR Linux. A `network-instance` of type `default` must be configured and these interfaces, including the `system0` interface need to be added to this network instance for point-to-point connectivity. 
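For readers who want to sanity-check the VNIs seen in these captures by hand, the VXLAN header is only 8 bytes: a flags byte (with the I-bit set), reserved bits, a 24-bit VNI, and a final reserved byte, per [RFC 7348](https://datatracker.ietf.org/doc/html/rfc7348). A small standard-library Python sketch, independent of any capture tool:

```python
import struct

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header: flags byte 0x08 ("valid VNI"), 24-bit VNI."""
    # Byte 0: flags (0x08); bytes 1-3: reserved; bytes 4-6: VNI; byte 7: reserved.
    return struct.pack("!B3xI", 0x08, vni << 8)

def vxlan_vni(header: bytes) -> int:
    """Extract the VNI from a VXLAN header (e.g. 8 bytes sliced out of a pcap)."""
    return struct.unpack("!I", header[4:8])[0] >> 8

hdr = vxlan_header(10020)  # the routed packet towards s3 carries VNI 10020
print(vxlan_vni(hdr))      # 10020
```

Slicing the 8 bytes that follow the UDP header (destination port 4789) out of a capture and feeding them to `vxlan_vni` reproduces what Wireshark's VXLAN dissector shows.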
Remember, by default, there is no global routing instance/table in SR Linux. A `network-instance` named `default` must be configured and these interfaces, including the `system0` interface, need to be added to this network instance for point-to-point connectivity. ### Underlay and overlay BGP @@ -1302,7 +1302,7 @@ There is a lot going on here, so let's breakdown some of the configuration optio `anycast-gw anycast-gw-mac [mac-address]` -: The MAC address configured with this option is the anycast gateway MAC address and is associated to the IP address for that subinterface. If this is ommitted, the anycast gateway MAC address is auto-derived from the VRRP MAC address group range, as specified in RFC 9135.. +: The MAC address configured with this option is the anycast gateway MAC address and is associated to the IP address for that subinterface. If this is omitted, the anycast gateway MAC address is auto-derived from the VRRP MAC address group range, as specified in RFC 9135. `arp learn-unsolicited [true|false]` `arp host-route populate dynamic` -: This enables the node to insert a host route (/32 for IPv4 and /128 for IPv6) in the routing table from dynaimc ARP entries. +: This enables the node to insert a host route (/32 for IPv4 and /128 for IPv6) in the routing table from
`arp evpn advertise [dynamic|static]` From 5a840399ccecc689089550436498cad7ad215286 Mon Sep 17 00:00:00 2001 From: Roman Dodin Date: Wed, 16 Oct 2024 13:55:12 +0300 Subject: [PATCH 11/11] added related links --- docs/blog/posts/2024/srlinux-asymmetric-routing.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/docs/blog/posts/2024/srlinux-asymmetric-routing.md b/docs/blog/posts/2024/srlinux-asymmetric-routing.md index 67c01381..78ed7a6c 100644 --- a/docs/blog/posts/2024/srlinux-asymmetric-routing.md +++ b/docs/blog/posts/2024/srlinux-asymmetric-routing.md @@ -5,6 +5,9 @@ tags: - evpn authors: - aninda +links: + - EVPN Multihoming tutorial: tutorials/evpn-mh/basics/index.md + - Basic L2 EVPN tutorial: tutorials/l2evpn/intro.md --- # Asymmetric routing with SR Linux in EVPN VXLAN fabrics