New link struct - Vxlan #1532
Conversation
Force-pushed from 45201fd to aa614a6
@hellt this one is fine to get reviewed
remove vxlan stitched host interfaces; use EndpointGeneric in EndpointVxlan; generalize MTU handling for srl config template
Force-pushed from bd4f765 to 73bbd91
The tools command still needs to be migrated.
note to self:
Force-pushed from 65e3414 to 30cdcf1
nodes/srl/srl.go
Outdated
	}
	iface.Mtu = mtu
}
iface.Mtu = e.GetLink().GetMtu()
This is what causes the config to break now in the srl tests in CI.
Two issues here:
- when we use the srl node kind, the errors from srl postdeploy are not visible (need to understand why)
- when changed to the nokia_srlinux kind, the following becomes evident:
ERRO[0017] failed to run postdeploy task for node srl1: command execution error:At line 10: Error: Error in path: .interface{.name=="ethernet-1/1"}.mtu
[InvalidArgument] mtu 9500 out of range [1500-9412]
ERRO[0017] failed to run postdeploy task for node srl2: command execution error:At line 13: Error: Error in path: .interface{.name=="ethernet-1/1"}.mtu
[InvalidArgument] mtu 9500 out of range [1500-9412]
The default 9500 MTU shouldn't be used in the srl config, as the 9500 link MTU is a veth MTU only and is not meant to be configured in NOSes.
In b51def3 I changed this so that the mtu value is set in the template endpoints struct only if it is not the default veth mtu value.
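A minimal sketch of that guard, reusing the same imports and types as the srl config templating code; the DefaultLinkMTU constant name and the tplInterface type are assumptions for illustration, not the exact PR code:

// setTemplateMTU propagates the link MTU into the node config template only
// when it differs from the default veth MTU (9500B), which is meant for the
// Linux veth pair and must not leak into the NOS interface config.
func setTemplateMTU(iface *tplInterface, e links.Endpoint) {
	if mtu := e.GetLink().GetMtu(); mtu != links.DefaultLinkMTU { // assumed constant name
		iface.Mtu = mtu
	}
}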
So here is the issue with SR Linux and low-MTU interfaces.
SR Linux emulates a Broadcom datapath, and the port MTU on recent hw platforms can't be less than 1500B.
When using vxlan interfaces we ultimately rely on the MTU of the outgoing (parent) interface, which is often configured with 1500B itself.
This leaves the vxlan interface with a 1450B MTU, which is too low and can't be configured on the SR Linux platform.
Because of that, using vxlan interfaces inside the SRL netns can only be done if the parent interface of the vxlan tunnel has a 1550B+ MTU.
In practice, with connections to virtual labs, this likely means that we will still use the vxlan-stitched mode more than vxlan-embed.
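Putting numbers on it (VXLAN-over-IPv4 encapsulation adds 50B of overhead: 20B outer IP + 8B UDP + 8B VXLAN header + 14B inner Ethernet):

vxlan MTU = parent MTU - 50B
1500B parent -> 1450B vxlan   (below the 1500B SR Linux port minimum)
1550B parent -> 1500B vxlan   (smallest parent MTU that works)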
tests/08-vxlan/01-vxlan.clab.yml
Outdated
name: vxlan

topology:
  nodes:
    s1:
      kind: srl
      image: ghcr.io/nokia/srlinux
      startup-config: 01-vxlan-s1.config
  links:
    - type: vxlan
      endpoint:
        node: s1
        interface: e1-1
        mac: 02:00:00:00:00:04
      remote: 192.168.66.1
      vni: 100
      udp-port: 5555
I think instead of this lab we can use the following:
name: vxlan-embed
topology:
  nodes:
    l1:
      kind: linux
      image: alpine:3
      exec:
        - ip addr add dev eth1 192.168.0.1/30
      mgmt-ipv4: 172.20.20.91
    l2:
      kind: linux
      image: alpine:3
      exec:
        - ip addr add dev eth1 192.168.0.2/30
      mgmt-ipv4: 172.20.20.92
  links:
    - type: vxlan
      endpoint:
        node: l1
        interface: l1eth1
      remote: 172.20.20.92
      vni: 100
    - type: vxlan
      endpoint:
        node: l2
        interface: l2eth1
      remote: 172.20.20.91
      vni: 100
That way you don't have to touch the host at all (and clean it up afterwards, which is not happening currently).
Note that this lab can't be deployed today; there is some error about the same file being present...
Looking at the code, it seems this lab is not possible to deploy. Since we have to find the outgoing interface (I wonder if this is a mandatory step?), we rely on the host's routing stack. It makes sense, since in real-life applications the tunnels are established between host nodes, not from within the containers as in the lab above.
It is not possible to deploy this one, since vxlan links created in a single netns can't have the same VNI.
I will rework this lab.
It is still worth looking into this more closely. Could be an alpine issue or something. Everything below this line is an early observation.
This lab works, but with a prerequisite.
When the vxlan interface is first created in the host netns and then moved to the container netns, it needs to initiate the connection towards the remote first.
If the remote end (l2) starts to ping l1 first, it won't succeed; l1 needs to start pinging first.
name: vxlan-embed
topology:
  nodes:
    l1:
      kind: linux
      image: alpine:3
      exec:
        - ip addr add dev eth1 192.168.0.1/30
      mgmt-ipv4: 172.20.20.91
    l2:
      kind: linux
      image: alpine:3
      exec:
        - >
          ash -c '
          apk add iproute2 &&
          ip link add name vxlan0 type vxlan id 100 remote 172.20.20.91 dstport 14789 &&
          ip l set dev vxlan0 up &&
          ip addr add dev vxlan0 192.168.0.2/30'
      mgmt-ipv4: 172.20.20.92
  links:
    - type: vxlan
      endpoint:
        node: l1
        interface: eth1
      remote: 172.20.20.92
      vni: 100
There is something strange happening with vxlan tunnels moved to the container netns.
I have built a topology that is deployed on two different hosts like this:
#host1
name: vxlan-embed
prefix: ""
mgmt:
  network: fixedips
  ipv4_subnet: 100.103.1.0/24
topology:
  kinds:
    linux:
      image: ghcr.io/hellt/network-multitool
  nodes:
    node1:
      kind: linux
      mgmt_ipv4: 100.103.1.11
      exec:
        - ip addr add dev eth1 192.168.0.1/30
  links:
    - type: vxlan
      endpoint:
        node: node1
        interface: eth1
        mac: 02:00:00:00:00:02
      remote: 100.103.2.22
      vni: 100
      udp-port: 4789
#host2
name: vxlan-embed
prefix: ""
mgmt:
  network: fixedips
  ipv4_subnet: 100.103.2.0/24
topology:
  kinds:
    linux:
      image: ghcr.io/hellt/network-multitool
  nodes:
    node2:
      kind: linux
      mgmt_ipv4: 100.103.2.22
      exec:
        - ip addr add dev eth1 192.168.0.2/30
  links:
    - type: vxlan
      endpoint:
        node: node2
        interface: eth1
        mac: 02:00:00:00:00:01
      remote: 100.103.1.11
      vni: 100
      udp-port: 4789
Ping from 192.168.0.1 to 192.168.0.2 and vice versa does not work.
The first thing I see is that the vxlan tunnels are created with the “nolearning” flag. This disables the snooping of incoming packets that builds the MAC-to-VTEP-address fdb table.
2829: eth1@if2829: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default qlen 1000
link/ether 02:00:00:00:00:01 brd ff:ff:ff:ff:ff:ff link-netnsid 0 promiscuity 0 minmtu 68 maxmtu 65535
vxlan id 100 remote 100.103.6.11 dev if30 srcport 0 0 dstport 4789 nolearning ttl auto ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
inet 192.168.0.1/30 scope global eth1
If we look at fdbs in node1 and node2:
host1:
00:00:00:00:00:00 dev eth1 dst 100.103.2.22 via ifindex 9 link-netnsid 0 self permanent
host2:
00:00:00:00:00:00 dev eth1 dst 100.103.1.11 via ifindex 30 link-netnsid 0 self permanent
BUM traffic will be delivered to those dst IP addresses.
These ifindexes (nexthops) are root-namespace interfaces. They are the interfaces used to reach the tunnel destination IP; that can be the mgmt bridge or another interface on the host.
If we do a tcpdump to check vxlan traffic, we see that it’s not coming from the “mgmt_ipv4” of the node but from the IP of the bridge in the host. It looks like the udp socket of the vxlan tunnel is created in the root namespace. I understand that’s also the reason you see “udp port 4789 unreachable” in the tcpdumps of the container.
If I add fdb entries, using IPs in the root namespace:
##host1
ip netns exec node1 bridge fdb append 00:00:00:00:00:00 dev eth1 self dst 100.103.2.1 vni 100 port 4789
##host2
ip netns exec node2 bridge fdb append 00:00:00:00:00:00 dev eth1 self dst 100.103.1.1 vni 100 port 4789
.. traffic starts to flow immediately.
Hope it helps.
Hi @michelredondo
We will bring the learning flag back.
As for the interface IP - it is indeed expected that the interface belongs to the root netns, as this is where the tunnel interface was created.
In general, I feel that moving the vxlan interface to the container netns is a bit hacky.
Even with the learning flag, topologies like the one I described will not work. Both containers have the “00:00:00:00:00:00” fdb entry pointing to the incorrect IP, so it will never be possible to learn MACs from the other tunnel end.
The https://github.com/srl-labs/containerlab/blob/main/tests/08-vxlan/01-vxlan.clab.yml test case works because the srl1 fdb entry points to the correct IP (172.20.25.22). When you originate traffic from that side, srl1 sends BUM traffic to the l2 node and l2 learns how to send traffic back.
If you try to originate traffic from the l2 node, BUM traffic is sent to the srl1 mgmt IP (172.20.25.21), but the udp vxlan socket is not listening there; it’s listening in the root namespace (bridge IP: 172.20.25.1). Learning in this case never happens.
The only way I can think of to make this work is to add an additional fdb entry pointing to the IP address in the root ns. Maybe we could add this as an optional parameter in the link definition.
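As a purely hypothetical illustration, such an optional knob in the link definition could look like this (vtep-fdb-dst is a made-up parameter name, not an existing containerlab option):

links:
  - type: vxlan
    endpoint:
      node: l2
      interface: eth1
    remote: 172.20.25.21
    vni: 100
    # made-up optional parameter: add an extra fdb entry pointing at the
    # root-namespace IP where the vxlan udp socket actually listens
    vtep-fdb-dst: 172.20.25.1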
links/link.go
Outdated
// rename the link created with random name if its length is acceptable by linux
if len(endpt.GetIfaceName()) < 15 {
	err := netlink.LinkSetName(l, endpt.GetIfaceName())
	if err != nil {
		return fmt.Errorf(
			"failed to rename link: %v", err)
	}
} else {
	// else we set the desired long name as alias
	// in future we need to set it as an alternative name,
	// pending https://twitter.com/ntdvps/status/1709580216648024296
	err := netlink.LinkSetAlias(l, endpt.GetIfaceName())
	if err != nil {
		return fmt.Errorf(
			"failed to add alias: %v", err)
	}
Here is a ~big change I made to the original proposal.
Instead of creating some hashed name for the interface I propose the following:
- if an interface name is 14+ chars, we keep the random name that has already been generated by clab
- the long name we set as an alias for the interface. The alias is like a description of the interface and can contain 128 chars. The alias, though, doesn't let you use ip link show <alias>.
- I am waiting on Add support for alternative names vishvananda/netlink#862 to merge so we can use an alternative name instead of the alias. The altname allows users to reference the interface in the ip utility (see the illustration after this list).
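For illustration, the altname behaviour we would get once that lands roughly matches what iproute2 already exposes on plain links; the device and long name below are made up:

# assign the desired long name as an alternative name to the short, randomly named link
ip link property add dev clab-4f2a1b altname srl1-ethernet-1-1-long-name
# unlike an alias, the altname can be used to address the interface directly
ip link show dev srl1-ethernet-1-1-long-name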
// VethCleanup iterates over links found in clab topology to initiate removal of dangling veths
// in host networking namespace or attached to linux bridge.
// See https://github.com/srl-labs/containerlab/issues/842 for the reference.
func (c *CLab) VethCleanup(ctx context.Context) error {
	hostBasedEndpoints := []links.Endpoint{}

	// collect the endpoints of regular nodes
	for _, n := range c.Nodes {
		if n.Config().IsRootNamespaceBased || n.Config().NetworkMode == "host" {
			hostBasedEndpoints = append(hostBasedEndpoints, n.GetEndpoints()...)
		}
	}

	// collect the endpoints of the fake nodes
	hostBasedEndpoints = append(hostBasedEndpoints, links.GetHostLinkNode().GetEndpoints()...)
	hostBasedEndpoints = append(hostBasedEndpoints, links.GetMgmtBrLinkNode().GetEndpoints()...)

	var joinedErr error
	for _, ep := range hostBasedEndpoints {
		// finally remove all the collected endpoints
		log.Debugf("removing endpoint %s", ep.String())
		err := ep.Remove()
		if err != nil {
			joinedErr = errors.Join(joinedErr, err)
		}
	}

	return joinedErr
}
We need to find a solution for this removed func. While working on the vxlan-stitch I see that when I remove the lab, the host link remains.
This was the reason to have this func.
The Remove() function is implemented on the vxlan link and forwarded to the local endpoint. The local endpoint is actually a generic endpoint. My thinking as of now is that the removal is triggered but executed in the wrong namespace - that of the container, not the host namespace.
Let me verify.
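If that hypothesis holds, a rough sketch of forcing the deletion into the host namespace could look like this (the helper name and the way the host netns path is obtained are assumptions for illustration, not the actual PR code):

import (
	"fmt"

	"github.com/containernetworking/plugins/pkg/ns"
	"github.com/vishvananda/netlink"
)

// removeInHostNs deletes the named link while executing in the host
// network namespace instead of the container namespace.
func removeInHostNs(hostNsPath, ifaceName string) error {
	return ns.WithNetNSPath(hostNsPath, func(_ ns.NetNS) error {
		l, err := netlink.LinkByName(ifaceName)
		if err != nil {
			return fmt.Errorf("failed to lookup link %s: %w", ifaceName, err)
		}
		return netlink.LinkDel(l)
	})
}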
links/link_vxlan.go
Outdated
ip := net.ParseIP(lr.Remote)
// if the returned ip is nil, an error occured.
// we consider, that we maybe have a textual hostname
// e.g. dns name so we try to resolve the string next
if ip == nil {
	ips, err := net.LookupIP(lr.Remote)
	if err != nil {
		return nil, err
	}

	// prepare log message
	sb := strings.Builder{}
	for _, ip := range ips {
		sb.WriteString(", ")
		sb.WriteString(ip.String())
	}
	log.Debugf("looked up hostname %s, received IP addresses [%s]", lr.Remote, sb.String()[2:])

	// always use the first address
	if len(ips) <= 0 {
		return nil, fmt.Errorf("unable to resolve %s", lr.Remote)
	}
	ip = ips[0]
}

parentIf := lr.ParentInterface

if parentIf == "" {
	conn, err := rtnl.Dial(nil)
	if err != nil {
		return nil, fmt.Errorf("can't establish netlink connection: %s", err)
	}
	defer conn.Close()
	r, err := conn.RouteGet(ip)
	if err != nil {
		return nil, fmt.Errorf("failed to find a route to VxLAN remote address %s", ip.String())
	}
	parentIf = r.Interface.Name
}
@steiler I think this code piece is now shared between
func (lr *LinkVxlanRaw) resolveStitchedVxlanComponent(params *ResolveParams) (*LinkVxlan, error)
and
func (lr *LinkVxlanRaw) resolveVxlan(params *ResolveParams) (Link, error)
It is a good candidate to create two common funcs in utils.
No, not in utils, but still in LinkVxlanRaw.
However, there is a slight difference in the vxlan endpoint part.
Anyway, let me see how we can carve the common part out.
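One possible shape for the shared piece, kept as a method on LinkVxlanRaw inside links/link_vxlan.go and using the same imports as the code above; the method name and return values are assumptions for illustration, not the final refactor:

// resolveRemoteAndParent sketches the common logic: resolve the remote
// address (IP literal or DNS name) and, if no parent interface was given,
// derive it from the host routing table.
func (lr *LinkVxlanRaw) resolveRemoteAndParent() (net.IP, string, error) {
	ip := net.ParseIP(lr.Remote)
	if ip == nil {
		ips, err := net.LookupIP(lr.Remote)
		if err != nil {
			return nil, "", err
		}
		if len(ips) == 0 {
			return nil, "", fmt.Errorf("unable to resolve %s", lr.Remote)
		}
		// always use the first address
		ip = ips[0]
	}

	parentIf := lr.ParentInterface
	if parentIf == "" {
		conn, err := rtnl.Dial(nil)
		if err != nil {
			return nil, "", fmt.Errorf("can't establish netlink connection: %s", err)
		}
		defer conn.Close()
		r, err := conn.RouteGet(ip)
		if err != nil {
			return nil, "", fmt.Errorf("failed to find a route to VxLAN remote address %s", ip)
		}
		parentIf = r.Interface.Name
	}

	return ip, parentIf, nil
}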
links/link_vxlan_stitched.go
Outdated
// Deploy provisions the stitched vxlan link with all its underlaying sub-links
func (l *VxlanStitched) Deploy(ctx context.Context) error {
	return l.deploy(ctx, false)
}

func (l *VxlanStitched) deploy(ctx context.Context, skipVethCreation bool) error {
	// deploy the vxlan link
	err := l.vxlanLink.Deploy(ctx)
	if err != nil {
		return err
	}

	// the veth creation might be skipped if it already exists
	if !skipVethCreation {
		err = l.vethLink.Deploy(ctx)
		if err != nil {
			return err
		}
	}
@steiler do we need the inner deploy then?
Yes, in my opinion we do.
If you take a look at where the deploy is called from, it is:
// DeployWithExistingVeth provisons the stitched vxlan link whilst the
// veth interface does already exist, hence it is not created as part of this
// deployment
func (l *VxlanStitched) DeployWithExistingVeth(ctx context.Context) error {
	return l.deploy(ctx, true)
}

// Deploy provisions the stitched vxlan link with all its underlaying sub-links
func (l *VxlanStitched) Deploy(ctx context.Context) error {
	return l.deploy(ctx, false)
}
DeployWithExistingVeth is basically there to support the tools command that we have.
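For context, the kind of workflow this supports looks roughly like the following; the exact flag spelling of the tools command is from memory and may differ from the current CLI:

# stitch an existing lab interface to a vxlan tunnel towards a remote host;
# DeployWithExistingVeth matters here because the veth pair already exists
containerlab tools vxlan create --remote 10.0.0.20 --id 100 --link srl_e1-1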
Codecov Report
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1532      +/-   ##
==========================================
+ Coverage   48.58%   49.41%   +0.82%
==========================================
  Files         133      136       +3
  Lines       12798    13231     +433
==========================================
+ Hits         6218     6538     +320
- Misses       5874     5961      +87
- Partials      706      732      +26
Implement VxLAN as a struct that implements the links.Link interface.
This is then meant to be utilized via the tools command, but it will also allow for the definition of VxLAN links between containerlab nodes via the topology file.
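For example, mirroring the test lab discussed above, a topology-file definition of such a link looks roughly like this (values are illustrative):

topology:
  nodes:
    s1:
      kind: nokia_srlinux
      image: ghcr.io/nokia/srlinux
  links:
    - type: vxlan
      endpoint:
        node: s1
        interface: e1-1
      remote: 192.168.66.1
      vni: 100
      udp-port: 5555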