Table of Contents
  1. Overview
  2. Installation
  3. Starting the daemon
  4. Enabling TC-flower
  5. OVS bridges
  6. Offloading flows
  7. What can be offloaded?

Open vSwitch

Overview

Open vSwitch (OVS) is a fully-featured software switch implementation. The majority of the switching logic resides in userspace (ovs-vswitchd) and has been ported to various environments. The minimal requirements for porting OVS to a given platform are small: mainly the ability to receive ingress packets in userspace, transmit packets from userspace and query interface status. However, meeting only these minimal requirements would force the switch to process the entire dataplane in userspace, which would have a horrendous effect on overall performance.

To mitigate this, an OVS port usually includes a kernel portion that allows early processing and switching in the kernel, avoiding most of the kernel <-> userspace packet traversal. The kernel portion (dpif) maintains a flow table consisting of exact-match flows and their associated actions. On a mismatch, the dpif passes the packet to userspace, where the daemon processes it further; this may result in a new flow being inserted into the kernel dpif flow table.

Notice that the OVS Linux kernel infrastructure is entirely flow-based - it does not utilize any of the regular L2/L3 constructs within the network stack. As a result, using OVS on top of our switch would by default result in very poor performance - the dpif may remove the need for ingress packets to travel all the way to userspace, but they would still have to be trapped to the CPU, processed by the OS and then (likely) egressed by sending them back to the device.

Recently, OVS gained the ability to offload flows using TC-flower: whenever a given match-action rule can be expressed as a TC-flower filter, OVS installs it there instead of using the dpif dataflow, and if the relevant port is capable of offloading that TC-flower rule to hardware, it does so. This allows us to leverage OVS on top of the switch - TC-offloaded flows are handled entirely by the device, and ingress packets matching such flows are not trapped to the CPU.
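
Whether a given port can actually offload TC-flower rules depends on its driver exposing the hw-tc-offload feature. As a rough sketch (using the example port name from the sections below; on some devices the feature is already enabled by default), it can be checked and enabled with ethtool:

$ ethtool -k enp3s0np49 | grep hw-tc-offload
$ ethtool -K enp3s0np49 hw-tc-offload on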

Installation

On modern distributions, OVS comes as a package and can be installed using that distribution's package manager. E.g., for Fedora 26:

$ dnf install openvswitch
Last metadata expiration check: 1:58:59 ago on Thu 21 Dec 2017 11:32:39 AM IST.
Dependencies resolved.
===============================================================================================================================================================================================================
 Package                                             Arch                                           Version                                              Repository                                       Size
===============================================================================================================================================================================================================
Installing:
 openvswitch                                         x86_64                                         2.7.3-2.fc26                                         updates                                         4.6 M

Transaction Summary
===============================================================================================================================================================================================================
Install  1 Package

However, do notice that a packaged installation does not necessarily contain the support needed for TC-flower offloading; that was only added in OVS 2.8.0. It is also possible to install the latest OVS from sources. Please follow these guidelines for doing so.
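
As a rough sketch, a source build follows the usual OVS autotools flow (this assumes the build prerequisites such as a compiler, autoconf, automake and libtool are already installed, and uses the upstream default install paths):

$ git clone https://github.com/openvswitch/ovs.git
$ cd ovs
$ ./boot.sh
$ ./configure
$ make
$ make install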

Starting the daemon

If the package was installed, the daemon can be controlled as a service:

$ systemctl start openvswitch
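
If desired, the service can also be enabled at boot and its state inspected with the usual systemd commands:

$ systemctl enable openvswitch
$ systemctl status openvswitch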

If installed from sources, a bit more work is required to start it. An OVSDB database should be created for use by OVS. Assuming this database is local (which might be wrong for actual deployments, but is easy to experiment with), the following should be done once after installation:

$ ovsdb-tool create

This would create the local database. Later, to start the various daemons required, run:

$ ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock \
               --remote=db:Open_vSwitch,Open_vSwitch,manager_options \
               --private-key=db:Open_vSwitch,SSL,private_key \
               --certificate=db:Open_vSwitch,SSL,certificate \
               --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert \
               --pidfile --detach
$ ovs-vsctl --no-wait init
$ ovs-vswitchd --pidfile --detach
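
Alternatively, the ovs-ctl helper script shipped with OVS wraps these startup steps (and creates the database if needed); the path below assumes the default /usr/local install prefix of a source build:

$ /usr/local/share/openvswitch/scripts/ovs-ctl start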

Enabling TC-flower

By default, OVS will not use TC-flower for its dataflows. To enable it, run:

$ ovs-vsctl set Open_vSwitch . other_config:hw-offload=true

Notice that this allows OVS to use TC-flower, but it does not mean the resulting TC rules have to be offloaded to hardware; when enabled, OVS also allows non-HW-offloaded TC rules to exist. If the user wants only HW-offloaded rules to use the TC infrastructure, with the rest falling back to the regular dpif dataflow, the default policy should be changed by running:

$ ovs-vsctl set Open_vSwitch . other_config:tc-policy=skip_sw

Do notice that the configuration is persistent; once set, it is retained after stopping and restarting the daemons.
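
The current settings can be read back with ovs-vsctl, and offloading can be disabled again by setting hw-offload back to false; depending on the OVS version, ovs-vswitchd may need to be restarted for a change of hw-offload to take effect:

$ ovs-vsctl get Open_vSwitch . other_config:hw-offload
$ ovs-vsctl set Open_vSwitch . other_config:hw-offload=false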

OVS bridges

The various ports we are interested in should be added to an OVS bridge. In this example, consider the following topology where the two peers are trying to pass basic ping traffic:

  /-------------------\    /---------------\
  |                   |    |               |
  |     Peer I        |    |    Switch     |
  |                   |    |               |
  | Interface A       |    |               |
  | 192.168.10.1/24   | -- |   enp3s0np49  |
  | e4:1d:2d:ca:c9:7a |    |               |
  |                   |    |               |
  \-------------------/    |               |
                           |               |
  /-------------------\    |               |
  |                   |    |               |
  |     Peer II       |    |               |
  |                   |    |               |
  | Interface B       |    |   enp3s0np51  |
  | 192.168.10.2/24   | -- |               |
  | e4:1d:2d:ca:c9:66 |    |               |
  |                   |    |               |
  \-------------------/    \---------------/
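
For completeness, the peer side is plain static addressing; a minimal sketch, where ethA and ethB are placeholders for the peers' actual interface names:

$ ip address add 192.168.10.1/24 dev ethA   # on Peer I
$ ip link set dev ethA up                   # on Peer I
$ ip address add 192.168.10.2/24 dev ethB   # on Peer II
$ ip link set dev ethB up                   # on Peer II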

The two interfaces enp3s0np49 and enp3s0np51 are added to a new OVS bridge called br-int, then set to UP state by running:

$ ovs-vsctl add-br br-int
$ ovs-vsctl add-port br-int enp3s0np49
$ ovs-vsctl add-port br-int enp3s0np51
$ ip link set dev enp3s0np49 up
$ ip link set dev enp3s0np51 up

There are various ways to inspect the configured bridge topology, e.g.:

$ ovs-vsctl show
    Bridge br-int
        Port "enp3s0np49"
            tag: 1
            Interface "enp3s0np49"
        Port br-int
            Interface br-int
                type: internal
        Port "enp3s0np51"
            tag: 1
            Interface "enp3s0np51"

$ ovs-dpctl show
system@ovs-system:
        lookups: hit:120 missed:122 lost:0
        flows: 0
        masks: hit:318 total:0 hit/pkt:1.31
        port 0: ovs-system (internal)
        port 1: br-int (internal)
        port 2: enp3s0np49
        port 4: enp3s0np51

While the ping is running, the user can see the datapath flows by running:

$ ovs-dpctl dump-flows
in_port(4),eth(src=e4:1d:2d:ca:c9:66,dst=e4:1d:2d:ca:c9:7a),eth_type(0x0806), packets:2, bytes:110, used:8.240s, actions:2
in_port(4),eth(src=e4:1d:2d:ca:c9:66,dst=e4:1d:2d:ca:c9:7a),eth_type(0x0800), packets:325, bytes:33150, used:0.051s, actions:2
in_port(2),eth(src=e4:1d:2d:ca:c9:7a,dst=e4:1d:2d:ca:c9:66),eth_type(0x0806), packets:4, bytes:220, used:8.240s, actions:4
in_port(2),eth(src=e4:1d:2d:ca:c9:7a,dst=e4:1d:2d:ca:c9:66),eth_type(0x0800), packets:325, bytes:33150, used:0.050s, actions:4

We can see ARP and ICMP packets traveling between the two peers.

After enabling TC-flower offload, we can see the filters offloaded as an ingress redirection of packets between the two ports:

$ tc filter show dev enp3s0np51 ingress
filter protocol 802.1Q pref 1 flower chain 0
filter protocol arp pref 2 flower chain 0
filter protocol arp pref 2 flower chain 0 handle 0x1
  dst_mac e4:1d:2d:ca:c9:7a
  src_mac e4:1d:2d:ca:c9:66
  eth_type arp
  in_hw
        action order 1: mirred (Egress Redirect to device enp3s0np49) stolen
        index 4 ref 1 bind 1

filter protocol ip pref 3 flower chain 0
filter protocol ip pref 3 flower chain 0 handle 0x1
  dst_mac e4:1d:2d:ca:c9:7a
  src_mac e4:1d:2d:ca:c9:66
  eth_type ipv4
  in_hw
        action order 1: mirred (Egress Redirect to device enp3s0np49) stolen
        index 2 ref 1 bind 1

$ tc filter show dev enp3s0np49 ingress
filter protocol arp pref 2 flower chain 0
filter protocol arp pref 2 flower chain 0 handle 0x1
  dst_mac e4:1d:2d:ca:c9:66
  src_mac e4:1d:2d:ca:c9:7a
  eth_type arp
  in_hw
        action order 1: mirred (Egress Redirect to device enp3s0np51) stolen
        index 3 ref 1 bind 1

filter protocol ip pref 3 flower chain 0
filter protocol ip pref 3 flower chain 0 handle 0x1
  dst_mac e4:1d:2d:ca:c9:66
  src_mac e4:1d:2d:ca:c9:7a
  eth_type ipv4
  in_hw
        action order 1: mirred (Egress Redirect to device enp3s0np51) stolen
        index 1 ref 1 bind 1
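
Per-filter statistics are a quick way to confirm that traffic is actually hitting these offloaded rules; this uses the standard statistics flag of tc and is not OVS-specific (output omitted):

$ tc -s filter show dev enp3s0np51 ingress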

Offloading flows

Explicit rules can be offloaded by adding them with ovs-ofctl. Continuing the previous example, assume the following rule is added:

$ ovs-ofctl add-flow br-int "ip,nw_dst=192.168.10.2 actions=drop"
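
The installed OpenFlow rules can be listed at any point with ovs-ofctl (output omitted here):

$ ovs-ofctl dump-flows br-int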

The immediate effect is that the traffic stops. Checking the current flows and the offloaded TC actions, we can see that the drop action has been offloaded:

$ ovs-dpctl dump-flows
in_port(2),eth_type(0x0800),ipv4(dst=192.168.10.2), packets:38, bytes:3876, used:0.530s, actions:drop

$ tc filter show dev enp3s0np49 ingress
filter protocol arp pref 2 flower chain 0
filter protocol ip pref 3 flower chain 0
filter protocol ip pref 4 flower chain 0
filter protocol ip pref 4 flower chain 0 handle 0x1
  eth_type ipv4
  dst_ip 192.168.10.2
  in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1

Unrelated traffic can still pass. E.g., if Peer II starts sending ARPs, we see that traffic to the explicit IP is still dropped while the ARPs pass through, and the offloaded filters reflect that:

$ ovs-dpctl dump-flows
in_port(4),eth(src=e4:1d:2d:ca:c9:66,dst=e4:1d:2d:ca:c9:7a),eth_type(0x0806), packets:296, bytes:16280, used:0.271s, actions:2
in_port(2),eth(src=e4:1d:2d:ca:c9:7a,dst=e4:1d:2d:ca:c9:66),eth_type(0x0806), packets:576, bytes:31680, used:0.270s, actions:4
in_port(2),eth_type(0x0800),ipv4(dst=192.168.10.2), packets:596, bytes:60792, used:0.270s, actions:drop

$ tc filter show dev enp3s0np49 ingress
filter protocol arp pref 2 flower chain 0
filter protocol arp pref 2 flower chain 0 handle 0x1
  dst_mac e4:1d:2d:ca:c9:66
  src_mac e4:1d:2d:ca:c9:7a
  eth_type arp
  in_hw
        action order 1: mirred (Egress Redirect to device enp3s0np51) stolen
        index 1 ref 1 bind 1

filter protocol ip pref 3 flower chain 0
filter protocol ip pref 4 flower chain 0
filter protocol ip pref 4 flower chain 0 handle 0x1
  eth_type ipv4
  dst_ip 192.168.10.2
  in_hw
        action order 1: gact action drop
         random type none pass val 0
         index 1 ref 1 bind 1
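
To restore connectivity, the drop rule can be removed again by matching it with ovs-ofctl, e.g.:

$ ovs-ofctl del-flows br-int "ip,nw_dst=192.168.10.2"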

What can be offloaded?

For both match keys and actions, we can only offload the intersection of what is supported by the driver and by OVS. At the moment, any of the driver's supported keys can be part of an offloaded match rule, whereas of the driver's supported actions only packet redirection and dropping are used.
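
Independently of OVS, a simple way to probe what the driver can offload is to install a flower filter directly with tc using the skip_sw flag, which fails if the rule cannot be placed in hardware. A minimal sketch, preferably on a port not currently managed by OVS (the port name, pref 10 and the destination IP below are arbitrary examples):

$ tc qdisc add dev enp3s0np49 ingress
$ tc filter add dev enp3s0np49 ingress protocol ip pref 10 flower skip_sw dst_ip 192.168.10.3 action drop
$ tc filter del dev enp3s0np49 ingress protocol ip pref 10 flower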
