Nir Dotan edited this page Aug 7, 2018 · 25 revisions
Table of Contents
  1. TC Flower
    1. Supported Keys
    2. Supported Actions
    3. Drop Action Example Usage
    4. Pass Action Example Usage
    5. Trap Action Example Usage
    6. Multi-table/Multi-chain Support
    7. Chain Templates Support
    8. Mirred Action Example Usage
    9. Shared Blocks Support
    10. More Examples
  2. Further Resources

TC Flower

It is possible to offload TC flower rules with a limited set of keys and actions to netdevs which represent mlxsw ports.

Before configuring match rules on a port netdev (enp3s0np1 in the examples below), one must first create the queueing discipline (qdisc) to which the flower classifier is attached.

Note: Offloading is not yet supported for soft-netdevs (e.g. bridge, bond, VLAN) or the management port.

Note: For now, offloading is only supported for netdevs which are bridged or have an IPv4 address assigned.

In order to prepare for the addition of flower rules, either add the ingress qdisc or clsact qdisc to enp3s0np1:

$ tc qdisc add dev enp3s0np1 ingress

Or:

$ tc qdisc add dev enp3s0np1 clsact

The benefit of the clsact qdisc is that it can be used to insert not only ingress rules, but also egress rules.

The rest of the examples here use the ingress qdisc. To see more examples using clsact qdisc, please see the More Examples section.
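When several ports need the same preparation, the qdisc step can be scripted. The following is a minimal sketch, not a definitive tool: setup_clsact is a hypothetical helper name, the TC variable is an assumption introduced here so that the script only prints the commands by default, and the port names are the ones used on this page.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set TC=tc and run as root on the switch to apply them.
TC="${TC:-echo tc}"

# Hypothetical helper: add the clsact qdisc to every port given as argument.
setup_clsact() {
    for dev in "$@"; do
        $TC qdisc add dev "$dev" clsact
    done
}

setup_clsact enp3s0np1 enp3s0np2
```

Running it with TC=tc issues the same clsact command shown above once per port.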

Supported Keys

  • protocol (ethertype) [4.11]
  • src_mac [4.11]
  • dst_mac [4.11]
  • src_ip (both IPv4 and IPv6) [4.11]
  • dst_ip (both IPv4 and IPv6) [4.11]
  • ip_proto ("tcp" and "udp") [4.11]
  • src_port [4.11]
  • dst_port [4.11]
  • vlan_prio [4.12]
  • vlan_id (ingress direction) [4.12]
  • tcp_flags [4.13]
  • ip_ttl [4.14]
  • ip_tos [4.14]

Note: Packets arriving without 802.1q TCI, or ones which are only priority-tagged, are assigned a bridge PVID by the hardware. Thus, a flower match on a vlan_id of PVID will match untagged packets as well.

Supported Actions

  • drop [4.11]
  • mirred egress redirect (forward) [4.11]
  • mirred egress mirror [4.16]
  • vlan modify [4.12]
  • trap [4.13]
  • goto chain [4.14]
  • pass [4.15]

Note: Packets which arrive without 802.1q TCI, or which are only priority-tagged, are assigned a bridge PVID by the hardware. Thus, a "vlan modify" to a non-PVID tag effectively pushes a VLAN tag onto such a packet, and likewise a "vlan modify" to the PVID effectively pops it. This differs from the software pipeline, where "vlan modify" is only meaningful on packets which are already 802.1q-tagged.

Drop Action Example Usage

$ tc filter add dev enp3s0np1 parent ffff: protocol ipv6 pref 2 flower skip_sw src_ip fe01::1 action drop

This adds a rule with priority 2 matching every IPv6 packet with the source address fe01::1. The selected action is drop. Note the skip_sw parameter, which instructs TC not to insert the rule into the kernel's software datapath. If this keyword is omitted, the rule is inserted in both the kernel and HW.

To see a list of inserted rules, run:

$ tc filter show dev enp3s0np1 root

Statistics such as matched packets, matched bytes, and time of last use are maintained on a per-rule basis. To display them, add the -s flag:

$ tc -s filter show dev enp3s0np1 root
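The filter listing can also be used to check whether rules actually reached the hardware. The sketch below assumes that the installed iproute2 prints the in_hw and not_in_hw markers for each flower filter, which may depend on the iproute2 and kernel versions in use. count_offloaded is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read `tc filter show` output on stdin and count
# how many times the in_hw and not_in_hw markers appear.
# Matching is done per whitespace-separated field, so "not_in_hw" and
# "in_hw_count" are never mistaken for "in_hw".
count_offloaded() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                if ($i == "in_hw") hw++
                else if ($i == "not_in_hw") sw++
            }
        }
        END { printf "in_hw=%d not_in_hw=%d\n", hw, sw }
    '
}

# Usage on a switch (requires the device to exist):
#   tc filter show dev enp3s0np1 root | count_offloaded
```

A non-zero not_in_hw count for rules that were expected to be offloaded suggests that they are handled only by the kernel.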

Pass Action Example Usage

$ tc filter add dev enp3s0np1 parent ffff: protocol ipv6 pref 2 flower skip_sw src_ip fe01::1 action pass

This adds a rule with priority 2 matching every IPv6 packet with the source address fe01::1. The selected action is pass. The result is that matching packets are accepted and processing of further filters is avoided.

Trap Action Example Usage

$ tc filter add dev enp3s0np1 parent ffff: protocol ipv6 pref 2 flower skip_sw src_ip fe01::1 action trap

This adds a rule with priority 2 matching every IPv6 packet with the source address fe01::1. The selected action is trap.

This rule insertion instructs the hardware to send matched packets to the kernel which may then perform further analysis on them. They appear as if they come from device enp3s0np1.

Multi-table/Multi-chain Support

TC rules (filters) are put together into chains by order of priority (pref). Each chain can be looked at as a table of rules.

To insert a rule into a specific chain, one has to use the chain parameter:

$ tc filter add dev enp3s0np1 parent ffff: protocol ip chain 100 pref 10 flower skip_sw dst_ip 192.168.101.1 action drop

In this example, we added the rule into chain 100. If the chain parameter is omitted, the default chain 0 is assumed. Chain 0 is also the chain which is always processed first. If we want other chains to be processed, we have to use the action goto chain:

$ tc filter add dev enp3s0np1 parent ffff: protocol ip pref 10 flower skip_sw dst_ip 192.168.101.1 action goto chain 100
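The two steps above can be combined into a small two-stage ruleset. This is only a sketch: install_two_stage is a hypothetical helper, the TC variable is an assumption that keeps the script in dry-run mode by default, and the 192.168.101.0/24 subnet is chosen purely for illustration.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set TC=tc and run as root on the switch to apply them.
TC="${TC:-echo tc}"

# Hypothetical helper: install a two-stage ruleset on one port.
install_two_stage() {
    dev=$1
    # Populate chain 100 first, so the goto target already holds its rule
    # before any traffic is steered to it.
    $TC filter add dev "$dev" parent ffff: protocol ip chain 100 pref 10 \
        flower skip_sw dst_ip 192.168.101.1 action drop
    # Chain 0 is processed first: send the whole subnet on to chain 100.
    $TC filter add dev "$dev" parent ffff: protocol ip pref 10 \
        flower skip_sw dst_ip 192.168.101.0/24 action goto chain 100
}

install_two_stage enp3s0np1
```

Packets that do not match the rule in chain 0 never reach chain 100, so the second table is only consulted for the selected subnet.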

If a chain does not exist before a filter is added, it is implicitly created. Similarly, after the last filter is removed, implicitly created chains are destroyed. It is possible to explicitly create and destroy chains.

To create chain 11, run the following command:

$ tc chain add dev enp3s0np1 ingress chain 11

To list existing chains, run:

$ tc chain show dev enp3s0np1 ingress
chain parent ffff: chain 11

To destroy chain 11, run:

$ tc chain del dev enp3s0np1 ingress chain 11

Note: The above command deletes the chain whether it was created implicitly or explicitly, along with any filters it contains.

Chain Templates Support

For filter insertions to chains, the mlxsw driver needs a crystal ball. When the first rule is inserted into hardware, the driver has to guess all the fields that are going to be used for matching in the chain. If this guess later proves wrong and the user adds a filter that matches on different fields, there is a problem. mlxsw currently resolves this with a couple of predefined patterns, which try to cover as many match fields as possible. This approach is far from optimal, both performance-wise and scale-wise. Also, the insertion of certain filters might fail, depending on the insertion order.

Most of the time, a user inserting filters into a chain knows in advance what the filters are going to look like: what type they are and which options they use. For example, the user may know that only filters of type flower matching on destination IP are required. The user can then specify a template that covers all the filters which are going to be inserted in the chain.

The template is passed along during the chain creation like this:

$ tc chain add dev enp3s0np1 ingress proto ip chain 11 flower dst_ip 0.0.0.0/16

The template is then shown when listing chains:

$ tc chain show dev enp3s0np1 ingress
chain parent ffff: flower chain 11
  eth_type ipv4
  dst_ip 0.0.0.0/16

Addition of filters that fit the template will be successful:

$ tc filter add dev enp3s0np1 ingress proto ip chain 11 flower dst_ip 10.0.0.1/8 action drop

Addition of filters that do not fit the template will fail:

$ tc filter add dev enp3s0np1 ingress proto ip chain 11 flower dst_ip 10.0.0.1/24 action drop
Error: cls_flower: Mask does not fit the template.
We have an error talking to the kernel, -1
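The two results above are consistent with a subset rule: a filter fits when every bit set in its mask is also set in the template mask. For contiguous prefix masks this reduces to comparing prefix lengths. The sketch below illustrates that reading. mask_fits_template is a hypothetical helper, not part of tc, and the rule is an assumption inferred from the examples rather than a statement of the kernel's exact check.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a filter prefix length fits a template
# prefix length.
# Assumption: both masks are contiguous prefixes, so "every mask bit of the
# filter is also set in the template" reduces to filter_len <= template_len.
mask_fits_template() {
    template_len=$1
    filter_len=$2
    [ "$filter_len" -le "$template_len" ]
}

# Compare a few filter prefix lengths against the /16 template used above.
for len in 8 16 24; do
    if mask_fits_template 16 "$len"; then
        echo "/$len fits a /16 template"
    else
        echo "/$len does not fit a /16 template"
    fi
done
```

Under this reading, a template should be specified with the longest mask that any filter in the chain is expected to need.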

Mirred Action Example Usage

$ tc filter add dev enp3s0np1 parent ffff: protocol ipv6 pref 2 flower skip_sw src_ip fe01::1 action mirred egress (mirror|redirect) dev enp3s0np2

This adds a rule with priority 2 matching every IPv6 packet with the source address fe01::1. The selected action is mirred.

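This rule insertion instructs the hardware to either mirror or redirect matched packets to the specified interface, enp3s0np2 in the example. The (mirror|redirect) notation in the command is not literal: specify exactly one of the two keywords.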

Shared Blocks Support

By default, each qdisc has its own group of chains, each of which contains filters. This group of chains is called a block. For example, for the ingress qdisc the netdev:qdisc:block mapping is 1:1:1.

But consider a case where you have two netdevices and create an ingress qdisc on both. If you want to add an identical set of filter rules to both, you need to add them twice, once for each netdev:qdisc:block. That is of course doable, but when the filters are offloaded to a TCAM with a limited number of entries, the duplication may become a scale issue. Block sharing aims to resolve that.

In order to ask the kernel to share blocks, one has to indicate so during qdisc creation:

$ tc qdisc add dev enp3s0np1 ingress_block 22 ingress
$ tc qdisc add dev enp3s0np2 ingress_block 22 ingress

These two commands added ingress qdiscs to both netdevices. Note the ingress_block option that indicates that both qdiscs should share the same block identified by index 22. It is up to the user to choose the block index.

If you list the existing qdiscs, you see the block sharing info in the output:

$ tc qdisc
qdisc ingress ffff: dev enp3s0np1 parent ffff:fff1 ingress_block 22
qdisc ingress ffff: dev enp3s0np2 parent ffff:fff1 ingress_block 22

To make it more visual, the situation looks like this:

   enp3s0np1 ingress qdisc            enp3s0np2 ingress qdisc
              |                                  |
              |                                  |
              +---------->  block 22  <----------+

There is no limit on the number of qdiscs that can share the same block.

Once the qdisc block is shared, it is no longer possible to manipulate the filters using the qdisc handle. Instead, one has to use the block index as a handle:

$ tc filter add block 22 protocol ip pref 25 flower dst_ip 192.168.0.0/16 action drop

Aside from the ingress qdisc, block sharing is also supported for the clsact qdisc. In that case, the user can choose to share the ingress and egress blocks:

$ tc qdisc add dev enp3s0np3 ingress_block 23 egress_block 24 clsact
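The ingress sharing steps above can be scripted for any number of ports. This is a minimal sketch under stated assumptions: share_ingress_block is a hypothetical helper, the TC variable is introduced here so the script prints the commands instead of running them, and the block index and port names are the ones used on this page.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set TC=tc and run as root on the switch to apply them.
TC="${TC:-echo tc}"

# Hypothetical helper: bind the ingress qdisc of every listed port to one
# shared block. The first argument is the block index.
share_ingress_block() {
    block=$1
    shift
    for dev in "$@"; do
        $TC qdisc add dev "$dev" ingress_block "$block" ingress
    done
}

share_ingress_block 22 enp3s0np1 enp3s0np2

# The filter is added once, to the block, and applies to both ports.
$TC filter add block 22 protocol ip pref 25 flower skip_sw \
    dst_ip 192.168.0.0/16 action drop
```

The filters of a shared block can then be listed with tc filter show block 22.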

More Examples

$ tc filter add dev enp3s0np1 parent ffff: protocol ip pref 20 flower skip_sw dst_mac f4:52:14:10:df:92 action mirred egress redirect dev enp3s0np19
$ tc filter add dev enp3s0np1 parent ffff: protocol ipv6 pref 10 flower skip_sw dst_ip fe01::3 ip_proto tcp dst_port 3333 action drop
$ tc filter add dev enp3s0np1 parent ffff: protocol 802.1q flower vlan_id 95 skip_sw action drop
$ tc filter add dev enp3s0np1 parent ffff: protocol all flower action vlan modify id 85

Using clsact qdisc:

$ tc filter add dev enp3s0np1 ingress protocol ip pref 10 flower skip_sw dst_ip 192.168.101.1 action trap
$ tc filter add dev enp3s0np1 egress protocol ip pref 10 flower skip_sw dst_ip 192.168.101.3 action drop

Further Resources

  1. man tc
  2. man tc-flower
  3. QoS in Linux with TC and Filters by Phil Sutter (part of iproute documentation)