When running a heavily loaded networking application, you need to rely on high-performance software design to distribute the load. In late 2009, Tom Herbert made this possible by introducing software Receive Packet Steering (RPS) to distribute load across multiple processors. In 2010, he introduced Receive Flow Steering (RFS) to distribute load at the socket layer. Finally, in 2013, he introduced the SO_REUSEPORT socket option to fully benefit from RFS for UDP & TCP sockets.
Concretely, modern Linux networking applications spawn multiple pthreads, potentially bound to specific CPUs, and create AF_INET(6) UDP or TCP protocol listeners via the socket API. Using the SO_REUSEPORT socket option, a socket can listen on a UDP or TCP port to which another socket is already bound. This simple setsockopt(2) call enables ingress packet load balancing.
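As an illustration, here is a minimal sketch of such a listener in C (error handling trimmed; this helper is not part of this repository):

#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Each worker thread can call this with the same port: the kernel
 * load-balances incoming packets across all sockets bound this way. */
int reuseport_listener(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    int one = 1;
    struct sockaddr_in addr;

    /* SO_REUSEPORT must be set before bind() on every socket
     * sharing the port. */
    setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one));

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    bind(fd, (struct sockaddr *) &addr, sizeof(addr));
    return fd;
}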
Unfortunately this nice design is not available for PF_PACKET sockets, simply because RFS hashes layer3+layer4 header elements to steer packets across AF_INET(6) sockets. PF_PACKET sockets are low-level hooks in the Linux kernel, mostly used by applications like tcpdump to watch traffic. When an application creates multiple sockets via socket(PF_PACKET, SOCK_RAW, proto), each socket receives a copy of every packet matching proto. This built-in copy feature lets you run multiple tcpdump instances on the same host; each application then needs to filter out unrelated traffic. This is where BPF comes into play, providing a set of instructions to build filters that are attached to a socket so that only packets of interest are grabbed.
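Here is a sketch of that classic pattern (illustrative only, not code from this repository): a capture socket with a trivial accept-all classic BPF filter attached:

#include <arpa/inet.h>
#include <linux/filter.h>
#include <linux/if_ether.h>
#include <sys/socket.h>

int capture_socket(void)
{
    /* ETH_P_ALL: receive a copy of every frame, like tcpdump. */
    int fd = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

    /* One-instruction classic BPF program: return the whole packet.
     * A real filter would inspect headers and return 0 to drop. */
    struct sock_filter code[] = {
        { BPF_RET | BPF_K, 0, 0, 0xffffffff },
    };
    struct sock_fprog prog = { .len = 1, .filter = code };

    if (fd >= 0)
        setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER,
                   &prog, sizeof(prog));
    return fd;
}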
Consider a layer-2 application like an Ethernet access concentrator (BNG) or any other application built on top of layer 2. One simple design would be to create a single PF_PACKET socket and redistribute incoming packets across multiple pthreads. This design introduces the need for synchronisation to feed each pthread's processing queue, which increases application complexity and makes the single ingress socket a bottleneck.
The design proposed here is an attempt to reduce application code complexity and increase performance by applying the same RPS/RFS design to PF_PACKET sockets. An Ethernet layer-2 application is offered as a PoC. The application simply creates multiple PF_PACKET sockets and processes packets without any synchronisation, by attaching to each socket an eBPF program that filters packets based on a hash of the Ethernet source address.
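To give a feel for the per-worker wiring, here is a sketch assuming the eBPF program has already been loaded (e.g. via libbpf) and its fd is at hand; the helper name is illustrative:

#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <sys/socket.h>

int worker_socket(int bpf_prog_fd)
{
    int fd = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

    /* The kernel runs the eBPF filter on every frame before queueing
     * it, so only this worker's share of the traffic ever reaches
     * userspace: no inter-thread queue, no locking. */
    if (fd >= 0)
        setsockopt(fd, SOL_SOCKET, SO_ATTACH_BPF,
                   &bpf_prog_fd, sizeof(bpf_prog_fd));
    return fd;
}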
This PoC code is an experimental candidate for merging into GTP-Guard.
The proposed application applies to the following topology, built with qemu:
-----------+-----------------------+----------------------
           | eno1                  | enp1s0
           | (MAC: random)         | (MAC: 52:54:00:84:d7:ff)
      +----+-----+             +--+---+
      | injector |             | node |
      +----------+             +------+
Three directories are available:
* inject/  : inject application
* src/     : bpf_pfpacket_rps application
* src/bpf/ : bpf_rps.bpf eBPF program
To build the application, run:
$ git clone --recursive [email protected]:acassen/bpf_pfpacket_rps.git
$ cd bpf_pfpacket_rps
$ make
$ ls bin/
bpf_pfpacket_rps bpf_rps.bpf inject
$
The injector will run inject, and the node will run bpf_pfpacket_rps + bpf_rps.bpf.
The inject application forges simple PPPoE packets. It uses the destination Ethernet address specified on the command line; the source Ethernet address is randomly generated.
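For illustration, here is a sketch of the kind of frame being forged, a minimal PPPoE discovery PADI per RFC 2516; this is not the actual inject source code:

#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <stdlib.h>
#include <string.h>

static size_t forge_padi(unsigned char *buf, const unsigned char *dst)
{
    struct ethhdr *eth = (struct ethhdr *) buf;
    unsigned char *pppoe = buf + sizeof(*eth);
    int i;

    memcpy(eth->h_dest, dst, ETH_ALEN);
    for (i = 0; i < ETH_ALEN; i++)          /* random source MAC */
        eth->h_source[i] = rand() & 0xff;
    eth->h_source[0] &= 0xfe;               /* keep it unicast... */
    eth->h_source[0] |= 0x02;               /* ...locally administered */
    eth->h_proto = htons(ETH_P_PPP_DISC);   /* 0x8863 */

    pppoe[0] = 0x11;                        /* version 1, type 1 */
    pppoe[1] = 0x09;                        /* code: PADI */
    memset(pppoe + 2, 0, 4);                /* session id 0, length 0 */
    return sizeof(*eth) + 6;
}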
The bpf_pfpacket_rps application spawns n pthread workers. Each worker creates one PF_PACKET socket and attaches the bpf_rps.bpf program to it. Each worker has its own unique id, monotonically incremented. bpf_rps.bpf exposes a BPF_MAP_TYPE_ARRAY used as a local option set, filled by each worker to declare its id, max_number_of_worker & hash_algo. The eBPF program then performs a hash over the source Ethernet address of each ingress packet, leading to a PASS or DROP decision according to local id matching.
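Here is a minimal sketch of that filtering idea, with illustrative names and a toy hash rather than the actual bpf_rps.bpf sources:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

enum { OPT_WORKER_ID, OPT_WORKER_COUNT, OPT_HASH_ALGO, OPT_MAX };

struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, OPT_MAX);
    __type(key, __u32);
    __type(value, __u32);
} options SEC(".maps");

SEC("socket")
int rps_filter(struct __sk_buff *skb)
{
    __u8 src[6];
    __u32 key, hash = 0, *id, *count;
    int i;

    /* Source MAC lives at offset 6 of the Ethernet header. */
    if (bpf_skb_load_bytes(skb, 6, src, sizeof(src)) < 0)
        return 0;

    /* Toy hash over the six source-address bytes. */
    for (i = 0; i < 6; i++)
        hash = hash * 31 + src[i];

    key = OPT_WORKER_ID;
    id = bpf_map_lookup_elem(&options, &key);
    key = OPT_WORKER_COUNT;
    count = bpf_map_lookup_elem(&options, &key);
    if (!id || !count || *count == 0)
        return 0;

    /* Socket filters return the number of bytes to keep: 0 drops. */
    return (hash % *count == *id) ? skb->len : 0;
}

char _license[] SEC("license") = "GPL";

Since every PF_PACKET socket sees a copy of every frame, dropping in-kernel all frames whose hash does not match the worker id yields a lock-free 1/n split of the traffic.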
$ ./inject -h
Usage: ./inject [OPTION...]
-i, --bind-interface Interface to bind to
-d, --dest-mac Destination MAC Address
-p, --pkt-count Number of packet to send
-h, --help Display this help message
$ sudo ./inject -i eno1 -d 52:54:00:84:d7:ff -p 25000
This sends 25000 PPPoE packets out of the eno1 interface to the remote Ethernet address 52:54:00:84:d7:ff, each with a randomly generated Ethernet source address.
$ ./bpf_pfpacket_rps -h
Usage: ./bpf_pfpacket_rps [OPTION...]
-i, --bind-interface Interface to bind to
-w, --worker-count Number of thread workers
-a, --hash-alg 0:mac_hash 1:mac_jhash 2:jhash_oaat
-b, --ebpf-rps eBPF RPS program
-h, --help Display this help message
$ sudo ./bpf_pfpacket_rps -i enp1s0 -b ./bpf_rps.bpf -w 8
#0:0000 #1:0000 #2:0000 #3:0000 #4:0000 #5:0000 #6:0000 #7:0000 (total:0 std_dev:0.000000)
#0:0000 #1:0000 #2:0000 #3:0000 #4:0000 #5:0000 #6:0000 #7:0000 (total:0 std_dev:0.000000)
#0:0232 #1:0237 #2:0262 #3:0246 #4:0230 #5:0233 #6:0236 #7:0246 (total:1922 std_dev:9.959292)
#0:1989 #1:1952 #2:1961 #3:1977 #4:1956 #5:1938 #6:1968 #7:1962 (total:15703 std_dev:14.563975)
#0:3165 #1:3155 #2:3061 #3:3157 #4:3123 #5:3109 #6:3116 #7:3114 (total:25000 std_dev:31.784430)
The default hash algorithm is 0 (mac_hash). Every second, the application displays the packet distribution over the workers. The standard deviation is calculated to evaluate the accuracy of the hash algorithm.
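For n workers with per-worker counts c_i and mean m = total / n, the displayed value is the population standard deviation sqrt( sum_i (c_i - m)^2 / n ). On the last line above: m = 25000 / 8 = 3125, the deviations are (40, 30, -64, 32, -2, -16, -9, -11), and sqrt(8082 / 8) ≈ 31.78, matching the reported value. The closer to 0, the more even the distribution across workers.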
Enjoy, Alexandre