
High CPU usage when running Katran in shared mode with bonding interface #235

Open
tantm3 opened this issue Sep 5, 2024 · 24 comments

tantm3 commented Sep 5, 2024

Hi everyone!

I am currently running Katran as an L3 Director load balancer for our services.
I would like to run Katran with a bonding interface, because I believe it's easier to add more network interfaces than more servers when scaling Katran's workload.
I followed this issue (#13) and got Katran working normally in shared mode with a bonding interface using these commands:

# Network config
1: lo: ...
2: ens2f0: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 xdp/id:1900 qdisc mq master bond0 state UP group default qlen 1000
    link/ether 02:96:77:09:2b:73 brd ff:ff:ff:ff:ff:ff permaddr d4:f5:ef:36:1a:60
    altname enp55s0f0
3: ens2f1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 xdp/id:1905 qdisc mq master bond0 state UP group default qlen 1000
    link/ether 02:96:77:09:2b:73 brd ff:ff:ff:ff:ff:ff permaddr d4:f5:ef:36:1a:68
    altname enp55s0f1
4: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
5: ipip0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
6: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN group default qlen 1000
    link/tunnel6 :: brd :: permaddr 16dd:8fa:927c::
7: ipip60@NONE: <NOARP,UP,LOWER_UP> mtu 1452 qdisc noqueue state UNKNOWN group default qlen 1000
    link/tunnel6 :: brd :: permaddr 42d6:82ed:7cf5::
    inet6 fe80::40d6:82ff:feed:7cf5/64 scope link 
       valid_lft forever preferred_lft forever
8: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 02:96:77:09:2b:73 brd ff:ff:ff:ff:ff:ff
    inet 10.50.73.53/24 brd 10.50.73.255 scope global bond0
       valid_lft forever preferred_lft forever
    inet6 fe80::96:77ff:fe09:2b73/64 scope link 
       valid_lft forever preferred_lft forever
## For the xdp root program I edited the install_xdproot.sh script,
## and then ran just one command to add the Katran load balancer xdp program:
sudo ./build/example_grpc/katran_server_grpc -balancer_prog ./deps/bpfprog/bpf/balancer.bpf.o -default_mac 58:e4:34:56:46:e0  -forwarding_cores=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19 -numa_nodes=0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1 -healthchecker_prog ./deps/bpfprog/bpf/healthchecking_ipip.o -intf=ens2f0 -ipip_intf=ipip0 -ipip6_intf=ipip60 -lru_size=100000 -map_path /sys/fs/bpf/jmp_ens2 -prog_pos=2
## Katran VIP + REAL config
2024/09/08 04:23:39 vips len 1
VIP:        49.213.85.151 Port:     80 Protocol: tcp
Vip's flags: 
 ->49.213.85.171     weight: 1 flags: 
exiting
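For reference, here is a rough sketch of what the edited install_xdproot.sh amounts to in this setup. This is my assumption rather than the actual script contents; the xdp root object name and root pin path are placeholders, and only the jump map path matches the -map_path argument above:

## load the xdp root program once and pin it (object/pin names are placeholders)
sudo bpftool prog load ./deps/bpfprog/bpf/xdp_root.o /sys/fs/bpf/xdp_root_ens2 type xdp pinmaps /sys/fs/bpf/xdp_root_maps
## attach the same pinned root program to BOTH bond slaves, as in issue #13
sudo bpftool net attach xdp pinned /sys/fs/bpf/xdp_root_ens2 dev ens2f0
sudo bpftool net attach xdp pinned /sys/fs/bpf/xdp_root_ens2 dev ens2f1
## verify the pinned jump (prog array) map that -map_path/-prog_pos point to
sudo bpftool map show pinned /sys/fs/bpf/jmp_ens2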
  • I mapped the IRQs to CPUs as follows (a sketch of how such a mapping is applied is included after these bullets):
i40e-ens2f0-TxRx-0(45) is affinitive with  00,00000001 (from CPU1 .....)
i40e-ens2f0-TxRx-1(46) is affinitive with  00,00000002
i40e-ens2f0-TxRx-2(47) is affinitive with  00,00000004
i40e-ens2f0-TxRx-3(48) is affinitive with  00,00000008
i40e-ens2f0-TxRx-4(49) is affinitive with  00,00000010
i40e-ens2f0-TxRx-5(50) is affinitive with  00,00000020
i40e-ens2f0-TxRx-6(51) is affinitive with  00,00000040
i40e-ens2f0-TxRx-7(52) is affinitive with  00,00000080
i40e-ens2f0-TxRx-8(53) is affinitive with  00,00000100
i40e-ens2f0-TxRx-9(54) is affinitive with  00,00000200
i40e-ens2f1-TxRx-0(95) is affinitive with  00,00000400
i40e-ens2f1-TxRx-1(96) is affinitive with  00,00000800
i40e-ens2f1-TxRx-2(97) is affinitive with  00,00001000
i40e-ens2f1-TxRx-3(98) is affinitive with  00,00002000
i40e-ens2f1-TxRx-4(99) is affinitive with  00,00004000
i40e-ens2f1-TxRx-5(100) is affinitive with  00,00008000
i40e-ens2f1-TxRx-6(101) is affinitive with  00,00010000
i40e-ens2f1-TxRx-7(102) is affinitive with  00,00020000
i40e-ens2f1-TxRx-8(103) is affinitive with  00,00040000
i40e-ens2f1-TxRx-9(104) is affinitive with  00,00080000 (to CPU 20)
  • The problem arose when I looked at Katran's statistics: they show a 100% LRU miss rate.
## katran_goclient -s -lru
summary: 6380747 pkts/sec. lru hit: 0.00% lru miss: 100.00% (tcp syn: 1.00% tcp non-syn: 0.00% udp: 0.00%) fallback lru hit: 0 pkts/sec
summary: 6668858 pkts/sec. lru hit: -0.00% lru miss: 100.00% (tcp syn: 1.00% tcp non-syn: 0.00% udp: 0.00%) fallback lru hit: 0 pkts/sec
summary: 6657124 pkts/sec. lru hit: 0.00% lru miss: 100.00% (tcp syn: 1.00% tcp non-syn: 0.00% udp: -0.00%) fallback lru hit: 0 pkts/sec

  • And all 20 CPUs are consumed by ksoftirqd:
    [screenshot]

  • Here is a screenshot showing the output of perf report:
    [screenshot]
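As mentioned above, here is a minimal sketch of how an IRQ-to-CPU mapping like the one in the listing is applied (it assumes irqbalance is stopped first, otherwise it rewrites the masks; the IRQ numbers and masks are the ones from the listing):

sudo systemctl stop irqbalance
## pin i40e-ens2f0-TxRx-0 (IRQ 45) to the first forwarding core, TxRx-1 (IRQ 46) to the second, and so on
echo 00000001 | sudo tee /proc/irq/45/smp_affinity
echo 00000002 | sudo tee /proc/irq/46/smp_affinity
## ...repeat for the remaining queues of both slaves
echo 00000400 | sudo tee /proc/irq/95/smp_affinity    # i40e-ens2f1-TxRx-0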

I am not sure whether this performance issue is related to Katran or not, so I am posting this question here to find some clues.

Feel free to ask me to provide more information!

tantm3 changed the title from "How to tunning performance Katran shared mode with bonding interface?" to "High CPU usage when running Katran in shared mode with bonding interface" on Sep 8, 2024
tantm3 (Author) commented Sep 9, 2024

I have one more test case to add from my research.
I wanted to know whether the bonding interface causes the CPU overload, so I removed the bonding interface and ran Katran in shared mode directly on the two physical interfaces.

  • Here is my network config
1: lo: ....
2: ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp/id:2118 qdisc mq state UP group default qlen 1000
    link/ether d4:f5:ef:ac:ac:f0 brd ff:ff:ff:ff:ff:ff
    altname enp18s0f0
    inet 10.50.73.55/24 brd 10.50.73.255 scope global ens1f0
       valid_lft forever preferred_lft forever
    inet6 fe80::d6f5:efff:feac:acf0/64 scope link 
       valid_lft forever preferred_lft forever
3: ens1f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp/id:2123 qdisc mq state UP group default qlen 1000
    link/ether d4:f5:ef:ac:ac:f8 brd ff:ff:ff:ff:ff:ff
    altname enp18s0f1
    inet 10.50.73.52/24 brd 10.50.73.255 scope global ens1f1
       valid_lft forever preferred_lft forever
    inet6 fe80::d6f5:efff:feac:acf8/64 scope link 
       valid_lft forever preferred_lft forever
....
8: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
9: ipip0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
10: ip6tnl0@NONE: <NOARP> mtu 1452 qdisc noop state DOWN group default qlen 1000
    link/tunnel6 :: brd :: permaddr f27f:220f:4e91::
11: ipip60@NONE: <NOARP,UP,LOWER_UP> mtu 1452 qdisc noqueue state UNKNOWN group default qlen 1000
    link/tunnel6 :: brd :: permaddr baba:33a1:a4e6::
    inet6 fe80::b8ba:33ff:fea1:a4e6/64 scope link 
       valid_lft forever preferred_lft forever
  • Command that I used to run Katran
sudo ./build/example_grpc/katran_server_grpc -balancer_prog ./deps/bpfprog/bpf/balancer.bpf.o -default_mac 58:e4:34:56:46:e0 -forwarding_cores=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19 -numa_nodes=0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1 -healthchecker_prog ./deps/bpfprog/bpf/healthchecking_ipip.o -intf=ens1f0 -ipip_intf=ipip0 -ipip6_intf=ipip60 -lru_size=1000000 -map_path /sys/fs/bpf/jmp_ens1 -prog_pos=2
  • Here is CPU mapping information
i40e-ens1f0-TxRx-0(44) is affinitive with  00,00000001
i40e-ens1f0-TxRx-1(45) is affinitive with  00,00000002
i40e-ens1f0-TxRx-2(46) is affinitive with  00,00000004
i40e-ens1f0-TxRx-3(47) is affinitive with  00,00000008
i40e-ens1f0-TxRx-4(48) is affinitive with  00,00000010
i40e-ens1f0-TxRx-5(49) is affinitive with  00,00000020
i40e-ens1f0-TxRx-6(50) is affinitive with  00,00000040
i40e-ens1f0-TxRx-7(51) is affinitive with  00,00000080
i40e-ens1f0-TxRx-8(52) is affinitive with  00,00000100
i40e-ens1f0-TxRx-9(53) is affinitive with  00,00000200
i40e-ens1f1-TxRx-0(103) is affinitive with  00,00000400
i40e-ens1f1-TxRx-1(104) is affinitive with  00,00000800
i40e-ens1f1-TxRx-2(105) is affinitive with  00,00001000
i40e-ens1f1-TxRx-3(106) is affinitive with  00,00002000
i40e-ens1f1-TxRx-4(107) is affinitive with  00,00004000
i40e-ens1f1-TxRx-5(108) is affinitive with  00,00008000
i40e-ens1f1-TxRx-6(109) is affinitive with  00,00010000
i40e-ens1f1-TxRx-7(110) is affinitive with  00,00020000
i40e-ens1f1-TxRx-8(111) is affinitive with  00,00040000
i40e-ens1f1-TxRx-9(112) is affinitive with  00,00080000
  • Katran's stats:
    [screenshot]
  • CPU usage is also full:
    [screenshot]
  • Here is the output from the perf commands:
  • watching with perf top:
    [screenshot]
  • recording with perf record and viewing with perf report:
    [screenshot]
    [screenshot]

There is a slight performance improvement when running on the physical interfaces directly (according to Katran's output), but the CPU usage is still at 100%.

tantm3 (Author) commented Sep 23, 2024

Hi @avasylev @tehnerd,

Could you share some thoughts on my setup?
I am still struggling with this.

tehnerd (Contributor) commented Sep 24, 2024

sudo sysctl -a | grep bpf ?
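For anyone following along, the values this grep is presumably after (my reading, not confirmed by tehnerd) are the JIT knobs; a healthy setup normally looks like this:

sysctl net.core.bpf_jit_enable     ## 1 = JIT enabled, 0 = interpreter (much slower)
sysctl net.core.bpf_jit_harden     ## 2 = hardening even for privileged loads, which costs extra cycles
sysctl net.core.bpf_jit_kallsyms   ## 1 = jited programs are visible to perf by symbol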

tantm3 (Author) commented Sep 25, 2024

Hi @tehnerd,

Here is the output:
[screenshot]

tehnerd (Contributor) commented Sep 26, 2024

Hmm. Strange.
Please collect:
1. perf record -a -F 23 -- sleep 10
2. the same perf as before (as in your previous screenshots; I guess you were using -g as well). When looking at the report, move to the balancer's bpf program and use the 'a' shortcut. That shows the assembly code, so we can see where exactly in the bpf program the CPU is being consumed.
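Spelled out as commands (the second capture matches the earlier screenshots):

sudo perf record -a -F 23 -- sleep 10     ## low-frequency, system-wide sample
sudo perf record -ag -- sleep 20          ## with call graphs, as in the previous pictures
sudo perf report                          ## in the TUI: select the balancer bpf program entry,
                                          ## then press 'a' to annotate it (assembly view)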

tehnerd (Contributor) commented Sep 26, 2024

Also, what does the traffic pattern look like? Are these real TCP streams or just random packets?

tantm3 (Author) commented Sep 26, 2024

Please collect 1 perf record -a -F 23 -- sleep 10 2 same perf as before (in your previous pictures; I guess you were using -g as well)

  • I had a little trouble getting the assembly code from perf report.
  • With the bpf program, it shows an error:
    [screenshot]
  • With other processes, the 'a' shortcut works and returns output like this one:
    [screenshot]
  • Anyway, here is all the output that I collected from those commands.

  • perf record -a -F 23 -- sleep 10
    [screenshot]
  • Assembly code when I jump (press enter) deeper into katran's bpf program:
    [screenshot]
  • perf record -ag -- sleep 20
    [screenshot]
  • perf top --sort comm,dso
    [screenshot]

Again, it shows an error when I press the 'a' shortcut on the bpf program:
[screenshot]

tantm3 (Author) commented Sep 26, 2024

Also how the traffic pattern looks like ? Are they real tcp streams or just random packets

  • I used Pktgen to generate the traffic; here is the configuration:
    [screenshot]

  • In short, I want to simulate a SYN flood and send it to Katran. I used the xdpdump tool to capture the traffic, and the packets look like this:

06:24:05.724378 IP 49.213.85.169.14397 > 49.213.85.152.80: Flags [S], seq 74616:74622, win 8192, length 6: HTTP
06:24:05.724396 IP 49.213.85.169.14997 > 49.213.85.152.80: Flags [S], seq 74616:74622, win 8192, length 6: HTTP
06:24:05.724401 IP 49.213.85.169.14984 > 49.213.85.152.80: Flags [S], seq 74616:74622, win 8192, length 6: HTTP
  • Here is the Katran configuration
katran_goclient -l -server localhost:8080
2024/09/26 06:26:30 vips len 2
VIP:        49.213.85.153 Port:     80 Protocol: tcp
Vip's flags:
 ->49.213.85.171     weight: 1 flags:
VIP:        49.213.85.152 Port:     80 Protocol: tcp
Vip's flags:
 ->49.213.85.171     weight: 1 flags:
exiting

UPDATE:

driver: i40e
version: 5.15.0-122-generic
firmware-version: 10.53.7
expansion-rom-version:
bus-info: 0000:37:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
  • And I see a lot of packet drops on the rx flow (the counters in this screenshot can also be read with the commands sketched at the end of this comment):
    [screenshot]
  • I think this is important information, and it may show that Katran itself is not what is causing the high CPU usage.
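As mentioned above, the rx drop counters can also be read with standard tools instead of a screenshot (exact per-queue counter names vary by driver; substitute whichever interface is receiving the flood):

ethtool -S ens1f0 | grep -iE 'drop|miss|err'
ip -s link show dev ens1f0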

tehnerd (Contributor) commented Sep 26, 2024

Feels like the bpf program is not jitted. Could you please run bpftool prog list and bpftool map list?
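Along with those two listings, the jited status of the attached program can be checked directly (substitute the real prog id shown as xdp/id:... in ip link or by bpftool prog list):

sudo bpftool prog list
sudo bpftool map list
sudo bpftool prog show id 2118              ## a non-zero "jited" size means the program is jitted
sudo bpftool prog dump jited id 2118 | head ## only works if the program is actually jitted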

tehnerd (Contributor) commented Sep 26, 2024

Also, in the perf report (the one taken with -ag), please show the output filtered by the bpf keyword (press / and then type bpf).

tantm3 (Author) commented Sep 27, 2024

Yes, sure, here is the output of those commands.

bpftool prog list

[screenshot]

bpftool map list

[screenshot]
[screenshot]

perf report with the output filtered by the bpf keyword:

[screenshot]

UPDATE
Here is the interface tag; it looks like the bpf program is jitted, at least from the outside view. I hope this information is helpful.

[screenshot]

tantm3 (Author) commented Oct 2, 2024

Hi @tehnerd,

Would you happen to have any updates on this issue?
Feel free to ask me to provide more information or do some tests!

tehnerd (Contributor) commented Oct 3, 2024

No idea. For some reason the bpf program seems slow in the bpf code itself. At this point the only idea is to build perf with bpf support (linked against the library that is required to disassemble bpf) and to check where the CPU is spent inside the bpf program.
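A sketch of what that build could look like (package names assume Ubuntu; the important part is that the libbfd/libopcodes and libelf feature checks come up "on" so perf can disassemble bpf):

sudo apt-get install -y flex bison libelf-dev libdw-dev binutils-dev libcap-dev libnuma-dev libzstd-dev
cd linux-5.15/tools/perf && make                        ## watch the feature-detection summary at the top
./perf version --build-options | grep -iE 'bfd|bpf'    ## confirm the support was compiled in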

tantm3 (Author) commented Oct 4, 2024

You mentioned that it felt like the bpf program was not jitted.
But, according to your answer, that does not seem to be the case here, right?
So, the next step is building perf with bpf support and checking inside the bpf program.

tantm3 (Author) commented Oct 18, 2024

Hi @tehnerd,

It has been a while, and I finally got the assembly code output from inside the Katran bpf load balancer program.
Command: perf record -a -F 23 -- sleep 10

  • the point in the assembly code flagged in red:
    [screenshot]
    [screenshot]

Command: perf record -ag -- sleep 20

  • the point in the assembly code flagged in red:
    [screenshot]
    [screenshot]

I attached the content of the perf report --stdio as a zip file here, in case it helps
perf.zip

Could you please take a look at that?

tehnerd (Contributor) commented Oct 18, 2024

All of that is memory accesses. It feels like there is some issue with them: either slow memory, or the system is low on memory, or the TLB is thrashed. What does the environment look like? Is it a VM or not? What kind of memory, how much, and how much is free? It would be nice to see perf counters for memory accesses (stalled frontend/backend cycles, TLB stats).
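Concretely, something like the following gathers the counters and memory state being asked for (generic perf event names; availability depends on the CPU/PMU):

sudo perf stat -a -e cycles,instructions,cache-misses,LLC-load-misses,dTLB-load-misses,iTLB-load-misses -- sleep 10
free -h            ## total vs free memory
numastat -m        ## per-NUMA-node memory breakdown (from the numactl package)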

tantm3 (Author) commented Oct 18, 2024

Here is the memory information from my server.

Is it vm or not? What the memory? How much ? How much free?

  • The server is physical, not a VM.

  • It has 64 GB of memory, and there is no significant increase in used memory when the traffic arrives; maybe the user-space tools just cannot catch it.
    [screenshot]

  • Here is the NUMA node information:
    [screenshot]

  • The perf counters for memory accesses are here; I hope these are all the counters you need:
    [screenshot]

  • UPDATE: perf counters for CPU cycles:
    [screenshot]

tantm3 (Author) commented Oct 23, 2024

Hi @tehnerd,

Did you find any clues from the memory stats?
Feel free to ask me to provide more information!

tehnerd (Contributor) commented Oct 23, 2024

Can you run perf to collect counters for "cycles,stalled-cycles-frontend,stalled-cycles-backend"? The only explanation that makes sense for a mov to/from memory being this high on CPU is that memory access is slow for some reason, which would show up as a high count of stalled cycles.
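As a concrete command (these are generic perf event names and, as the next comment shows, they are not exposed on every CPU):

sudo perf stat -a -e cycles,instructions,stalled-cycles-frontend,stalled-cycles-backend -- sleep 10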

tantm3 (Author) commented Oct 24, 2024

It's quite a challenge to collect those counters because my CPU is a Cascade Lake microarchitecture, and it seems the stalled-cycles-frontend and stalled-cycles-backend counters are not supported there.

Model name: Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz
Model number: 85
The Intel Xeon Silver 4210R belongs to the Cascade Lake family.

I looked into the kernel event code (arch/x86/events/intel/core.c, kernel version 5.15, https://mirrors.edge.kernel.org/pub/linux/kernel/v5.x/linux-5.15.tar.gz) and captured the counters by event mask, so the outputs are here:

  • Perf in the normal scenario

[screenshot]

  • Perf under high network workload (SYN flood)

[screenshot]
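An alternative to reading the event masks out of core.c is to use the per-CPU event names that perf ships for Skylake/Cascade Lake server parts; these names are an assumption on my side, so confirm them with perf list first:

sudo perf list | grep -i cycle_activity
sudo perf stat -a -e cycles,instructions,cycle_activity.stalls_total,cycle_activity.stalls_mem_any -- sleep 10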

swettoth0812 commented

I found that the IPC (instructions per cycle) indicator is useful for identifying CPU stalls (this blog post is helpful: https://www.brendangregg.com/blog/2017-05-09/cpu-utilization-is-wrong.html).
Here is the IPC measurement from my server:

[screenshot]

The value is between 0.18 and 0.23, which is quite low, and the interpretation from the blog indicates that such a low IPC means the workload is likely memory stalled:

If your IPC is < 1.0, you are likely memory stalled, and software tuning strategies include reducing memory I/O, and improving CPU caching and memory locality, especially on NUMA systems. Hardware tuning includes using processors with larger CPU caches, and faster memory, busses, and interconnects.

I suppose that focusing on memory tuning could increase performance, but I have no clues about that yet.
Could you suggest some configurations that would help optimize memory performance?
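For reference, the IPC figure can be reproduced with plain perf stat, which prints "insn per cycle" directly:

sudo perf stat -a -e cycles,instructions -- sleep 10
## here it comes out around 0.18-0.23; an IPC well below 1.0 matches the memory-stall reading above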

tehnerd (Contributor) commented Oct 25, 2024

Yeah. The stalls are way too high, and they are not even on map accesses. Anyway, I think there is some issue with the hardware. Do you have any other test machine to run on?

swettoth0812 commented

Actually, I have another test machine, but the hardware specs are mostly the same except for the network driver.
The server in this discussion uses the ixgbe network driver.
I have another one with the i40e driver, and here is some performance output from that machine:
[screenshot]
The CPU utilization is still full:
[screenshot]

From your perspective, which hardware spec would you want to change in this case?

tehnerd (Contributor) commented Oct 30, 2024

That seems pretty strange. I do not think it is related to the NIC. It looks like some strange memory-related hardware issue or hardware specifics. I will post later today how to run it, but I wonder what the results of the synthetic load tests would look like; they test just the bpf code itself, and from perf it looked like the issue was visible inside it.
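Not the katran-specific synthetic test referred to above (that is still to be posted), but one generic way to benchmark only the bpf program, with no NIC or IRQ involvement, is BPF_PROG_TEST_RUN via bpftool; packet.bin is a hypothetical file containing one raw frame such as a captured SYN, and the prog id is the one shown by bpftool prog list:

sudo bpftool prog run id 2118 data_in packet.bin repeat 1000000
## the reported average duration per run isolates bpf/JIT/memory cost from driver overhead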
