Replies: 7 comments 5 replies
-
The 32k limit in xdp-cpumap-tc and cpumap-pping is very easy to adjust, but
it's a little tricky to have a "one size fits all" value. It probably needs
to be tunable per installation (right now, it's one #define in the source
code).
Each entry uses 28 bytes of RAM (we've tried to keep it lean), so a full map
of 32,767 entries is 917,476 bytes, or about 0.9 MB. That's tiny, so there's
plenty of room to grow. However, the smaller it is, the more likely it is
that your CPU cache lines will hold it, which massively speeds up lookups.
I'm assured there's no technical problem with a larger number; doubling it
to allow 64k entries would still only be about 1.8 MB of RAM. It just might
start to slow down if it's less likely to be in L1 cache.
To test a different size, open src/common_kern_user.h and edit the value of
#define IP_HASH_ENTRIES_MAX 32767 (it's around line 29). Then, of course,
run "make clean ; make" to rebuild.
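For reference, here's the back-of-the-envelope math as a quick Python sketch (the 28-bytes-per-entry figure is the one quoted above; treat the numbers as approximate):

# Back-of-the-envelope sizing for the xdp-cpumap-tc IP hash map, using the
# 28-bytes-per-entry figure above. Illustrative only; the authoritative
# definition is IP_HASH_ENTRIES_MAX in src/common_kern_user.h.
ENTRY_BYTES = 28

for max_entries in (32_767, 65_535, 131_071):
    total = max_entries * ENTRY_BYTES
    print(f"IP_HASH_ENTRIES_MAX={max_entries:>7} -> {total:>9,} bytes (~{total / 2**20:.2f} MiB)")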
…On Wed, Oct 19, 2022 at 4:30 PM Robert Chacón ***@***.***> wrote:
I'm working to see how far LibreQoS can scale.
Based on lab tests, it seems to scale reasonably well (thanks XDP).
Some notes on current problems and solutions:
1. Filter limitations
The only hard limit I am aware of for scaling LibreQoS is the 32k
entry limit for xdp-cpumap-tc and cpumap-pping
<https://github.com/thebracket/cpumap-pping-hackjob>. Hopefully once
cpumap-pping is more solidified we can work on this, and submit an upstream
PR to xdp-cpumap-tc.
2. Memory use seems to be a factor to keep in mind when using CAKE.
A VM with 4000 subscriber circuits utilizes:
- fq_codel: 0.8GB of RAM, so about 0.0002 GB per subscriber.
- CAKE: 8GB of RAM, so about 0.002 GB per subscriber.
We can extrapolate from this that 50,000 subscribers should consume about
10GB of RAM with fq_codel or 100GB with CAKE. Many commercial server
configurations can support 256GB of RAM or more, so ISPs around the 50k
size can still consider using LibreQoS with CAKE.
3. HTB major:minor handle
HTB uses a hex handle for classes: two 16-bit hex values joined by a colon
(major:minor). In LibreQoS, each CPU core uses a different major handle.
In v1.2 and prior, the minor handle was unique across all CPUs, meaning
only 30k subscribers could be added in total.
Starting with LibreQoS v1.3, minor handles are counted independently per
CPU core. With this change, the maximum number of subscriber qdiscs/classes
goes from a hard limit of 30k to 30k x CPU core count. So for a higher-end
system with a 64-core processor such as the AMD EPYC™ 7713P, that would
mean ~1.9 million possible subscriber classes. Of course, CPU use will be
the bottleneck long before class handles are in that scenario, but at least
that arbitrary 30k limit is out of the way.
4. In order to improve queue reload time in v1.3, it was necessary to
use a unique identifier for each circuit. I chose Circuit ID. It can be a
number or a string; it just needs to be unique between circuits, and shared
by all devices in the same circuit. This allows us to avoid costly
lookups when sorting through the queue structure. If you have your own
script creating ShapedDevices.csv - you could use your CRM's unique
identifier for customer services / circuits to serve as this Circuit ID.
The UISP and Splynx integrations already do this automatically.
-
That's a max limit for cake, not a fixed limit. The overall storage of packets (with the exception of malignant traffic) will remain roughly constant relative to the desired egress bandwidth, as a function of aiming for 5ms total queuing delay. cake sets saner maximum memory defaults when you have the bandwidth parameter specified; the calculation we use is in the code. So the idea is to leverage HTB and that calculation, and pass that memlimit to the cake invocation. Now, I don't really know what the right number is at the moment, as cake uses the slab size rather than the packet size... but the default 32MB limit was sized to 10Gbit, and the common OpenWrt limit of 4MB is too small for more than, say, 200Mbit of traffic.
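As a rough illustration of the kind of sizing described here (this is not the actual sch_cake calculation, which works on slab sizes and applies its own minimums; the real logic is in sch_cake.c, linked below), memory scales with the shaped rate times the amount of time you are willing to buffer. The 25 ms horizon below is an assumption, not a value from the code:

# Illustrative only: a bandwidth-based memlimit sketch for a cake invocation
# under HTB. The assumed 25 ms buffering horizon is a placeholder; cake's
# real autosizing is in sch_cake.c and accounts for slab sizes and minimums.
def sketch_memlimit_bytes(rate_bits_per_sec: float, horizon_s: float = 0.025) -> int:
    return int(rate_bits_per_sec / 8 * horizon_s)

for mbit in (200, 1_000, 10_000):
    mib = sketch_memlimit_bytes(mbit * 1e6) / 2**20
    print(f"{mbit:>6} Mbit -> ~{mib:.1f} MiB")

With that assumed horizon, the 10Gbit case lands near the 32MB default mentioned above; at the low end, cake's built-in minimums take over rather than this simple product.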
-
So Cake has a limit on flow tracking as well? (Robert was referring to a
constant in xdp-cpumap-tc, limiting the number of IP addresses it can
track/assign to tc classifiers)
…On Thu, Oct 20, 2022 at 10:53 AM Dave Täht ***@***.***> wrote:
That's a max limit for cake, not a fixed limit. The overall storage of
packets (with the exception of malignant traffic) will remain a constant as
a function of aiming for 5ms total queuing delay.
-
https://github.com/dtaht/sch_cake/blob/master/sch_cake.c#L2682 It's a little more involved than this, in that you also have to hack codel's target below 4Mbit.
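The reason the target needs attention at very low rates: a 5ms target is shorter than the time it takes to serialize even one full-size packet once the rate drops to a few Mbit. A rough sketch of that arithmetic follows; the exact adjustment cake applies is in cake_set_rate() in sch_cake.c, and the 1.5x-MTU rule of thumb here is an assumption:

# Why a fixed 5 ms codel target breaks down at low rates. Assumes ~1514-byte
# frames and a rule of thumb that the target should cover at least ~1.5 MTU
# serialization times; cake's own adjustment (see sch_cake.c) differs in detail.
MTU_BYTES = 1514

def suggested_target_ms(rate_bits_per_sec: float) -> float:
    mtu_time_ms = MTU_BYTES * 8 / rate_bits_per_sec * 1000
    return max(5.0, 1.5 * mtu_time_ms)

for mbit in (1, 2, 4, 10):
    mtu_ms = MTU_BYTES * 8 / (mbit * 1e6) * 1000
    print(f"{mbit:>3} Mbit: one MTU takes {mtu_ms:.1f} ms -> target ~{suggested_target_ms(mbit * 1e6):.1f} ms")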
-
https://github.com/dtaht/sch_cake/blob/master/sch_cake.c#L85
-
Huh, I interpreted that as a per-cake queue limit? Seems like a limit of
1024 queues would be a problem if it were global. From what I've seen,
conntrack can handle many, many flows, so flow tracking isn't a problem.
1024 queues per Cake instance would make sense; I seem to remember falling
flat when I tried to use a single Cake instance (on a router) to smooth out
a feed with a lot of devices (something like 9,000!).
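Those defines (quoted just below) do read as per-instance limits: flows hash into CAKE_QUEUES (1024) buckets, with CAKE_SET_WAYS (8) set-associative collision handling. A quick illustration of what ~9,000 devices behind a single instance would mean for queue occupancy (plain uniform hashing here, not cake's actual hash):

# Illustrative occupancy check: many flows sharing the 1024 per-instance
# queues. Uses plain uniform random hashing, not cake's set-associative hash.
import random

CAKE_QUEUES = 1024
flows = 9_000
buckets = [0] * CAKE_QUEUES
for _ in range(flows):
    buckets[random.randrange(CAKE_QUEUES)] += 1

print(f"average flows per queue: {flows / CAKE_QUEUES:.1f}")   # ~8.8
print(f"busiest queue holds:     {max(buckets)} flows")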
…On Thu, Oct 20, 2022 at 11:23 AM Dave Täht ***@***.***> wrote:
#define CAKE_SET_WAYS (8)
#define CAKE_MAX_TINS (8)
#define CAKE_QUEUES (1024)
#define CAKE_FLOW_MASK 63
#define CAKE_FLOW_NAT_FLAG 64
https://github.com/dtaht/sch_cake/blob/master/sch_cake.c#L85
-
That's odd; you might need to reboot to rebuild the pinned maps? Otherwise,
I'll try to debug in the morning.
…On Sun, Oct 23, 2022, 11:35 AM Robert Chacón ***@***.***> wrote:
I made the change to common_kern_user.h, but I'm wondering if there's
somewhere else I need to change something.
I used a script to generate a ShapedDevices.csv with /32 hosts generated
from one big subnet.
It seems to work for the first 30k entries but after that I see
[image: image]
<https://user-images.githubusercontent.com/22501920/197404131-7a05b185-30ac-496a-9005-cc1657ada093.png>
-
I'm working to see how far LibreQoS can scale.
Based on lab tests, it seems to scale reasonably well (thanks XDP).
Some notes on current problems and solutions:
Filter limitations
The only hard limit I am aware of for scaling LibreQoS is the 32k entry limit for xdp-cpumap-tc and cpumap-pping. Hopefully once cpumap-pping is more solidified we can work on this, and submit an upstream PR to xdp-cpumap-tc.
Edit: Solved. Modify the cpumap-pping file src/common_kern_user.h and edit the value of #define IP_HASH_ENTRIES_MAX. Run "make clean ; make" to rebuild, then reboot to clear the BPF map.
Memory use seems to be a factor to keep in mind when using CAKE.
A VM with 4000 subscriber circuits utilizes:
- fq_codel: 0.8GB of RAM, so about 0.0002 GB per subscriber.
- CAKE: 8GB of RAM, so about 0.002 GB per subscriber.
We can extrapolate from this that 50,000 subscribers should consume about 10GB of RAM with fq_codel or 100GB with CAKE. Many commercial server configurations can support 256GB of RAM or more, so ISPs around the 50k size can still consider using LibreQoS with CAKE.
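The extrapolation is straight multiplication from the measured per-subscriber figures; as a quick sanity check (a sketch only):

# Sanity check of the RAM extrapolation, using the measured figures above
# (0.8 GB and 8 GB for a 4000-circuit VM).
measured_circuits = 4000
measured_ram_gb = {"fq_codel": 0.8, "cake": 8.0}

target_subscribers = 50_000
for qdisc, gb in measured_ram_gb.items():
    per_sub = gb / measured_circuits
    print(f"{qdisc:>8}: {per_sub:.4f} GB/subscriber -> ~{per_sub * target_subscribers:.0f} GB at {target_subscribers:,} subscribers")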
v1.3 Improvements to help scale
HTB major:minor handle
HTB uses a hex handle for classes: two 16-bit hex values joined by a colon (major:minor). In LibreQoS, each CPU core uses a different major handle.
In v1.2 and prior, the minor handle was unique across all CPUs, meaning only 30k subscribers could be added total.
Starting with LibreQoS v1.3, minor handles are counted independently per CPU core. With this change, the maximum number of subscriber qdiscs/classes goes from a hard limit of 30k to 30k x CPU core count. So for a higher-end system with a 64-core processor such as the AMD EPYC™ 7713P, that would mean ~1.9 million possible subscriber classes. Of course, CPU use will be the bottleneck long before class handles are in that scenario, but at least that arbitrary 30k limit is out of the way.
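To make the handle arithmetic concrete, here's a sketch of per-core minor allocation (the major:minor layout is standard tc/HTB; the specific numbering below is illustrative and not copied from LibreQoS.py):

# Illustrative per-CPU-core HTB class handle allocation. Handles are written
# in hex as major:minor; the exact numbering LibreQoS uses may differ.
def class_handle(core_index: int, circuit_index: int) -> str:
    major = core_index + 1        # one major per CPU core
    minor = circuit_index + 2     # minors counted independently per core
    return f"{major:x}:{minor:x}"

cores = 64
minors_per_core = 30_000          # the practical per-major limit discussed above
print(class_handle(0, 0))         # "1:2", first circuit on the first core
print(class_handle(63, 29_999))   # "40:7531", last circuit on the last core
print(f"max classes ~ {cores * minors_per_core:,}")   # ~1,920,000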
"Circuit ID" Unique Identifier
In order to improve queue reload time in v1.3, it was necessary to use a unique identifier for each circuit. I chose Circuit ID. It can be a number or a string; it just needs to be unique between circuits, and shared by all devices in the same circuit. This allows us to avoid costly lookups when sorting through the queue structure.
If you have your own script creating ShapedDevices.csv - you could use your CRM's unique identifier for customer services / circuits to serve as this Circuit ID. The UISP and Splynx integrations already do this automatically.
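For example, a CRM export script might emit rows keyed on the CRM's service ID (the column names below are illustrative assumptions; use the header from the ShapedDevices.csv template that ships with LibreQoS):

# Illustrative sketch: writing ShapedDevices.csv rows keyed on a CRM service
# ID used as the Circuit ID. Column names are assumptions; match the real
# ShapedDevices.csv template shipped with LibreQoS.
import csv

devices = [
    # (crm_service_id, device_name, ipv4, download_mbps, upload_mbps)
    ("svc-1001", "cpe-smith-1", "100.64.1.10", 100, 20),
    ("svc-1001", "cpe-smith-2", "100.64.1.11", 100, 20),  # same circuit -> same ID
    ("svc-1002", "cpe-jones-1", "100.64.1.20", 50, 10),
]

with open("ShapedDevices.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["Circuit ID", "Device Name", "IPv4", "Download Max", "Upload Max"])
    writer.writerows(devices)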
Partial Queue Reload
In v1.2 and prior, the entire queue structure had to be reloaded to make any changes. This led to a few milliseconds of packet loss for some clients each time that reload happened, so scheduler.py was set to reload all queues each morning at 4AM to limit any potential disruption the reload could cause.
Starting with v1.3, LibreQoS tracks the state of the queues and can make incremental changes without a full reload of all queues. Every 30 minutes, scheduler.py runs the CRM import and performs a partial reload affecting just the queues that have changed. It still runs a full reload at 4AM.
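Conceptually, the partial reload is a diff of the newly imported circuit definitions against the last applied state; a minimal sketch of that idea (not the actual scheduler.py logic):

# Minimal sketch of the incremental-reload idea: diff the desired circuit
# state against what was last applied and only touch circuits that changed.
# The real implementation in LibreQoS is more involved.
def diff_circuits(previous: dict, current: dict):
    added   = [cid for cid in current if cid not in previous]
    removed = [cid for cid in previous if cid not in current]
    changed = [cid for cid in current
               if cid in previous and current[cid] != previous[cid]]
    return added, removed, changed

previous = {"svc-1001": {"down": 100, "up": 20}, "svc-1002": {"down": 50, "up": 10}}
current  = {"svc-1001": {"down": 200, "up": 20}, "svc-1003": {"down": 25, "up": 5}}

print(diff_circuits(previous, current))
# (['svc-1003'], ['svc-1002'], ['svc-1001'])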