BBR (Bottleneck Bandwidth and RTT) is a TCP congestion control algorithm. It computes the sending rate based on the delivery rate (throughput) estimated from ACKs.
BBR can significantly increase throughput and reduce latency for TCP connections:
- up to 25% on a Dobrohost production setup
- "up to 22% improvement" at AWS
- "up to 2,700x higher" - Google Cloud blogpost
- "improves performance by 21%" at Gumlet
- "reduced median round-trip-time (RTT) of YouTube by 80%" - link
BBR is good for:
- Resilience to random loss (e.g. from shallow buffers)
- Low latency with the bloated buffers common in today’s last-mile links
BBR requires changes only on the sender side, not in the network or on the receiver side.
BBRv1 has been available since kernel 4.9 (with some issues) and is OK to use starting from approximately 4.13. It works well with the Ubuntu 5.15 HWE kernel.
BBRv1 is known to be unfair to loss-based congestion control algorithms, and BBR traffic can dominate non-BBR traffic on a shared network; BBR can obtain more than 90% of the total bandwidth (a quick way to observe this yourself is sketched after the links below):
- https://blog.apnic.net/2020/01/10/when-to-use-and-not-use-bbr/
- https://www.ietf.org/proceedings/97/slides/slides-97-iccrg-bbr-congestion-control-02.pdf
- https://www.uio.no/studier/emner/matnat/ifi/INF5072/v18/inf5072_example1.pdf
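A rough way to observe this on a given path is to run two parallel iperf3 flows with different congestion control algorithms (on Linux, iperf3 supports -C/--congestion). A minimal sketch; the server address and ports are placeholders, and it assumes two iperf3 server instances are already listening there:
SERVER=192.0.2.10                     # placeholder: your iperf3 test server
iperf3 -c "$SERVER" -p 5201 -C bbr   -t 30 &
iperf3 -c "$SERVER" -p 5202 -C cubic -t 30 &
wait
# Compare the reported bitrates: on a shared bottleneck the bbr flow
# often takes a disproportionate share, as the links above describe.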
BBR requires pacing. The Linux fq_codel qdisc does not implement pacing, so fq_codel would not be sufficient. So the note in the tcp_bbr.c code is still current and complete:
"NOTE: BBR must be used with the fq qdisc ("man tc-fq") with pacing enabled, since pacing is integral to the BBR design and implementation. BBR without pacing would not function properly, and may incur unnecessary high packet loss rates."
(from https://groups.google.com/g/bbr-dev/c/4jL4ropdOV8/m/GyndlPWpAAAJ)
Update: this may no longer be true, as https://wiki.geant.org/pages/releaseview.action?pageId=121340614 states:
"Linux 4.13 and above: In May 2017, Éric Dumazet submitted a patch to implement pacing in TCP itself, removing the dependency on the fq scheduler. This makes BBR simpler to enable, and allows its use together with other schedulers (such as the popular fq_codel)."
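To check what a given host actually does, you can look at the kernel version and at the qdisc attached to the interface; eth0 below is a placeholder:
uname -r                                # 4.13+ means TCP can pace by itself
tc qdisc show dev eth0                  # which qdisc (fq, fq_codel, ...) is attached
ss -tin | grep -o 'pacing_rate [^ ]*'   # per-socket pacing rate on live connections, if any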
There are multiple BBR variants and modifications, but most of them are no longer relevant as of 2022:
- BBR Advanced (BBR-A), github
- tsunami, nanqinlang or bbrplus - github
- bbrplus with ports to newer kernels
- An Evaluation of BBR and its variants (pdf)
cubic is used by default:
# sysctl net.ipv4.tcp_available_congestion_control
net.ipv4.tcp_available_congestion_control = reno cubic
# sysctl net.ipv4.tcp_congestion_control
net.ipv4.tcp_congestion_control = cubic
Check if we can enable BBR:
# cat /boot/config-$(uname -r) | grep 'CONFIG_TCP_CONG_BBR'
CONFIG_TCP_CONG_BBR=m
# cat /boot/config-$(uname -r) | grep 'CONFIG_NET_SCH_FQ'
CONFIG_NET_SCH_FQ_CODEL=m
CONFIG_NET_SCH_FQ=m
CONFIG_NET_SCH_FQ_PIE=m
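Here CONFIG_TCP_CONG_BBR=m means BBR is built as a kernel module (so it has to be loaded with modprobe); =y would mean it is built into the kernel.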
Enabling BBR:
modprobe tcp_bbr
echo "tcp_bbr" >> /etc/modules-load.d/modules.conf
cat << 'EOF' >> /etc/sysctl.conf
# Enable BBR
net.core.default_qdisc=fq
net.ipv4.tcp_congestion_control=bbr
EOF
sysctl -p
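If you want to try BBR only for some destinations before flipping the global default, the congestion control can also be set per route via the congctl attribute of ip route. A sketch; the test network and interface are placeholders:
ip route replace 198.51.100.0/24 dev eth0 congctl bbr   # placeholder network/interface
ip route show 198.51.100.0/24                           # the route should now list "congctl bbr"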
Check the result:
sysctl net.ipv4.tcp_available_congestion_control
sysctl net.ipv4.tcp_congestion_control
lsmod | grep bbr
ss -ti
or ss -tin. ss is a "utility to investigate sockets":
- -i is for "Show internal TCP information"
- -t is for "Display TCP sockets"
- -n is for numeric output: "Show exact bandwidth values, instead of human-readable"
See man ss for details.
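As a quick sanity check that new connections really use BBR, the extended output can be filtered (a sketch; the availability of fields such as delivery_rate and pacing_rate depends on the kernel):
ss -tin state established | grep -oE 'bbr|cubic|delivery_rate [^ ]+|pacing_rate [^ ]+'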
BBRv2 is in alpha as of November 2022 - https://github.com/google/bbr/commits/v2alpha.
There are prebuilt kernels with these patches:
- https://codeberg.org/pf-kernel/linux/wiki/README
- https://xanmod.org/
- https://liquorix.net/ (zen kernel) - https://launchpad.net/~damentz/+archive/ubuntu/liquorix
- https://github.com/CachyOS/linux-cachyos
But it looks like BBRv2 has some issues and is not ready for production:
- from March 2020: https://roov.org/2020/03/bbr-bbrplus-bbr2/ (use Google Translate)
- from May 2022: https://groups.google.com/g/bbr-dev/c/xmley7VkeoE/m/W4lEyyW_AAAJ
These are all about net.ipv4.tcp_notsent_lowat and a small kernel patch:
- https://blog.cloudflare.com/http-2-prioritization-with-nginx/
- https://blog.cloudflare.com/optimizing-tcp-for-high-throughput-and-low-latency/
Although the results in these links are promising, it's better to carefully run your own tests on your specific production setup.
A small test with tcp_notsent_lowat lowered to 16k showed the bitrate dropping in iperf3 tests, so it may not be optimal for generic loads like backups, etc.
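A sketch of such a test; 192.0.2.10 is a placeholder iperf3 server, and 4294967295 (UINT_MAX) is the usual kernel default, i.e. effectively unlimited:
sysctl -w net.ipv4.tcp_notsent_lowat=16384       # cap unsent data queued per socket at 16 KiB
iperf3 -c 192.0.2.10 -t 30                       # compare the bitrate with a baseline run
sysctl -w net.ipv4.tcp_notsent_lowat=4294967295  # revert to the default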
It is possible to control tcp_notsent_lowat for nginx only, with patches - https://github.com/nginx-modules/ngx_http_tls_dyn_size