Client and server run on two separate machines. Both share the same setup:
- Intel® Core™ i9-13900K CPU @ 3.0GHz (max 5.8GHz, performance-cores only),
- RAM 64GB DDR5-4800,
- 10Gbit network,
- Ubuntu 23.10 (Linux 6.6.6),
- Oracle JDK 17 (Old ZGC) / 22 (ZGC generational with latest patches)
Server code resides in the /server
module. Server exposes a single /ts
websocket endpoint, which emits a timestamp every 100ms.
This targets a scenario in which a websocket channel is used to serve live market updates to the user.
Tested servers:
- http4s + blaze (CE 3.5.4, fs2 3.10.0)
- http4s + blaze, via tapir fast-path (CE 3.5.4, fs2 3.10.0, tapir 1.10.0)
- http4s + blaze, via tapir (CE 3.5.4, fs2 3.10.0, tapir 1.10.0)
- http4s + blaze, via tapir (CE 3.5.4, fs2 3.10.0, tapir 1.6.3)
- http4s + blaze, via tapir (CE 3.5.4, fs2 3.10.0, tapir 1.6.0)
The following configuration of Tapir endpoint out was used for the "fast-path" mode:
webSocketBody(Fs2Streams[IO])
.decodeCloseRequests(true)
.concatenateFragmentedFrames(false)
.autoPongOnPing(false)
.ignorePong(false)
.autoPing(None)
Client code resides in the /client
module. Gatling client ramps up to 25k users within 30s,
and each user consumes 600 messages from the websocket server (with an update every 100ms this amounts to 60s).
For each message, an absolute difference between the client timestamp and the timestamp received from the server
is stored into an HdrHistogram. With clocks synchronized between the client and server, this value corresponds
to the latency induced by the server.
For precise measurement of latency up to milliseconds need to install, configure, and run chrony
service.
The following command could be used for installation on Ubuntu:
sudo apt-get -y install chrony
Here is a list of NTP servers that is used in our /etc/chrony/chrony.conf
:
server time5.facebook.com iburst
server tempus1.gum.gov.pl
server tempus2.gum.gov.pl
server ntp1.tp.pl
server ntp2.tp.pl
For non-Poland regions other servers could be preferred.
Finally, need to restart the service after (re)configuration by:
sudo systemctl restart chrony
Here is a great article about time synchronization in Facebook.
Use transparent huge pages by starting JVM services with the -XX:+UseTransparentHugePages
option and the following
transparent_hugepage
configuration in Linux:
echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo advise | sudo tee /sys/kernel/mm/transparent_hugepage/shmem_enabled
echo defer | sudo tee /sys/kernel/mm/transparent_hugepage/defrag
echo 1 | sudo tee /sys/kernel/mm/transparent_hugepage/khugepaged/defrag
Add the following configuration to /etc/sysctl.conf
:
# Memory (all processes)
vm.swappiness = 0
vm.stat_interval=120
vm.min_free_kbytes = 4194304
vm.zone_reclaim_mode = 0
kernel.numa_balancing = 0
# File System (http/ws/db servers)
fs.file-max = 10000000
fs.nr_open = 10000000
vm.dirty_ratio = 80
vm.dirty_background_ratio = 5
vm.dirty_expire_centisecs = 12000
# Networking (aeron, http/ws servers)
net.core.somaxconn = 65535
net.core.rmem_max = 4194304
net.core.rmem_default = 65536
net.core.wmem_max = 4194304
net.core.wmem_default = 65536
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_mem = 786432 1697152 1945728
net.ipv4.tcp_rmem = 4096 65536 4194304
net.ipv4.tcp_wmem = 4096 65536 4194304
net.ipv4.tcp_max_syn_backlog = 8192
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_syn_retries = 2
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_orphans = 65536
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_sack = 1
net.ipv4.tcp_congestion_control = htcp
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_window_scaling = 1
# Profiling (perf, async-profiler)
kernel.perf_event_paranoid = -1
kernel.kptr_restrict = 0
kernel.perf_event_max_stack = 1024
kernel.perf_event_mlock_kb = 8096
And then run:
sudo sysctl -p
Install and configure an after boot service that will turn on performance
mode for scaling governor:
sudo apt install cpufrequtils
Then, perform a restart and set scaling governor to performance:
sudo cpufreq-set -g performance
Note: If you ran the above command without restarting, then you'd have to edit the file in /etc/init.d/cpufrequtils
and change the line that says GOVERNOR=ondemand
to GOVERNOR=performance
then run:
sudo sh /etc/init.d/cpufrequtils start
It'd probably prompt you to restart your systemctl daemon, so do it like this:
sudo systemctl daemon-reload
Then view your CPU frequency with:
cat /proc/cpuinfo | grep -i mhz
Benchmark results reside in /results
.
results
├── http4s (JDK 22 with Generational ZGC, THP, and other optimizations, CE 3.5.4, fs2 3.10.0)
├── http4s-no (JDK 17 without optimizations, CE 3.5.4, fs2 3.10.0)
├── http4s-co (coordinated ommision, JDK 22 with Generational ZGC, THP, and other optimizations, CE 3.5.4, fs2 3.10.0)
├── tapir-1.6.0 (JDK 22 with Generational ZGC, THP, and other optimizations, CE 3.5.4, fs2 3.10.0, tapir 1.6.0)
├── tapir-1.6.0-no (JDK 17 without optimizations, CE 3.5.4, fs2 3.10.0, tapir 1.6.0)
├── tapir-1.6.3 (JDK 22 with Generational ZGC, THP, and other optimizations, CE 3.5.4, fs2 3.10.0, tapir 1.6.3)
├── tapir-1.10.0 (JDK 22 with Generational ZGC, THP, and other optimizations, CE 3.5.4, fs2 3.10.0, tapir 1.10.0)
├── tapir-1.10.0-fp (JDK 22 with Generational ZGC, THP, and other optimizations, CE 3.5.4, fs2 3.10.0, tapir 1.10.0, fast-path)
Each folder contains:
- HdrHistogram latency,
- Gatling html report (useful to see variance in the expected 100ms between the updates across time),
- async-profiler interactive flame-graph with per-thread aggregation of CPU cycles for stack frames.
Note: you need Java 21 to build and run the benchmarks.
- Build server binaries via
sbt stage
- Start the desired server using binaries found in
server/target/universal/stage/bin
- Start gatling web socket client via
sbt client/Gatling/test
The majority of the work behind the tests is carried out by Andriy Plokhotnyuk. Thank, you Andriy!