
SBCL-specific optimization for plain blob HTTP output. #191

Closed
wants to merge 1 commit

Conversation

@phmarek (Contributor) commented Mar 31, 2021

When an easy-handler returns a string or ub8-vector
for output, the small socket buffer size hurts performance
by forcing many unnecessary context switches, i.e. giving
other threads a chance to be scheduled in between.

By just writing the prepared data out as it is, it can be
streamed as fast as the available bandwidth allows.

(Note: on Linux a reasonable TCP buffer sysctl is recommended,
for example "net.ipv4.tcp_wmem = 131072 131072 4194304").
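As a sketch of that tuning step (using the exact values from the note above; apply as root, and adjust to your hardware):

```shell
# Raise the TCP write buffer (min/default/max, in bytes) so a large
# prepared blob can be queued to the kernel in one go.
sudo sysctl -w net.ipv4.tcp_wmem="131072 131072 4194304"

# To persist across reboots, put the same setting into a sysctl.d file.
echo 'net.ipv4.tcp_wmem = 131072 131072 4194304' | \
  sudo tee /etc/sysctl.d/90-tcp-wmem.conf
```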

For small request sizes, the difference is within the noise floor:

  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    89.50us  287.62us   8.40ms   99.62%
    Req/Sec    13.24k     1.02k   15.86k    75.91%

  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    91.50us  312.95us   8.90ms   99.60%
    Req/Sec    13.21k     0.91k   15.69k    68.32%

But for larger outputs (here a 115 kB PDF) this patch decreases
latency by quite a large margin. From

  3 threads and 3 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    38.41ms   20.96ms 111.80ms   67.98%
    Req/Sec    26.16      9.20    50.00     76.33%
  Latency Distribution
     50%   22.59ms
     75%   63.52ms
     90%   65.59ms
     99%   84.39ms
  785 requests in 10.01s, 87.90MB read
  Requests/sec:     78.40

the 99% latency is nearly halved:

  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    24.86ms    5.70ms  64.90ms   85.10%
    Req/Sec    40.29      6.88    50.00     60.00%
  Latency Distribution
     50%   22.61ms
     75%   23.12ms
     90%   32.48ms
     99%   45.03ms
  1209 requests in 10.01s, 135.24MB read
  Requests/sec:    120.76

For HTTPS, chunked output, and other stream types this keeps
the old behaviour.

@stassats (Member) commented:
If you want a fast web server then hunchentoot is probably the wrong place to find it. And using SBCL internals is a no-go anyway.

@stassats closed this Mar 31, 2021
2 participants