Add performance benchmarking #6

Open
JonathanGiles opened this issue Feb 27, 2024 · 10 comments
Comments

@JonathanGiles
Owner

It would be cool to use something like this to start performance benchmarking and improving the performance of TeenyHttpd.

Thoughts, @alex-cova?

@alex-cova
Collaborator

wrk is a great tool; I've used it before. We could also use JMH for components like TeenyJson. I'm not sure we need a script like profile.sh, though.

These are some results so far.

This is the same command used in java-httpserver-vthreads.

Cached Thread Pool

alex@Alexs-MacBook-Pro ~ % wrk --latency -d 60s -c 100 -t 8 http://localhost:8080/store/products
Running 1m test @ http://localhost:8080/store/products
  8 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.92ms    2.45ms  50.74ms   93.64%
    Req/Sec   230.37    566.98     3.00k    87.19%
  Latency Distribution
     50%    1.30ms
     75%    1.64ms
     90%    2.92ms
     99%   14.55ms
  9274 requests in 1.00m, 1.80MB read
  Socket errors: connect 0, read 10039, write 0, timeout 0
Requests/sec:    154.30
Transfer/sec:     30.59KB


Work-Stealing Pool

alex@Alexs-MacBook-Pro ~ % wrk --latency -d 60s -c 100 -t 8 http://localhost:8080/store/products
Running 1m test @ http://localhost:8080/store/products
  8 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.81ms    6.20ms  50.30ms   95.10%
    Req/Sec    70.03    292.76     2.17k    95.34%
  Latency Distribution
     50%    2.05ms
     75%    3.61ms
     90%    6.00ms
     99%   42.87ms
  5393 requests in 1.00m, 1.04MB read
  Socket errors: connect 0, read 5984, write 0, timeout 0
Requests/sec:     89.73
Transfer/sec:     17.79KB


Virtual Threads

alex@Alexs-MacBook-Pro ~ % wrk --latency -d 60s -c 100 -t 8 http://localhost:8080/store/products
Running 1m test @ http://localhost:8080/store/products
  8 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.96ms    6.78ms 103.99ms   97.99%
    Req/Sec    93.91    374.23     2.52k    94.59%
  Latency Distribution
     50%    1.61ms
     75%    2.77ms
     90%    5.25ms
     99%   13.76ms
  7164 requests in 1.00m, 1.39MB read
  Socket errors: connect 0, read 7509, write 0, timeout 0
Requests/sec:    119.23
Transfer/sec:     23.64KB

Environment

  • Apple M1 Pro 32G
  • JDK Amazon Corretto 21.0.1

👉 Wrk Tool

We should keep this in mind when working with virtual threads:

The monopolization has been explained in the Virtual threads are useful for I/O-bound workloads only section. When running long computations, we do not allow the JVM to unmount and switch to another virtual thread until the virtual thread terminates. Indeed, the current scheduler does not support preempting tasks.

This monopolization can lead to the creation of new carrier threads to execute other virtual threads. Creating carrier threads results in creating platform threads. So, there is a memory cost associated with this creation.

Ready to upgrade to java 21? 😬
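To make the trade-off above concrete, here is a minimal JDK 21 sketch of the virtual-thread-per-task executor (class and task names are illustrative, not TeenyHttpd code). It is cheap for I/O-bound handlers, but as the quoted passage notes, a long CPU-bound task still monopolizes its carrier thread because the scheduler cannot preempt it:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadSketch {
    public static void main(String[] args) {
        AtomicInteger done = new AtomicInteger();
        // One virtual thread per submitted task (JDK 21). Fine for I/O-bound
        // request handlers; long computations would pin the carrier thread.
        try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 1000; i++) {
                pool.submit(done::incrementAndGet);
            }
        } // close() waits for all submitted tasks to finish
        System.out.println("completed=" + done.get());
    }
}
```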


I would also like Teeny to have basic Prometheus metrics, e.g.:

http_server_requests_seconds_sum{application="teeny",error="none",exception="none",method="POST",outcome="SUCCESS",status="200",uri="/store/pets",} 1.950221315
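A line like that follows the Prometheus text exposition format: `name{label="value",...} value`. A tiny hypothetical formatter (the `sample` helper is an assumption, not a TeenyHttpd API) could produce it like this:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class PrometheusLineSketch {
    // Hypothetical helper: renders one sample in the Prometheus
    // text exposition format: name{k="v",...} value
    static String sample(String name, Map<String, String> labels, double value) {
        String labelStr = labels.entrySet().stream()
                .map(e -> e.getKey() + "=\"" + e.getValue() + "\"")
                .collect(Collectors.joining(","));
        return name + "{" + labelStr + "} " + value;
    }

    public static void main(String[] args) {
        Map<String, String> labels = new LinkedHashMap<>(); // keeps label order stable
        labels.put("application", "teeny");
        labels.put("method", "POST");
        labels.put("status", "200");
        labels.put("uri", "/store/pets");
        System.out.println(sample("http_server_requests_seconds_sum", labels, 1.950221315));
    }
}
```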

@alex-cova
Collaborator

Results against TestServer.java

alex@Alexs-MacBook-Pro ~ % wrk --latency -d 60s -c 100 -t 8 http://localhost/user/1/details
Running 1m test @ http://localhost/user/1/details
  8 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.37ms    1.10ms  33.87ms   97.64%
    Req/Sec     2.00k     1.49k    4.88k    50.63%
  Latency Distribution
     50%    1.20ms
     75%    1.36ms
     90%    1.80ms
     99%    4.15ms
  32277 requests in 1.00m, 3.91MB read
  Socket errors: connect 0, read 32923, write 0, timeout 0
Requests/sec:    537.07
Transfer/sec:     66.61KB

@JonathanGiles
Owner Author

Plenty of scope for perf gains then! :)

@alex-cova
Collaborator

Plenty of scope for perf gains then! :)

Just profiled Teeny (IntelliJ profiler) and it looks like there's a deadlock: everything performs well until around the 30th concurrent request. I made a few adjustments and got ~850 req/sec. This will be fun 😀

@JonathanGiles
Owner Author

At some point I'd really like to get virtual threads integrated, depending on the JDK level, etc. It would be interesting to find other areas to improve.

@alex-cova
Collaborator

I've been testing different strategies to make Teeny more performant and have reached 1K req/sec. IMO the most important component for processing more requests is the ThreadPoolExecutor. I just looked at this ThreadPoolExecutor.java and this

thoughts?
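For reference, the knobs that usually matter when tuning a ThreadPoolExecutor for a server's accept loop are the core/max sizes, the queue bound, and the rejection policy. A small sketch (the sizes here are illustrative assumptions, not TeenyHttpd's actual configuration):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolSketch {
    public static void main(String[] args) throws InterruptedException {
        // Bounded pool for accepted connections; every value below is a
        // placeholder to illustrate the tuning surface.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4,                        // core threads kept alive
                16,                       // cap on threads under load
                60, TimeUnit.SECONDS,     // idle time before extra threads die
                new ArrayBlockingQueue<>(256),               // bounded backlog
                new ThreadPoolExecutor.CallerRunsPolicy());  // backpressure on overflow

        AtomicInteger handled = new AtomicInteger();
        for (int i = 0; i < 100; i++) {
            pool.submit(handled::incrementAndGet);
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("handled=" + handled.get());
    }
}
```

The bounded queue plus CallerRunsPolicy is one common way to avoid an unbounded backlog silently hiding latency under load.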

@JonathanGiles
Owner Author

Let's get the perf up!

@alex-cova
Collaborator

Just implemented JMH on the TeenyJson branch; these are the results so far:

Benchmark                          Mode  Cnt       Score      Error  Units
JsonBenchmarks.decodingBenchmark  thrpt    5   30337.625 ±   60.575  ops/s
JsonBenchmarks.encodingBenchmark  thrpt    5  199324.808 ± 6876.177  ops/s
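For readers unfamiliar with JMH, a benchmark class producing output in that shape would look roughly like the sketch below. This is a guess at the structure only (the real class lives on the TeenyJson branch), and it needs the `org.openjdk.jmh` dependencies and annotation processor to run:

```java
import org.openjdk.jmh.annotations.*;
import java.util.concurrent.TimeUnit;

// Hypothetical shape of the TeenyJson benchmarks, not the actual branch code.
@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
public class JsonBenchmarks {

    private String json;

    @Setup
    public void setup() {
        json = "{\"name\":\"teeny\",\"id\":1}";
    }

    @Benchmark
    public Object decodingBenchmark() {
        // parse `json` with TeenyJson here and return the result
        // (returning it prevents dead-code elimination)
        return json;
    }
}
```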

@alex-cova
Collaborator

Oops, it looks like Teeny underperforms when run from IntelliJ.

These results are from running Teeny from the CLI:

java -jar target/teenyhttpd-1.0.6.jar
alex@Alexs-MacBook-Pro ~ % wrk -d 5s http://localhost/health
Running 5s test @ http://localhost/health
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     7.41ms   40.40ms 352.74ms   96.50%
    Req/Sec     8.50k     3.16k   11.05k    88.89%
  16380 requests in 5.04s, 1.84MB read
  Socket errors: connect 0, read 16380, write 0, timeout 0
Requests/sec:   3252.73
Transfer/sec:    374.82KB

@alex-cova
Collaborator

Teeny update:

  1. Switched to bombardier
  2. I've been testing a new NIO-based server
alex@Alexs-MacBook-Pro ~ % ./bombardier -d 5s http://localhost:8080          
Bombarding http://localhost:8080 for 5s using 125 connection(s)
[=========================================================================================================================================================] 5s
Done!
Statistics        Avg      Stdev        Max
  Reqs/sec     15177.85    3110.70   21408.30
  Latency        8.20ms     9.08ms   218.10ms
  HTTP codes:
    1xx - 0, 2xx - 75834, 3xx - 0, 4xx - 0, 5xx - 0
    others - 400
  Errors:
    the server closed connection before returning the first response byte. Make sure the server returns 'Connection: close' response header before closing the connection - 372
    dial tcp [::1]:8080: connect: connection refused - 27
    write tcp 127.0.0.1:53575->127.0.0.1:8080: write: broken pipe - 1
  Throughput:     2.98MB/s
