Add performance benchmarking #6

Open
JonathanGiles opened this issue Feb 27, 2024 · 10 comments
Comments

@JonathanGiles
Owner

It would be cool to use something like this to start performance benchmarking and improving the performance of TeenyHttpd.

Thoughts, @alex-cova?

@alex-cova
Collaborator

wrk is a great tool; I've used it before. We could also use JMH for components like TeenyJson. I'm not sure we need a script like profile.sh, though.

These are some results so far.

This is the same command used in java-httpserver-vthreads.

Cached Thread Pool

alex@Alexs-MacBook-Pro ~ % wrk --latency -d 60s -c 100 -t 8 http://localhost:8080/store/products
Running 1m test @ http://localhost:8080/store/products
  8 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.92ms    2.45ms  50.74ms   93.64%
    Req/Sec   230.37    566.98     3.00k    87.19%
  Latency Distribution
     50%    1.30ms
     75%    1.64ms
     90%    2.92ms
     99%   14.55ms
  9274 requests in 1.00m, 1.80MB read
  Socket errors: connect 0, read 10039, write 0, timeout 0
Requests/sec:    154.30
Transfer/sec:     30.59KB


Work-Stealing Pool

alex@Alexs-MacBook-Pro ~ % wrk --latency -d 60s -c 100 -t 8 http://localhost:8080/store/products
Running 1m test @ http://localhost:8080/store/products
  8 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     3.81ms    6.20ms  50.30ms   95.10%
    Req/Sec    70.03    292.76     2.17k    95.34%
  Latency Distribution
     50%    2.05ms
     75%    3.61ms
     90%    6.00ms
     99%   42.87ms
  5393 requests in 1.00m, 1.04MB read
  Socket errors: connect 0, read 5984, write 0, timeout 0
Requests/sec:     89.73
Transfer/sec:     17.79KB


Virtual Threads

alex@Alexs-MacBook-Pro ~ % wrk --latency -d 60s -c 100 -t 8 http://localhost:8080/store/products
Running 1m test @ http://localhost:8080/store/products
  8 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.96ms    6.78ms 103.99ms   97.99%
    Req/Sec    93.91    374.23     2.52k    94.59%
  Latency Distribution
     50%    1.61ms
     75%    2.77ms
     90%    5.25ms
     99%   13.76ms
  7164 requests in 1.00m, 1.39MB read
  Socket errors: connect 0, read 7509, write 0, timeout 0
Requests/sec:    119.23
Transfer/sec:     23.64KB

Environment

  • Apple M1 Pro 32G
  • JDK Amazon Corretto 21.0.1

👉 Wrk Tool

We should keep this in mind when working with virtual threads:

The monopolization has been explained in the Virtual threads are useful for I/O-bound workloads only section. When running long computations, we do not allow the JVM to unmount and switch to another virtual thread until the virtual thread terminates. Indeed, the current scheduler does not support preempting tasks.

This monopolization can lead to the creation of new carrier threads to execute other virtual threads. Creating carrier threads results in creating platform threads. So, there is a memory cost associated with this creation.

Ready to upgrade to java 21? 😬
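To make the trade-off above concrete, here is a minimal JDK 21 sketch of the virtual-thread-per-task executor (class and task names are illustrative, not TeenyHttpd code). It is cheap for I/O-bound handlers, but as the quoted passage notes, a long CPU-bound task still monopolizes its carrier thread because the scheduler cannot preempt it:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadSketch {
    public static void main(String[] args) {
        AtomicInteger done = new AtomicInteger();
        // One virtual thread per submitted task (JDK 21). Fine for I/O-bound
        // request handlers; long computations would pin the carrier thread.
        try (ExecutorService pool = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 1000; i++) {
                pool.submit(done::incrementAndGet);
            }
        } // close() waits for all submitted tasks to finish
        System.out.println("completed=" + done.get());
    }
}
```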


I would also like Teeny to have basic Prometheus metrics, e.g.:

http_server_requests_seconds_sum{application="teeny",error="none",exception="none",method="POST",outcome="SUCCESS",status="200",uri="/store/pets",} 1.950221315
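A line like that follows the Prometheus text exposition format: `name{label="value",...} value`. A tiny hypothetical formatter (the `sample` helper is an assumption, not a TeenyHttpd API) could produce it like this:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class PrometheusLineSketch {
    // Hypothetical helper: renders one sample in the Prometheus
    // text exposition format: name{k="v",...} value
    static String sample(String name, Map<String, String> labels, double value) {
        String labelStr = labels.entrySet().stream()
                .map(e -> e.getKey() + "=\"" + e.getValue() + "\"")
                .collect(Collectors.joining(","));
        return name + "{" + labelStr + "} " + value;
    }

    public static void main(String[] args) {
        Map<String, String> labels = new LinkedHashMap<>(); // keeps label order stable
        labels.put("application", "teeny");
        labels.put("method", "POST");
        labels.put("status", "200");
        labels.put("uri", "/store/pets");
        System.out.println(sample("http_server_requests_seconds_sum", labels, 1.950221315));
    }
}
```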

@alex-cova
Collaborator

Results against TestServer.java

alex@Alexs-MacBook-Pro ~ % wrk --latency -d 60s -c 100 -t 8 http://localhost/user/1/details
Running 1m test @ http://localhost/user/1/details
  8 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.37ms    1.10ms  33.87ms   97.64%
    Req/Sec     2.00k     1.49k    4.88k    50.63%
  Latency Distribution
     50%    1.20ms
     75%    1.36ms
     90%    1.80ms
     99%    4.15ms
  32277 requests in 1.00m, 3.91MB read
  Socket errors: connect 0, read 32923, write 0, timeout 0
Requests/sec:    537.07
Transfer/sec:     66.61KB

@JonathanGiles
Owner Author

Plenty of scope for perf gains then! :)

@alex-cova
Collaborator

Plenty of scope for perf gains then! :)

Just profiled Teeny (IntelliJ profiler) and it looks like there's a deadlock: everything performs well until around the 30th concurrent request. I made a few adjustments and got ~850 req/sec. This will be fun 😀

@JonathanGiles
Owner Author

At some point I'd really like to get virtual threads integrated, depending on the JDK level, etc. It would be interesting to find other areas to improve.

@alex-cova
Collaborator

I've been testing different strategies to make Teeny more performant and have reached 1K req/sec. IMO the most important component for processing more requests is the ThreadPoolExecutor. I just looked at this ThreadPoolExecutor.java and this

thoughts?
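For reference, the knobs that usually matter when tuning a ThreadPoolExecutor for a server's accept loop are the core/max sizes, the queue bound, and the rejection policy. A small sketch (the sizes here are illustrative assumptions, not TeenyHttpd's actual configuration):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class PoolSketch {
    public static void main(String[] args) throws InterruptedException {
        // Bounded pool for accepted connections; every value below is a
        // placeholder to illustrate the tuning surface.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                4,                        // core threads kept alive
                16,                       // cap on threads under load
                60, TimeUnit.SECONDS,     // idle time before extra threads die
                new ArrayBlockingQueue<>(256),               // bounded backlog
                new ThreadPoolExecutor.CallerRunsPolicy());  // backpressure on overflow

        AtomicInteger handled = new AtomicInteger();
        for (int i = 0; i < 100; i++) {
            pool.submit(handled::incrementAndGet);
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("handled=" + handled.get());
    }
}
```

The bounded queue plus CallerRunsPolicy is one common way to avoid an unbounded backlog silently hiding latency under load.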

@JonathanGiles
Owner Author

Let's get the perf up!

@alex-cova
Collaborator

Just implemented JMH on the TeenyJson branch; these are the results so far:

Benchmark                          Mode  Cnt       Score      Error  Units
JsonBenchmarks.decodingBenchmark  thrpt    5   30337.625 ±   60.575  ops/s
JsonBenchmarks.encodingBenchmark  thrpt    5  199324.808 ± 6876.177  ops/s
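For readers unfamiliar with JMH, a benchmark class producing output in that shape would look roughly like the sketch below. This is a guess at the structure only (the real class lives on the TeenyJson branch), and it needs the `org.openjdk.jmh` dependencies and annotation processor to run:

```java
import org.openjdk.jmh.annotations.*;
import java.util.concurrent.TimeUnit;

// Hypothetical shape of the TeenyJson benchmarks, not the actual branch code.
@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
public class JsonBenchmarks {

    private String json;

    @Setup
    public void setup() {
        json = "{\"name\":\"teeny\",\"id\":1}";
    }

    @Benchmark
    public Object decodingBenchmark() {
        // parse `json` with TeenyJson here and return the result
        // (returning it prevents dead-code elimination)
        return json;
    }
}
```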

@alex-cova
Collaborator

Oops, it looks like Teeny underperforms when run from IntelliJ.

These results are from running Teeny from the CLI:

java -jar target/teenyhttpd-1.0.6.jar
alex@Alexs-MacBook-Pro ~ % wrk -d 5s http://localhost/health
Running 5s test @ http://localhost/health
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     7.41ms   40.40ms 352.74ms   96.50%
    Req/Sec     8.50k     3.16k   11.05k    88.89%
  16380 requests in 5.04s, 1.84MB read
  Socket errors: connect 0, read 16380, write 0, timeout 0
Requests/sec:   3252.73
Transfer/sec:    374.82KB

@alex-cova
Collaborator

Teeny update:

  1. Switched to bombardier
  2. I've been testing a new NIO-based server
alex@Alexs-MacBook-Pro ~ % ./bombardier -d 5s http://localhost:8080          
Bombarding http://localhost:8080 for 5s using 125 connection(s)
[=========================================================================================================================================================] 5s
Done!
Statistics        Avg      Stdev        Max
  Reqs/sec     15177.85    3110.70   21408.30
  Latency        8.20ms     9.08ms   218.10ms
  HTTP codes:
    1xx - 0, 2xx - 75834, 3xx - 0, 4xx - 0, 5xx - 0
    others - 400
  Errors:
    the server closed connection before returning the first response byte. Make sure the server returns 'Connection: close' response header before closing the connection - 372
    dial tcp [::1]:8080: connect: connection refused - 27
    write tcp 127.0.0.1:53575->127.0.0.1:8080: write: broken pipe - 1
  Throughput:     2.98MB/s
