docs: FAQ on benchmarking best practices. (envoyproxy#11140)
Includes a bunch of tips from @jmarantz, @oschaaf, @mattklein123. Signed-off-by: Harvey Tuch <[email protected]>
Showing 4 changed files with 88 additions and 0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,5 @@ | ||
.. _faq_how_fast_is_envoy:

How fast is Envoy?
==================

What are best practices for benchmarking Envoy?
===============================================

There is :ref:`no single QPS, latency or throughput overhead <faq_how_fast_is_envoy>` that can
characterize a network proxy such as Envoy. Instead, any measurements need to be contextually aware,
ensuring an apples-to-apples comparison with other systems by configuring and load testing Envoy
appropriately. As a result, we can't provide a canonical benchmark configuration, but instead offer
the following guidance:

* A release Envoy binary should be used. If building, please ensure that ``-c opt``
  is used on the Bazel command line. When consuming Envoy point releases, make
  sure you are using the latest point release; given the pace of Envoy development
  it's not reasonable to pick older versions when making a statement about Envoy
  performance. Similarly, if working on a master build, please perform due diligence
  and ensure no regressions or performance improvements have landed proximal to your
  benchmark work and that you are close to HEAD.

* The :option:`--concurrency` Envoy CLI flag should be unset (providing one worker thread per
  logical core on your machine) or set to match the number of cores/threads made available to other
  network proxies in your comparison.

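  For example, pinning Envoy to four worker threads to match a comparison proxy that was given
  four threads might look like this (``envoy.yaml`` is an illustrative bootstrap path):

  .. code-block:: console

     $ envoy --config-path envoy.yaml --concurrency 4
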
* Disable :ref:`circuit breaking <faq_disable_circuit_breaking>`. A common issue during benchmarking
  is that Envoy's default circuit breaker limits are low, leading to connection and request queuing.

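  One way to take circuit breaking out of the picture is to raise the thresholds far beyond
  anything the benchmark can reach; a sketch (the cluster name is illustrative):

  .. code-block:: yaml

     clusters:
     - name: service_under_test
       circuit_breakers:
         thresholds:
         - priority: DEFAULT
           max_connections: 1000000000
           max_pending_requests: 1000000000
           max_requests: 1000000000
           max_retries: 1000000000
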
* Disable :ref:`generate_request_id
  <envoy_v3_api_field_extensions.filters.network.http_connection_manager.v3.HttpConnectionManager.generate_request_id>`.

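  This is a single boolean in the HTTP connection manager configuration; an abbreviated fragment
  (in a full config this sits inside the filter's ``typed_config``):

  .. code-block:: yaml

     http_connection_manager:
       stat_prefix: ingress_http
       generate_request_id: false
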
* Disable :ref:`dynamic_stats
  <envoy_v3_api_field_extensions.filters.http.router.v3.Router.dynamic_stats>`. If you are measuring
  the overhead vs. a direct connection, you might want to consider disabling all stats via
  :ref:`reject_all <envoy_v3_api_field_config.metrics.v3.StatsMatcher.reject_all>`.

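  A sketch of both settings, the first in the router filter's ``typed_config`` and the second in
  the bootstrap ``stats_config``:

  .. code-block:: yaml

     http_filters:
     - name: envoy.filters.http.router
       typed_config:
         "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
         dynamic_stats: false

  .. code-block:: yaml

     stats_config:
       stats_matcher:
         reject_all: true
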
* Ensure that the networking and HTTP filter chains are reflective of comparable features
  in the systems that Envoy is being compared with.

* Ensure that TLS settings (if any) are realistic and that consistent ciphers are used in
  any comparison. Session reuse may have a significant impact on results and should be tracked via
  :ref:`listener SSL stats <config_listener_stats>`.

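  Pinning the TLS version and cipher suite on the Envoy side might look like this fragment of a
  downstream TLS context (the suite shown is only an example):

  .. code-block:: yaml

     common_tls_context:
       tls_params:
         tls_minimum_protocol_version: TLSv1_2
         tls_maximum_protocol_version: TLSv1_2
         cipher_suites:
         - ECDHE-RSA-AES128-GCM-SHA256
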
* Ensure that :ref:`HTTP/2 settings <envoy_v3_api_msg_config.core.v3.Http2ProtocolOptions>`, in
  particular those that affect flow control and stream concurrency, are consistent in any
  comparison. Ideally, take into account BDP and network link latencies when optimizing any
  HTTP/2 settings.

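  For example, window sizes and stream concurrency can be set explicitly on both sides of the
  comparison (the values here are illustrative, not recommendations):

  .. code-block:: yaml

     http2_protocol_options:
       initial_stream_window_size: 65536
       initial_connection_window_size: 1048576
       max_concurrent_streams: 100
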
* Verify in the listener and cluster stats that the number of streams, connections and errors
  matches what is expected in any given experiment.

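  The admin interface makes this straightforward to spot check, assuming an admin listener on
  port 9901:

  .. code-block:: console

     $ curl -s http://127.0.0.1:9901/stats | grep -E 'downstream_cx_total|upstream_rq_total'
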
* Make sure you are aware of how connections created by your load generator are
  distributed across Envoy worker threads. This is especially important for
  benchmarks that use low connection counts and perfect keep-alive. You should be aware that
  Envoy will allocate all streams for a given connection to a single worker thread. This means,
  for example, that if you have 72 logical cores and worker threads, but only a single HTTP/2
  connection from your load generator, then only 1 worker thread will be active.

* Make sure request-release timing expectations line up with what is intended.
  Some load generators produce naturally jittery and/or batchy timings. This
  might end up being an unintended dominant factor in certain tests.

* The specifics of how your load generator reuses connections (e.g. MRU,
  random, LRU, etc.) are an important factor, as this impacts work distribution.

* If you're trying to measure small (say < 1ms) latencies, make sure the measurement tool and
  environment have the required sensitivity and the noise floor is sufficiently low.

* Be critical of your bootstrap or xDS configuration. Ideally every line has a motivation and is
  necessary for the benchmark under consideration.

* Consider using `Nighthawk <https://github.com/envoyproxy/nighthawk>`_ as your
  load generator and measurement tool. We are committed to building out
  benchmarking and latency measurement best practices in this tool.

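  For example, a fixed-rate run against a local listener might look like this (flag names should
  be checked against ``nighthawk_client --help`` for your build):

  .. code-block:: console

     $ nighthawk_client --rps 1000 --duration 60 --connections 4 http://127.0.0.1:10000/
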
* Examine ``perf`` profiles of Envoy during the benchmark run, e.g. with `flame graphs
  <http://www.brendangregg.com/flamegraphs.html>`_. Verify that Envoy is spending its time
  doing the expected essential work under test, rather than some unrelated or tangential
  work.

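  A typical capture, using the stackcollapse and flamegraph scripts from the FlameGraph
  repository linked above (paths are illustrative):

  .. code-block:: console

     $ perf record -g -p "$(pgrep -x envoy)" -- sleep 60
     $ perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > envoy.svg
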
* Familiarize yourself with `latency measurement best practices
  <https://www.youtube.com/watch?v=lJ8ydIuPFeU>`_. In particular, never measure latency at
  max load, as this is not generally meaningful or reflective of real system performance; aim
  to measure below the knee of the QPS-latency curve. Prefer open loop over closed loop load
  generators.

* Avoid `benchmarking crimes <https://www.cse.unsw.edu.au/~gernot/benchmarking-crimes.html>`_.