docs: FAQ on benchmarking best practices. (envoyproxy#11140)
Includes a bunch of tips from @jmarantz, @oschaaf, @mattklein123.

Signed-off-by: Harvey Tuch <[email protected]>
htuch authored May 12, 2020
1 parent 8e6de64 commit 45726b7
Showing 4 changed files with 88 additions and 0 deletions.
2 changes: 2 additions & 0 deletions docs/root/faq/load_balancing/disable_circuit_breaking.rst
@@ -1,3 +1,5 @@
.. _faq_disable_circuit_breaking:

Is there a way to disable circuit breaking?
===========================================

1 change: 1 addition & 0 deletions docs/root/faq/overview.rst
@@ -34,6 +34,7 @@ Performance
:maxdepth: 2

performance/how_fast_is_envoy
performance/how_to_benchmark_envoy

Configuration
-------------
2 changes: 2 additions & 0 deletions docs/root/faq/performance/how_fast_is_envoy.rst
@@ -1,3 +1,5 @@
.. _faq_how_fast_is_envoy:

How fast is Envoy?
==================

83 changes: 83 additions & 0 deletions docs/root/faq/performance/how_to_benchmark_envoy.rst
@@ -0,0 +1,83 @@
What are best practices for benchmarking Envoy?
===============================================

There is :ref:`no single QPS, latency or throughput overhead <faq_how_fast_is_envoy>` that can
characterize a network proxy such as Envoy. Instead, any measurements need to be contextually aware,
ensuring an apples-to-apples comparison with other systems by configuring and load testing Envoy
appropriately. As a result, we can't provide a canonical benchmark configuration, but instead offer
the following guidance:

* A release Envoy binary should be used. If building from source, please ensure that ``-c opt``
is used on the Bazel command line. When consuming Envoy point releases, make
sure you are using the latest point release; given the pace of Envoy development
it's not reasonable to pick older versions when making a statement about Envoy
performance. Similarly, if working on a master build, please perform due diligence
and ensure that no regressions or performance improvements have landed close to your
benchmark work and that you are close to HEAD.

* The :option:`--concurrency` Envoy CLI flag should be unset (providing one worker thread per
logical core on your machine) or set to match the number of cores/threads made available to other
network proxies in your comparison.

* Disable :ref:`circuit breaking <faq_disable_circuit_breaking>`. A common issue during benchmarking
is that Envoy's default circuit breaker limits are low, leading to connection and request queuing;
see the first sketch after this list for one way to raise them.

* Disable :ref:`generate_request_id
<envoy_v3_api_field_extensions.filters.network.http_connection_manager.v3.HttpConnectionManager.generate_request_id>`;
see the second sketch after this list.

* Disable :ref:`dynamic_stats
<envoy_v3_api_field_extensions.filters.http.router.v3.Router.dynamic_stats>`. If you are measuring
the overhead vs. a direct connection, you might want to consider disabling all stats via
:ref:`reject_all <envoy_v3_api_field_config.metrics.v3.StatsMatcher.reject_all>`. Sketches of both
appear after this list.

* Ensure that the networking and HTTP filter chains are reflective of comparable features
in the systems that Envoy is being compared with.

* Ensure that TLS settings (if any) are realistic and that consistent ciphers are used in
any comparison; a sketch of pinning the TLS version and cipher suite appears after this list.
Session reuse may have a significant impact on results and should be tracked via
:ref:`listener SSL stats <config_listener_stats>`.

* Ensure that :ref:`HTTP/2 settings <envoy_v3_api_msg_config.core.v3.Http2ProtocolOptions>`, in
particular those that affect flow control and stream concurrency, are consistent in any
comparison. Ideally, take the bandwidth-delay product (BDP) and network link latencies into
account when tuning any HTTP/2 settings; a sketch appears after this list.

* Verify in the listener and cluster stats that the number of streams, connections and errors
matches what is expected in any given experiment.

* Make sure you are aware of how connections created by your load generator are
distributed across Envoy worker threads. This is especially important for
benchmarks that use low connection counts and perfect keep-alive. Keep in mind that
Envoy allocates all streams on a given connection to a single worker thread. This means,
for example, that with 72 logical cores (and hence 72 worker threads) but only a single HTTP/2
connection from your load generator, only one worker thread will be active.

* Make sure the timing with which requests are released lines up with what is intended.
Some load generators produce naturally jittery and/or bursty request timings, which
can end up being an unintended dominant factor in certain tests.

* The specifics of how your load generator reuses connections (e.g. MRU, random, LRU, etc.) are an
important factor, as this impacts work distribution.

* If you're trying to measure small (say < 1ms) latencies, make sure that the measurement tool and
environment have the required sensitivity and that the noise floor is sufficiently low.

* Be critical of your bootstrap or xDS configuration. Ideally every line has a motivation and is
necessary for the benchmark under consideration.

* Consider using `Nighthawk <https://github.com/envoyproxy/nighthawk>`_ as your
load generator and measurement tool. We are committed to building out
benchmarking and latency measurement best practices in this tool.

* Examine ``perf`` profiles of Envoy during the benchmark run, e.g. with `flame graphs
<http://www.brendangregg.com/flamegraphs.html>`_. Verify that Envoy is spending its time
doing the expected essential work under test, rather than some unrelated or tangential
work.

* Familiarize yourself with `latency measurement best practices
<https://www.youtube.com/watch?v=lJ8ydIuPFeU>`_. In particular, never measure latency at
max load; this is generally neither meaningful nor reflective of real system performance. Aim
to measure below the knee of the QPS-latency curve. Prefer open-loop over closed-loop load
generators.

* Avoid `benchmarking crimes <https://www.cse.unsw.edu.au/~gernot/benchmarking-crimes.html>`_.
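
The following sketches make some of the tips above concrete. First, circuit breaking: one way to
effectively disable it is to raise a cluster's circuit breaker thresholds so high that they are
never hit. This is a minimal fragment of a cluster definition; the cluster name and the threshold
values are illustrative:

.. code-block:: yaml

  clusters:
  - name: service_under_test  # hypothetical cluster name
    circuit_breakers:
      thresholds:
      - priority: DEFAULT
        # Values large enough that the limits are never reached in practice.
        max_connections: 1000000000
        max_pending_requests: 1000000000
        max_requests: 1000000000
        max_retries: 1000000000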
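
Second, request ID generation can be switched off on the HTTP connection manager. Only the
relevant field is shown; the surrounding listener and filter chain configuration is elided:

.. code-block:: yaml

  # HTTP connection manager filter config (fragment).
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
    stat_prefix: ingress_http
    generate_request_id: false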
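
Third, the router filter's dynamic stats can be disabled, and, when comparing against a direct
connection, all stats can be rejected at the bootstrap level. Both fragments below are minimal
sketches:

.. code-block:: yaml

  # Router filter entry inside the HTTP connection manager's http_filters list.
  http_filters:
  - name: envoy.filters.http.router
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
      dynamic_stats: false

.. code-block:: yaml

  # Bootstrap-level stats matcher rejecting all stats.
  stats_config:
    stats_matcher:
      reject_all: true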
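
Fourth, TLS: pinning the protocol version and cipher suite keeps comparisons reproducible. This is
a sketch of a downstream TLS transport socket; the certificate paths and the particular cipher are
illustrative assumptions, not recommendations:

.. code-block:: yaml

  transport_socket:
    name: envoy.transport_sockets.tls
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
      common_tls_context:
        tls_params:
          # Pin a single protocol version and cipher so that all systems under
          # comparison negotiate identically.
          tls_minimum_protocol_version: TLSv1_2
          tls_maximum_protocol_version: TLSv1_2
          cipher_suites:
          - ECDHE-RSA-AES128-GCM-SHA256
        tls_certificates:
        - certificate_chain: {filename: "/etc/envoy/cert.pem"}  # illustrative path
          private_key: {filename: "/etc/envoy/key.pem"}         # illustrative path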
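
Finally, HTTP/2: the flow control and concurrency knobs live in
:ref:`Http2ProtocolOptions <envoy_v3_api_msg_config.core.v3.Http2ProtocolOptions>`. The values
below are illustrative; derive window sizes from your link's BDP rather than copying these:

.. code-block:: yaml

  http2_protocol_options:
    max_concurrent_streams: 128              # match across compared systems
    initial_stream_window_size: 65536        # 64 KiB per-stream flow control window
    initial_connection_window_size: 1048576  # 1 MiB connection-level window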
