Overload Manager: Implement LoadshedPoint for Tcp Accept. (envoyproxy…
…#26706)

* Implement TCP Accept LoadShedPoint.

Signed-off-by: Kevin Baichoo <[email protected]>
KBaichoo authored Apr 14, 2023
1 parent 952569c commit 1219e3f
Showing 24 changed files with 351 additions and 42 deletions.
1 change: 0 additions & 1 deletion api/envoy/config/overload/v3/overload.proto
@@ -199,7 +199,6 @@ message OverloadManager {
// The set of overload actions.
repeated OverloadAction actions = 3;

// [#not-implemented-hide:]
// The set of load shed points.
repeated LoadShedPoint loadshed_points = 5;

4 changes: 4 additions & 0 deletions changelogs/current.yaml
@@ -338,6 +338,10 @@ new_features:
change: |
added stat ``overload.refresh_interval_delay`` to track the delay between overload manager resource loop refresh in
milliseconds.
- area: load shed point
change: |
added load shed point ``envoy.load_shed_points.tcp_listener_accept`` that rejects new TCP connections
by closing the connection before the new connection accept phase.
- area: http
change: |
make adding ProxyProtocolFilterState in the HCM optional.
@@ -8,8 +8,12 @@ The :ref:`overload manager <arch_overview_overload_manager>` is configured in th
field.

An example configuration of the overload manager is shown below. It shows a
configuration to drain HTTP/X connections when heap memory usage reaches 95%
and to stop accepting requests when heap memory usage reaches 99%.
configuration to drain HTTP/X connections when heap memory usage reaches 92%
(configured via ``envoy.overload_actions.disable_http_keepalive``), to stop
accepting requests when heap memory usage reaches 95% (configured via
``envoy.overload_actions.stop_accepting_requests``) and to stop accepting new
TCP connections when memory usage reaches 95% (configured via
``envoy.load_shed_points.tcp_listener_accept``).

.. code-block:: yaml
@@ -26,12 +30,18 @@ and to stop accepting requests when heap memory usage reaches 99%.
triggers:
- name: "envoy.resource_monitors.fixed_heap"
threshold:
value: 0.95
value: 0.92
- name: "envoy.overload_actions.stop_accepting_requests"
triggers:
- name: "envoy.resource_monitors.fixed_heap"
threshold:
value: 0.99
value: 0.95
loadshed_points:
- name: "envoy.load_shed_points.tcp_listener_accept"
triggers:
- name: "envoy.resource_monitors.fixed_heap"
threshold:
value: 0.95
Resource monitors
-----------------
@@ -102,6 +112,45 @@ The following overload actions are supported:
- Envoy will reset expensive streams to terminate them. See
:ref:`below <config_overload_manager_reset_streams>` for details on configuration.


Load Shed Points
----------------

Load Shed Points are similar to overload actions in that each depends on a
configured trigger. Whether that trigger is active determines whether Envoy
sheds load at the given point in a connection or stream lifecycle.

For a given request on a newly created connection, we can think of the
configured load shed points as a decision tree at key junctions of a connection
/ stream lifecycle. While a connection / stream might pass one junction, it
is possible that later on the conditions might change causing Envoy to shed load
at a later junction.

In comparison to analogous overload actions, Load Shed Points are more
reactive to changing conditions, especially in cases of large traffic spikes.
Overload actions can be better suited in cases where Envoy is deciding to shed load
but the worker threads aren't actively processing the connections or streams that
Envoy wants to shed. For example
``envoy.overload_actions.reset_high_memory_stream`` can reset streams that are
using a lot of memory even if those streams aren't actively making progress.

Compared to overload actions, Load Shed Points also make it easier to
integrate custom (e.g. company-internal) Load Shed Points, as long as the extension
has access to the Overload Manager to request the custom Load Shed Point.

The following core load shed points are supported:

.. list-table::
:header-rows: 1
:widths: 1, 2

* - Name
- Description

* - envoy.load_shed_points.tcp_listener_accept
- Envoy will reject (close) new TCP connections. This occurs before the
:ref:`Listener Filter Chain <life_of_a_request>` is created.

.. _config_overload_manager_reducing_timeouts:

Reducing timeouts
1 change: 1 addition & 0 deletions envoy/network/BUILD
@@ -213,6 +213,7 @@ envoy_cc_library(
"//envoy/common:resource_interface",
"//envoy/config:typed_metadata_interface",
"//envoy/init:manager_interface",
"//envoy/server/overload:load_shed_point_interface",
"//envoy/stats:stats_interface",
"//source/common/common:interval_value",
"@envoy_api//envoy/config/core/v3:pkg_cc_proto",
7 changes: 7 additions & 0 deletions envoy/network/listener.h
@@ -17,6 +17,7 @@
#include "envoy/network/connection_balancer.h"
#include "envoy/network/listen_socket.h"
#include "envoy/network/udp_packet_writer_handler.h"
#include "envoy/server/overload/load_shed_point.h"
#include "envoy/stats/scope.h"

#include "source/common/common/interval_value.h"
@@ -412,6 +413,12 @@ class Listener {
* after being opened.
*/
virtual void setRejectFraction(UnitFloat reject_fraction) PURE;

/**
* Configures the LoadShedPoints for this listener.
*/
virtual void
configureLoadShedPoints(Server::LoadShedPointProvider& load_shed_point_provider) PURE;
};

using ListenerPtr = std::unique_ptr<Listener>;
8 changes: 8 additions & 0 deletions envoy/server/overload/BUILD
@@ -12,6 +12,7 @@ envoy_cc_library(
name = "overload_manager_interface",
hdrs = ["overload_manager.h"],
deps = [
":load_shed_point_interface",
":thread_local_overload_state",
"//envoy/event:dispatcher_interface",
"//envoy/thread_local:thread_local_interface",
@@ -30,3 +31,10 @@ envoy_cc_library(
"//source/common/common:interval_value",
],
)

# This is separate from `:overload_manager_interface` to break
# circular dependencies.
envoy_cc_library(
name = "load_shed_point_interface",
hdrs = ["load_shed_point.h"],
)
41 changes: 41 additions & 0 deletions envoy/server/overload/load_shed_point.h
@@ -0,0 +1,41 @@
#pragma once

#include <memory>

#include "envoy/common/pure.h"

#include "absl/strings/string_view.h"

namespace Envoy {
namespace Server {

/**
* A point within the connection or request lifecycle that provides context on
* whether to shed load at that given stage for the current entity at the point.
*/
class LoadShedPoint {
public:
virtual ~LoadShedPoint() = default;

// Whether to shed the load.
virtual bool shouldShedLoad() PURE;
};

using LoadShedPointPtr = std::unique_ptr<LoadShedPoint>;

/**
* Provides configured LoadShedPoints.
*/
class LoadShedPointProvider {
public:
virtual ~LoadShedPointProvider() = default;

/**
* Get the load shed point identified by the following string. Returns nullptr
* for non-configured points.
*/
virtual LoadShedPoint* getLoadShedPoint(absl::string_view point_name) PURE;
};

} // namespace Server
} // namespace Envoy
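
To make the two interfaces in this header concrete, here is a minimal, self-contained sketch of how a provider might hand out named points and return `nullptr` for unconfigured ones. It does not use Envoy headers: `PURE` is written as `= 0`, `std::string_view` stands in for `absl::string_view`, and `ThresholdLoadShedPoint`, `MapLoadShedPointProvider`, `setPressure` and `addPoint` are hypothetical names invented for illustration. The threshold comparison is an assumption about how a trigger could behave, not a description of Envoy's actual overload manager.

```cpp
#include <atomic>
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <string_view>
#include <utility>

// Stand-in for the LoadShedPoint interface from this commit.
class LoadShedPoint {
public:
  virtual ~LoadShedPoint() = default;
  virtual bool shouldShedLoad() = 0;
};

// Stand-in for the LoadShedPointProvider interface from this commit.
class LoadShedPointProvider {
public:
  virtual ~LoadShedPointProvider() = default;
  virtual LoadShedPoint* getLoadShedPoint(std::string_view point_name) = 0;
};

// Hypothetical point: sheds load once the last reported pressure reaches
// the configured threshold (both are fractions in [0, 1]).
class ThresholdLoadShedPoint : public LoadShedPoint {
public:
  explicit ThresholdLoadShedPoint(double threshold) : threshold_(threshold) {}
  void setPressure(double pressure) { pressure_.store(pressure); }
  bool shouldShedLoad() override { return pressure_.load() >= threshold_; }

private:
  const double threshold_;
  std::atomic<double> pressure_{0.0};
};

// Hypothetical provider that owns the configured points, keyed by name.
class MapLoadShedPointProvider : public LoadShedPointProvider {
public:
  ThresholdLoadShedPoint& addPoint(const std::string& name, double threshold) {
    auto point = std::make_unique<ThresholdLoadShedPoint>(threshold);
    ThresholdLoadShedPoint& ref = *point;
    points_[name] = std::move(point);
    return ref;
  }

  // Returns nullptr for points that were never configured.
  LoadShedPoint* getLoadShedPoint(std::string_view point_name) override {
    auto it = points_.find(std::string(point_name));
    return it == points_.end() ? nullptr : it->second.get();
  }

private:
  std::map<std::string, std::unique_ptr<ThresholdLoadShedPoint>> points_;
};
```

Because the provider returns a raw, possibly null pointer, a caller can look the point up once at configuration time and only pay for a null check plus `shouldShedLoad()` on the hot path.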
25 changes: 2 additions & 23 deletions envoy/server/overload/overload_manager.h
@@ -5,6 +5,7 @@
#include "envoy/common/pure.h"
#include "envoy/event/dispatcher.h"
#include "envoy/event/scaled_range_timer_manager.h"
#include "envoy/server/overload/load_shed_point.h"
#include "envoy/server/overload/thread_local_overload_state.h"

#include "source/common/singleton/const_singleton.h"
@@ -52,29 +53,13 @@ class OverloadActionStatsNameValues {

using OverloadActionStatsNames = ConstSingleton<OverloadActionStatsNameValues>;

/**
* A point within the connection or request lifecycle that provides context on
* whether to shed load at that given stage for the current entity at the point.
*/
class LoadShedPoint {
public:
virtual ~LoadShedPoint() = default;

// Whether to shed the load.
virtual bool shouldShedLoad() PURE;
};

using LoadShedPointPtr = std::unique_ptr<LoadShedPoint>;

/**
* The OverloadManager protects the Envoy instance from being overwhelmed by client
* requests. It monitors a set of resources and notifies registered listeners if
* configured thresholds for those resources have been exceeded.
*/
class OverloadManager {
class OverloadManager : public LoadShedPointProvider {
public:
virtual ~OverloadManager() = default;

/**
* Start a recurring timer to monitor resources and notify listeners when overload actions
* change state.
@@ -99,12 +84,6 @@ class OverloadManager {
*/
virtual ThreadLocalOverloadState& getThreadLocalOverloadState() PURE;

/**
* Get the load shed point identified by the following string. Returns nullptr
* on for non-configured points.
*/
virtual LoadShedPoint* getLoadShedPoint(absl::string_view point_name) PURE;

/**
* Get a factory for constructing scaled timer managers that respond to overload state.
*/
9 changes: 8 additions & 1 deletion source/common/network/tcp_listener_impl.cc
@@ -60,7 +60,8 @@ void TcpListenerImpl::onSocketEvent(short flags) {
io_handle->close();
cb_.onReject(TcpListenerCallbacks::RejectCause::GlobalCxLimit);
continue;
} else if (random_.bernoulli(reject_fraction_)) {
} else if ((listener_accept_ != nullptr && listener_accept_->shouldShedLoad()) ||
random_.bernoulli(reject_fraction_)) {
io_handle->close();
cb_.onReject(TcpListenerCallbacks::RejectCause::OverloadAction);
continue;
@@ -127,5 +128,11 @@ void TcpListenerImpl::setRejectFraction(const UnitFloat reject_fraction) {
reject_fraction_ = reject_fraction;
}

void TcpListenerImpl::configureLoadShedPoints(
Server::LoadShedPointProvider& load_shed_point_provider) {
listener_accept_ =
load_shed_point_provider.getLoadShedPoint("envoy.load_shed_points.tcp_listener_accept");
}

} // namespace Network
} // namespace Envoy
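
The accept-path condition changed in `onSocketEvent` can be isolated as a small predicate. The sketch below is an assumption-laden stand-alone version: `shouldRejectConnection` and `FixedLoadShedPoint` are hypothetical names, `std::bernoulli_distribution` stands in for Envoy's `random_.bernoulli(reject_fraction_)`, and a plain `double` replaces `UnitFloat`. It illustrates that a null (unconfigured) point falls through to the existing probabilistic rejection, and that an active point rejects without a random draw.

```cpp
#include <cassert>
#include <random>

// Stand-in for the LoadShedPoint interface from this commit.
class LoadShedPoint {
public:
  virtual ~LoadShedPoint() = default;
  virtual bool shouldShedLoad() = 0;
};

// Hypothetical point that always gives the same answer, for illustration.
class FixedLoadShedPoint : public LoadShedPoint {
public:
  explicit FixedLoadShedPoint(bool shed) : shed_(shed) {}
  bool shouldShedLoad() override { return shed_; }

private:
  const bool shed_;
};

// Mirrors the shape of the new condition: reject if the load shed point is
// configured and active, otherwise reject with probability reject_fraction.
bool shouldRejectConnection(LoadShedPoint* listener_accept, double reject_fraction,
                            std::mt19937& rng) {
  if (listener_accept != nullptr && listener_accept->shouldShedLoad()) {
    return true; // No random draw is needed when the point is shedding.
  }
  std::bernoulli_distribution reject(reject_fraction);
  return reject(rng);
}
```

In the commit both causes report `RejectCause::OverloadAction`, so callers of `onReject` cannot tell from the cause alone whether the load shed point or the reject fraction closed the connection.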
2 changes: 2 additions & 0 deletions source/common/network/tcp_listener_impl.h
@@ -27,6 +27,7 @@ class TcpListenerImpl : public BaseListenerImpl {
void disable() override;
void enable() override;
void setRejectFraction(UnitFloat reject_fraction) override;
void configureLoadShedPoints(Server::LoadShedPointProvider& load_shed_point_provider) override;

static const absl::string_view GlobalMaxCxRuntimeKey;

@@ -45,6 +46,7 @@ class TcpListenerImpl : public BaseListenerImpl {
bool bind_to_port_;
UnitFloat reject_fraction_;
const bool ignore_global_conn_limit_;
Server::LoadShedPoint* listener_accept_{nullptr};
};

} // namespace Network
1 change: 1 addition & 0 deletions source/common/network/udp_listener_impl.h
@@ -32,6 +32,7 @@ class UdpListenerImpl : public BaseListenerImpl,
void disable() override;
void enable() override;
void setRejectFraction(UnitFloat) override {}
void configureLoadShedPoints(Server::LoadShedPointProvider&) override {}

// Network::UdpListener
Event::Dispatcher& dispatcher() override;
@@ -51,6 +51,7 @@ class ActiveInternalListener : public Network::InternalListener,
}

void setRejectFraction(UnitFloat) override {}
void configureLoadShedPoints(Server::LoadShedPointProvider&) override {}
};

// Network::TcpConnectionHandler
@@ -21,6 +21,12 @@ ConnectionHandlerImpl::ConnectionHandlerImpl(Event::Dispatcher& dispatcher,
: worker_index_(worker_index), dispatcher_(dispatcher),
per_handler_stat_prefix_(dispatcher.name() + "."), disable_listeners_(false) {}

ConnectionHandlerImpl::ConnectionHandlerImpl(Event::Dispatcher& dispatcher,
absl::optional<uint32_t> worker_index,
OverloadManager& overload_manager)
: worker_index_(worker_index), dispatcher_(dispatcher), overload_manager_(overload_manager),
per_handler_stat_prefix_(dispatcher.name() + "."), disable_listeners_(false) {}

void ConnectionHandlerImpl::incNumConnections() { ++num_handler_connections_; }

void ConnectionHandlerImpl::decNumConnections() {
@@ -68,7 +74,7 @@ void ConnectionHandlerImpl::addListener(absl::optional<uint64_t> overridden_list
ASSERT(config.listenSocketFactories().size() == 1);
details->addActiveListener(config, config.listenSocketFactories()[0]->localAddress(),
listener_reject_fraction_, disable_listeners_,
std::move(internal_listener));
std::move(internal_listener), overload_manager_);
} else if (config.listenSocketFactories()[0]->socketType() == Network::Socket::Type::Stream) {
for (auto& socket_factory : config.listenSocketFactories()) {
auto address = socket_factory->localAddress();
@@ -78,7 +84,8 @@ void ConnectionHandlerImpl::addListener(absl::optional<uint64_t> overridden_list
std::make_unique<ActiveTcpListener>(
*this, config, runtime,
socket_factory->getListenSocket(worker_index_.has_value() ? *worker_index_ : 0),
address, config.connectionBalancer(*address)));
address, config.connectionBalancer(*address)),
overload_manager_);
}
} else {
ASSERT(config.udpListenerConfig().has_value(), "UDP listener factory is not initialized.");
@@ -89,7 +96,8 @@ void ConnectionHandlerImpl::addListener(absl::optional<uint64_t> overridden_list
config, address, listener_reject_fraction_, disable_listeners_,
config.udpListenerConfig()->listenerFactory().createActiveUdpListener(
runtime, *worker_index_, *this, socket_factory->getListenSocket(*worker_index_),
dispatcher_, config));
dispatcher_, config),
overload_manager_);
}
}

@@ -37,6 +37,8 @@ class ConnectionHandlerImpl : public ConnectionHandler,
using ActiveTcpListenerOptRef = absl::optional<std::reference_wrapper<ActiveTcpListener>>;

ConnectionHandlerImpl(Event::Dispatcher& dispatcher, absl::optional<uint32_t> worker_index);
ConnectionHandlerImpl(Event::Dispatcher& dispatcher, absl::optional<uint32_t> worker_index,
OverloadManager& overload_manager);

// Network::ConnectionHandler
uint64_t numConnections() const override { return num_handler_connections_; }
@@ -112,7 +114,7 @@ class ConnectionHandlerImpl : public ConnectionHandler,
void addActiveListener(Network::ListenerConfig& config,
const Network::Address::InstanceConstSharedPtr& address,
UnitFloat& listener_reject_fraction, bool disable_listeners,
ActiveListener&& listener) {
ActiveListener&& listener, OptRef<OverloadManager> overload_manager) {
auto per_address_details = std::make_shared<PerAddressActiveListenerDetails>();
per_address_details->typed_listener_ = *listener;
per_address_details->listener_ = std::move(listener);
@@ -122,6 +124,9 @@ class ConnectionHandlerImpl : public ConnectionHandler,
}
if (auto* listener = per_address_details->listener_->listener(); listener != nullptr) {
listener->setRejectFraction(listener_reject_fraction);
if (overload_manager) {
listener->configureLoadShedPoints(overload_manager.value());
}
}
per_address_details->listener_tag_ = config.listenerTag();
per_address_details_list_.emplace_back(per_address_details);
@@ -140,6 +145,7 @@ class ConnectionHandlerImpl : public ConnectionHandler,
// This has a value on worker threads, and no value on the main thread.
const absl::optional<uint32_t> worker_index_;
Event::Dispatcher& dispatcher_;
OptRef<OverloadManager> overload_manager_;
const std::string per_handler_stat_prefix_;
// Declare before its users ActiveListenerDetails.
std::atomic<uint64_t> num_handler_connections_{};
@@ -155,6 +161,11 @@ class ConnectionHandlerImpl : public ConnectionHandler,

class ConnectionHandlerFactoryImpl : public ConnectionHandlerFactory {
public:
std::unique_ptr<ConnectionHandler>
createConnectionHandler(Event::Dispatcher& dispatcher, absl::optional<uint32_t> worker_index,
OverloadManager& overload_manager) override {
return std::make_unique<ConnectionHandlerImpl>(dispatcher, worker_index, overload_manager);
}
std::unique_ptr<ConnectionHandler>
createConnectionHandler(Event::Dispatcher& dispatcher,
absl::optional<uint32_t> worker_index) override {
3 changes: 3 additions & 0 deletions source/server/listener_manager_factory.h
@@ -29,6 +29,9 @@ class ConnectionHandlerFactory : public Config::UntypedFactory {
virtual std::unique_ptr<ConnectionHandler>
createConnectionHandler(Event::Dispatcher& dispatcher,
absl::optional<uint32_t> worker_index) PURE;
virtual std::unique_ptr<ConnectionHandler>
createConnectionHandler(Event::Dispatcher& dispatcher, absl::optional<uint32_t> worker_index,
OverloadManager& overload_manager) PURE;

std::string category() const override { return "envoy.connection_handler"; }
};
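
The connection handler changes keep the overload manager optional: the original two-argument constructor leaves it unset, and `addActiveListener` only calls `configureLoadShedPoints` when it is present. A rough stand-alone sketch of that wiring pattern follows. It assumes `std::optional<std::reference_wrapper<T>>` as a substitute for Envoy's `OptRef<T>`, and `Provider`, `Listener`, `configuredFrom` and the free function `addActiveListener` are simplified, hypothetical stand-ins rather than the real classes.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>

// Hypothetical stand-in for Server::LoadShedPointProvider.
struct Provider {
  std::string name;
};

// Hypothetical stand-in for Network::Listener; it only records which
// provider, if any, configured its load shed points.
class Listener {
public:
  void configureLoadShedPoints(Provider& provider) { configured_from_ = provider.name; }
  const std::string& configuredFrom() const { return configured_from_; }

private:
  std::string configured_from_;
};

// Substitute for OptRef<OverloadManager>: an optional, non-owning reference.
using OptProviderRef = std::optional<std::reference_wrapper<Provider>>;

// Mirrors the guarded call added to addActiveListener: listeners are left
// untouched when no overload manager was supplied.
void addActiveListener(Listener& listener, OptProviderRef provider) {
  if (provider.has_value()) {
    listener.configureLoadShedPoints(provider->get());
  }
}
```

Keeping the reference optional lets callers that construct a handler without an overload manager keep working, at the cost of those listeners never consulting a load shed point.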