-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add condition to flush based on time #4
Conversation
Currently the Atlas high performance mode is enabled for all apps. This mode enables batching of metrics to reduce the number of events sent to spectatord. This high performance mode has a negative impact on really low RPS apps, because they can end up emitting IPC metrics every 2-4 minutes rather than every minute. This commit updates the buffering logic to ensure we emit metrics if some configurable amount of time has passed (flush_interval).
9aad5c1
to
c649f23
Compare
|
||
// Wait for 3 seconds, increment, and the counter should not be flushed (3s is less than the 5s flush interval) | ||
std::this_thread::sleep_for(std::chrono::seconds(3)); | ||
c.Increment(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incrementing after the sleeps is needed since the increment itself is what triggers the logic in setup_unix_domain
to determine whether a flush is needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
spectator/config.h
Outdated
@@ -9,6 +9,7 @@ struct Config { | |||
std::string endpoint; | |||
std::unordered_map<std::string, std::string> common_tags; | |||
uint32_t bytes_to_buffer; | |||
std::chrono::seconds flush_interval; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const
spectator/publisher.h
Outdated
@@ -34,6 +35,8 @@ class SpectatordPublisher { | |||
asio::local::datagram_protocol::socket local_socket_; | |||
std::string buffer_; | |||
uint32_t bytes_to_buffer_; | |||
std::chrono::steady_clock::time_point last_flush_time_; | |||
std::chrono::seconds flush_interval_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const
spectator/publisher.h
Outdated
@@ -34,6 +35,8 @@ class SpectatordPublisher { | |||
asio::local::datagram_protocol::socket local_socket_; | |||
std::string buffer_; | |||
uint32_t bytes_to_buffer_; | |||
std::chrono::steady_clock::time_point last_flush_time_; | |||
std::chrono::seconds flush_interval_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should have the flush interval be mills just for additional granularity.
spectator/publisher.cc
Outdated
@@ -55,11 +59,16 @@ void SpectatordPublisher::setup_unix_domain(absl::string_view path) { | |||
std::string local_path{path}; | |||
sender_ = [local_path, this](std::string_view msg) { | |||
buffer_.append(msg); | |||
if (buffer_.length() >= bytes_to_buffer_) { | |||
auto now = std::chrono::steady_clock::now(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const
spectator/publisher.cc
Outdated
@@ -55,11 +59,16 @@ void SpectatordPublisher::setup_unix_domain(absl::string_view path) { | |||
std::string local_path{path}; | |||
sender_ = [local_path, this](std::string_view msg) { | |||
buffer_.append(msg); | |||
if (buffer_.length() >= bytes_to_buffer_) { | |||
auto now = std::chrono::steady_clock::now(); | |||
bool should_flush = buffer_.length() >= bytes_to_buffer_ || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const
spectator/publisher.cc
Outdated
if (buffer_.length() >= bytes_to_buffer_) { | ||
auto now = std::chrono::steady_clock::now(); | ||
bool should_flush = buffer_.length() >= bytes_to_buffer_ || | ||
std::chrono::duration_cast<std::chrono::seconds>(now - last_flush_time_) >= flush_interval_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to do duration cast
spectator/publisher_test.cc
Outdated
logger->info("Unix Server started on path {}", path); | ||
|
||
// Set buffer size to a large value so that flushing is based on time | ||
SpectatordPublisher publisher{fmt::format("unix:{}", path), 10000, std::chrono::seconds(5)}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a blocker here, but ideally we'd have some sort of test time system instead of a real clock.
The change overall looks good to me - however I still quite don't get why we need it based on my previous tests. Here's a quick math too: Envoy flushes every 5 seconds and we send the metrics here no matter what the size, every 60 seconds. That means in the worst case we'd have 12 batches of whole Envoy metrics. The current batch limit is 60KB, so for this to help in every flush interval we need to be sending less than 5KB of metric lines. Last time I checked we were above this already. |
@cancecen This is related to the special handling that happens for IPC metrics. They're not flushed with everything else: Note the difference between IPC and HTTP metrics: |
Is there something else that could cause that sort of un-evenness? |
yeah, but they get sent out to the buffer immediately. |
I don't think so. Did you observe evenness when you tested this with your change? I can't remember if you showed the fixed graph in your demo, sorry. |
Currently the Atlas high performance mode is enabled for all apps. This mode enables batching of metrics to reduce the number of events sent to spectatord.
This high performance mode has a negative impact on low RPS apps, because they can end up emitting IPC metrics every 2-4 minutes rather than every minute. That's a needless regression relative to SBN's behavior.
This commit updates the buffering logic to ensure we emit metrics if some configurable amount of time has passed (flush_interval).
High RPS apps should not be impacted by this change since the idea is to set this interval at ~60 seconds, and high RPS apps are already flushing more often than that.
How was this tested?
docker build --memory=4g -t spectator-cpp:latest .; docker run -it spectator-cpp:latest