RONDB-854: Metrics updater for RDRS2 #637

Merged
merged 1 commit into from
Feb 11, 2025

Conversation

mronstro
Collaborator

@mronstro mronstro commented Feb 8, 2025

To implement request statistics we use the prometheus-cpp library. However, calling this library on every request is not a good idea, since it would kill performance.

To handle this, prometheus-cpp makes it possible to report histograms instead of every individual response time. In this implementation the histogram has 61 entries, plus 3 for error codes.

This means that the request counters can be obtained by summing all of those histogram counters.
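As a rough illustration of the layout described above, here is a minimal sketch of a per-endpoint set of counters with 61 response-time entries followed by 3 error-code entries. The names `EndpointStats`, `RecordSuccess`, `RecordError`, and `TotalRequests` are hypothetical and not taken from the PR, and the sketch assumes that failed requests also count toward the request total.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sizes taken from the description: 61 histogram entries plus 3 error codes.
constexpr std::size_t kLatencyBuckets = 61;
constexpr std::size_t kErrorSlots = 3;
constexpr std::size_t kSlots = kLatencyBuckets + kErrorSlots;

// Hypothetical per-endpoint statistics; the real RDRS2 structures may differ.
struct EndpointStats {
  std::array<std::atomic<std::uint64_t>, kSlots> slots{};

  // Count one successful request in the given response-time bucket.
  void RecordSuccess(std::size_t bucket) {
    assert(bucket < kLatencyBuckets);
    slots[bucket].fetch_add(1, std::memory_order_relaxed);
  }

  // Count one failed request in the given error-code slot.
  void RecordError(std::size_t error_slot) {
    assert(error_slot < kErrorSlots);
    slots[kLatencyBuckets + error_slot].fetch_add(1, std::memory_order_relaxed);
  }

  // No separate request counter is kept: the total is the sum of all slots.
  std::uint64_t TotalRequests() const {
    std::uint64_t total = 0;
    for (const auto& slot : slots) {
      total += slot.load(std::memory_order_relaxed);
    }
    return total;
  }
};
```

Recording a request then costs a single relaxed atomic increment, and the request count is derived only when the values are read.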

In addition, we keep track of the number of primary key lookups that RDRS2 performs against RonDB. This uses a separate counter.

Ping and health also have separate counters, with no response time handling.

Since the Prometheus endpoint will likely be called every 10 seconds, we report 323 values every 10 seconds. This should also ensure that we don't overload the memory of the Prometheus server. Reporting each response time would create hundreds of thousands of rows in Prometheus, which the Prometheus server is unlikely to handle well.
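One way to keep the metrics library off the request path is to count locally and hand only the accumulated deltas to the library at intervals. The sketch below assumes that approach; it is not the code from this PR. `MetricsUpdater`, `Increment`, `Flush`, and the `Sink` callback are hypothetical names, and the callback stands in for whatever prometheus-cpp call the real updater makes.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical batching updater: request threads only touch local atomics,
// and a single updater thread forwards deltas to the metrics library.
class MetricsUpdater {
 public:
  // Stand-in for the call into the metrics library (an assumption).
  using Sink = std::function<void(std::size_t slot, std::uint64_t delta)>;

  MetricsUpdater(std::size_t num_slots, Sink sink)
      : counters_(num_slots),
        last_reported_(num_slots, 0),
        sink_(std::move(sink)) {}

  // Called on the request path: one relaxed atomic increment.
  void Increment(std::size_t slot) {
    assert(slot < counters_.size());
    counters_[slot].fetch_add(1, std::memory_order_relaxed);
  }

  // Called periodically from one thread: reports only slots that changed
  // since the previous flush and returns how many slots were reported.
  std::size_t Flush() {
    std::size_t reported = 0;
    for (std::size_t i = 0; i < counters_.size(); ++i) {
      const std::uint64_t now = counters_[i].load(std::memory_order_relaxed);
      const std::uint64_t delta = now - last_reported_[i];
      if (delta != 0) {
        sink_(i, delta);
        last_reported_[i] = now;
        ++reported;
      }
    }
    return reported;
  }

 private:
  std::vector<std::atomic<std::uint64_t>> counters_;
  std::vector<std::uint64_t> last_reported_;  // Touched only by Flush().
  Sink sink_;
};
```

With this shape, the cost per flush is bounded by the number of slots rather than by the number of requests served in the interval.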

The histogram uses fixed-size increments for short response times; for long response times the bucket boundaries increase logarithmically instead. This gives good accuracy for common, short response times while still providing some level of accuracy for long response times.
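To make the bucket scheme concrete, here is a minimal sketch of mapping a response time to one of 61 buckets. The split point and widths are assumptions chosen for illustration (40 linear buckets of 50 microseconds each, then buckets whose upper bound doubles); the PR description does not state the actual boundaries, and `BucketIndex` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t kNumBuckets = 61;            // From the description.
constexpr std::size_t kLinearBuckets = 40;         // Assumed split point.
constexpr std::uint64_t kLinearWidthMicros = 50;   // Assumed linear width.

// Map a response time in microseconds to a bucket index in [0, kNumBuckets).
std::size_t BucketIndex(std::uint64_t micros) {
  const std::uint64_t linear_limit = kLinearBuckets * kLinearWidthMicros;

  // Short response times: fixed-width buckets give uniform resolution.
  if (micros < linear_limit) {
    return static_cast<std::size_t>(micros / kLinearWidthMicros);
  }

  // Long response times: each bucket's upper bound is double the previous
  // one. The last bucket catches everything above the final boundary.
  std::size_t index = kLinearBuckets;
  std::uint64_t upper = linear_limit * 2;
  while (micros >= upper && index + 1 < kNumBuckets) {
    upper *= 2;
    ++index;
  }
  return index;
}
```

Under these assumed parameters the linear range covers 0 to 2 ms at 50 microsecond resolution, and the remaining 21 buckets stretch to over an hour before the catch-all bucket is reached.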

@mronstro mronstro force-pushed the RONDB-854 branch 4 times, most recently from 3258970 to 558795e Compare February 10, 2025 20:37
@mronstro mronstro merged commit 8230463 into logicalclocks:24.10-main Feb 11, 2025
@mronstro mronstro deleted the RONDB-854 branch February 11, 2025 12:56