RONDB-854: Metrics updater for RDRS2 #637

mronstro · 2025-02-08T00:33:36Z

To implement request statistics we use the prometheus-cpp library. However, calling this library on every request is not a good idea, since it would hurt performance badly.

To handle this, prometheus-cpp makes it possible to report a histogram instead of every individual response time. This implementation reports 61 entries in the histogram plus 3 for error codes.

This means that the request count can be obtained by summing all of those histogram counters.
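A minimal sketch of the idea, not the code in this PR: request threads only bump in-process atomic counters, and a periodic updater drains them in one pass. The class name `RequestStats`, its methods, and the slot layout (61 latency buckets followed by 3 error slots) are assumptions made for illustration. The drained values could then be handed to prometheus-cpp in a single call, for example with `Histogram::ObserveMultiple`; that step is left out here.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>

// Counts taken from the description; the layout itself is an assumption.
constexpr std::size_t kLatencyBuckets = 61;
constexpr std::size_t kErrorSlots = 3;
constexpr std::size_t kSlots = kLatencyBuckets + kErrorSlots;

using Snapshot = std::array<std::uint64_t, kSlots>;

// Hypothetical per-endpoint statistics holder.
class RequestStats {
 public:
  // Hot path: one relaxed atomic increment, no metrics library call.
  void record_latency(std::size_t bucket) {
    assert(bucket < kLatencyBuckets);
    slots_[bucket].fetch_add(1, std::memory_order_relaxed);
  }

  void record_error(std::size_t error_kind) {
    assert(error_kind < kErrorSlots);
    slots_[kLatencyBuckets + error_kind].fetch_add(1, std::memory_order_relaxed);
  }

  // Updater path: returns the counts accumulated since the previous drain
  // and resets every slot to zero.
  Snapshot drain() {
    Snapshot out{};
    for (std::size_t i = 0; i < kSlots; ++i) {
      out[i] = slots_[i].exchange(0, std::memory_order_relaxed);
    }
    return out;
  }

 private:
  std::array<std::atomic<std::uint64_t>, kSlots> slots_{};
};

// Request count for one snapshot: the sum of all slots.
std::uint64_t total_requests(const Snapshot& snapshot) {
  return std::accumulate(snapshot.begin(), snapshot.end(), std::uint64_t{0});
}
```

With this shape the cost per request is a single atomic increment, and the metrics library is touched once per updater interval rather than once per request.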

In addition, we keep a separate counter for the number of primary key lookups that RDRS2 performs against RonDB.

Ping and health also have separate counters, with no response time handling.

Since the Prometheus endpoint will likely be scraped every 10 seconds, we report 323 values every 10 seconds. This should also ensure that we don't overload the memory of the Prometheus server. Reporting each response time would create hundreds of thousands of rows in Prometheus, which the server is unlikely to handle well.

The histogram uses fixed increments for short response times; for long response times the bucket boundaries are spaced logarithmically instead. This gives good accuracy for the common, short response times while still providing some accuracy for long response times.
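The bucket layout can be sketched as follows. This is an illustration under stated assumptions, not the boundaries used in this PR: the function names, the step size, the growth factor, and the convention that values above the last bound fall into one extra overflow bucket are all hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Builds `total` upper bounds: the first `linear_count` are spaced by
// `linear_step`, the rest grow by the factor `growth` from the previous bound.
// All parameter values are chosen by the caller; none come from the PR.
std::vector<double> make_bucket_bounds(std::size_t linear_count,
                                       double linear_step,
                                       std::size_t total,
                                       double growth) {
  assert(linear_count > 0 && linear_count <= total);
  assert(linear_step > 0.0 && growth > 1.0);
  std::vector<double> bounds;
  bounds.reserve(total);
  for (std::size_t i = 1; i <= linear_count; ++i) {
    bounds.push_back(linear_step * static_cast<double>(i));
  }
  while (bounds.size() < total) {
    bounds.push_back(bounds.back() * growth);
  }
  return bounds;
}

// Index of the first bucket whose upper bound is >= value. A value above the
// last bound maps to bounds.size(), the assumed overflow bucket.
std::size_t bucket_index(const std::vector<double>& bounds, double value) {
  return static_cast<std::size_t>(
      std::lower_bound(bounds.begin(), bounds.end(), value) - bounds.begin());
}
```

For example, `make_bucket_bounds(4, 100.0, 7, 2.0)` yields the bounds 100, 200, 300, 400, 800, 1600, 3200: fine resolution where most responses land, and coarser resolution for the slow tail.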


mronstro force-pushed the RONDB-854 branch 4 times, most recently from 3258970 to 558795e Compare February 10, 2025 20:37

mronstro force-pushed the RONDB-854 branch from 558795e to c75e336 Compare February 11, 2025 01:40

mronstro merged commit 8230463 into logicalclocks:24.10-main Feb 11, 2025

mronstro deleted the RONDB-854 branch February 11, 2025 12:56
