[0009] RFC for centralized metrics storage and visualization. #28

shimkiv · 2023-09-18T06:53:06Z

This RFC covers the storage and visualization of results from various experiments and benchmarks.
It does not cover changes to, for example, how the existing Mina Daemon exposes its metrics.

bkase · 2023-09-18T10:50:45Z

0009-metrics-storage-and-visualization.md

+### System Components
+
+- **Traefik**: Manages connectivity for all Docker containers.
+- **InfluxDB**: Acts as the centralized database for metrics storage.


Why InfluxDB rather than the storage built into Prometheus, which we're already using in the protocol code? Should we instead try to use Prometheus in other services beyond the protocol code?

shimkiv · 2023-09-18

Prometheus and InfluxDB are both highly respected in the monitoring and metrics world, and each has its own strengths and use cases. For our specific requirements, however, I think InfluxDB in tandem with Grafana is more suitable. Here is why:

Purpose and Design

Prometheus

Prometheus is primarily a monitoring and alerting toolkit. Its main strength is in the collection and real-time processing of metrics.
It is designed for reliability and can operate with a minimal setup.
Its pull-based model is optimized for service discovery and runtime monitoring, scraping metrics from predefined endpoints.
It's an excellent choice for capturing short-term metrics and provides a powerful query language (PromQL).
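
To make the pull model concrete: a monitored process serves its current metric values over HTTP, and Prometheus scrapes that endpoint on a schedule. Below is a minimal sketch of the text body such a scrape would return. The metric name and labels are hypothetical (not taken from the Mina codebase), and the optional HELP/TYPE metadata lines are omitted.

```python
def render_exposition(name, samples):
    """Render samples as Prometheus text exposition lines.

    `samples` is a list of (labels_dict, value) pairs. Label values are
    escaped as the text format requires: backslash, double quote, newline.
    """
    lines = []
    for labels, value in samples:
        parts = []
        for key in sorted(labels):
            escaped = (
                str(labels[key])
                .replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
            )
            parts.append(f'{key}="{escaped}"')
        label_block = "{" + ",".join(parts) + "}" if parts else ""
        lines.append(f"{name}{label_block} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical benchmark metric; the name and labels are illustrative only.
body = render_exposition(
    "benchmark_duration_seconds",
    [
        ({"suite": "example", "case": "a"}, 1.5),
        ({"suite": "example", "case": "b"}, 2.25),
    ],
)
print(body, end="")
```

Note that a short-lived benchmark run may exit before the next scrape happens, which is one practical difference from a push-style write.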

InfluxDB

InfluxDB is a purpose-built Time Series Database (TSDB). Its primary strength is in storing, retrieving, and performing operations on time series data.
It is optimized for high write loads and storage efficiency.
It supports long-term storage and is built to handle large volumes of time series data.
InfluxQL and Flux provide comprehensive querying capabilities tailored for time-based datasets.
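
For comparison, InfluxDB is written to rather than scraped: a client pushes points, each encoded as one line of InfluxDB line protocol. The sketch below formats a single benchmark result that way in plain Python, without any client library. The measurement, tag, and field names are assumptions made for illustration, and the timestamp is an arbitrary value in nanoseconds.

```python
def _escape(text, chars):
    """Backslash-escape each character of `chars` that occurs in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def format_field_value(value):
    """Encode a Python value as a line protocol field value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an `i` suffix
    if isinstance(value, float):
        return repr(value)
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'  # strings are double-quoted


def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Build `measurement,tag=value field=value timestamp` for one point."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    head = _escape(measurement, ", ")
    for key in sorted(tags):
        head += f",{_escape(key, ',= ')}={_escape(str(tags[key]), ',= ')}"
    field_block = ",".join(
        f"{_escape(key, ',= ')}={format_field_value(fields[key])}" for key in fields
    )
    return f"{head} {field_block} {timestamp_ns}"


# Hypothetical benchmark result; all names here are illustrative only.
line = to_line_protocol(
    "benchmark_result",
    {"suite": "example suite", "commit": "0ba3ca4"},
    {"duration_ms": 1520.5, "iterations": 100, "passed": True},
    1695019986000000000,
)
print(line)
```

Tags are indexed and suit the dimensions we would filter or group by (suite, commit), while fields hold the measured values.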

Data Storage and Retention

Prometheus

Prometheus can retain data for longer periods, but that is not its primary use case, and over extended periods storage management and efficiency become challenging.
It's more suited for shorter retention periods (typically hours to weeks).

InfluxDB

It is designed for efficient storage and querying of time series data over long periods, which matches our requirement for long-term storage.
It provides more flexibility and efficiency in data retention policies, down-sampling, and data lifecycle management.
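
Down-sampling here means replacing raw points with one aggregate per time window, so that old data takes less space while keeping its overall shape. InfluxDB does this on the server side (continuous queries in 1.x, tasks in 2.x); the plain-Python sketch below only illustrates the idea, with made-up sample data and a mean as the assumed aggregate.

```python
from collections import defaultdict


def downsample_mean(points, window_seconds):
    """Average (timestamp_seconds, value) points per fixed-size window.

    Returns a list of (window_start, mean) pairs sorted by window_start.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    buckets = defaultdict(list)
    for timestamp, value in points:
        # Align each point to the start of the window that contains it.
        window_start = (timestamp // window_seconds) * window_seconds
        buckets[window_start].append(value)
    return [
        (start, sum(values) / len(values))
        for start, values in sorted(buckets.items())
    ]


# Made-up raw samples: five points collapse into three one-minute averages.
raw = [(0, 10.0), (30, 20.0), (60, 40.0), (90, 60.0), (125, 5.0)]
print(downsample_mean(raw, 60))  # [(0, 15.0), (60, 50.0), (120, 5.0)]
```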

Integration with Grafana

Both Prometheus and InfluxDB integrate with Grafana as supported data sources. With InfluxDB, though, the dashboards are backed by a database built for long-term time series storage and querying.

Scalability

Prometheus

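Scaling Prometheus typically involves federation, where one Prometheus server scrapes selected time series from other Prometheus servers, each of which scrapes its own targets.
For long-term storage, you would usually integrate it with an external solution (for example, Thanos or Cortex), which adds complexity.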

InfluxDB

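InfluxDB offers clustering for horizontal scalability and high availability in its commercial editions (InfluxDB Enterprise and InfluxDB Cloud), making it easier to scale as the dataset grows. The open-source edition runs as a single node, so this benefit depends on which edition we deploy.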

In conclusion

Prometheus excels in real-time metrics collection and alerting, especially when you need to actively monitor and respond to system behaviors. However, for our needs – which revolve around storing metrics for extended periods and analyzing historical data – InfluxDB is a better fit. Paired with Grafana, it provides a powerful, scalable, and efficient solution for long-term metrics storage and visualization.

mrmr1993

Can you add a section on developer experience? I think it's important to have a concrete idea of how easy/hard it will be to integrate with this system for our engineers, and it would be good to re-analyse the alternatives with that aspect in mind too.

shimkiv · 2023-09-20T09:44:44Z

I've updated the PR description so that we're all on the same page about the scope of this RFC.

RFC for centralized metrics storage and visualization.

0ba3ca4

shimkiv added the metrics label Sep 18, 2023

shimkiv self-assigned this Sep 18, 2023

shimkiv requested review from bkase, nholland94 and mrmr1993 as code owners September 18, 2023 06:53

shimkiv requested review from nicc, garwalsh, mitschabaude and stevenplatt September 18, 2023 06:53

bkase reviewed Sep 18, 2023

mrmr1993 reviewed Sep 20, 2023
