[0009] RFC for centralized metrics storage and visualization. #28

Open

Member

@shimkiv shimkiv commented Sep 18, 2023

This RFC covers the storage and visualization of results from experiments and benchmarks.
It does not cover changes to, for example, how the existing Mina Daemon exposes its metrics.

### System Components

- **Traefik**: Manages connectivity for all Docker containers.
- **InfluxDB**: Acts as the centralized database for metrics storage.
Member


Why InfluxDB rather than the storage built into Prometheus, which we're already using in the protocol code? Should we instead try to use Prometheus in other services beyond the protocol code?

Member Author


Prometheus and InfluxDB are both highly respected in the monitoring and metrics world, and each has its own strengths and use cases. However, for our specific requirements, InfluxDB in tandem with Grafana is more suitable. Here's why, I think:

Purpose and Design

Prometheus

Prometheus is primarily a monitoring and alerting toolkit. Its main strength is in the collection and real-time processing of metrics.
It is designed for reliability and can operate with a minimal setup.
Its pull-based model is optimized for service discovery and runtime monitoring, scraping metrics from predefined endpoints.
It's an excellent choice for capturing short-term metrics and provides a powerful querying language (PromQL).
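To make the pull model concrete, here is a minimal sketch of the text a scrape endpoint returns in the Prometheus text exposition format. The metric name `benchmark_duration_seconds`, the `suite` label, and the helper `render_gauge` are hypothetical and only for illustration; a real service would normally use a Prometheus client library rather than formatting this by hand.

```python
def escape_label_value(value: str) -> str:
    """Escape backslash, double quote and newline inside a label value."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name, help_text, samples):
    """Render one gauge metric in the Prometheus text exposition format.

    `samples` is a list of (labels_dict, value) pairs.
    Assumes `help_text` contains no newlines or backslashes.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        if labels:
            label_str = ",".join(
                f'{key}="{escape_label_value(val)}"'
                for key, val in sorted(labels.items())
            )
            lines.append(f"{name}{{{label_str}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical metric: Prometheus would fetch this text on each scrape.
body = render_gauge(
    "benchmark_duration_seconds",
    "Duration of a benchmark run.",
    [({"suite": "snark"}, 12.5)],
)
print(body, end="")
```

Note that the sample carries no timestamp: the scrape time becomes the sample time, which fits continuous monitoring better than one-off experiment results.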

InfluxDB

InfluxDB is a purpose-built Time Series Database (TSDB). Its primary strength is in storing, retrieving, and performing operations on time series data.
It is optimized for high write loads and storage efficiency.
It supports long-term storage and is designed to handle large volumes of time series data.
InfluxQL and Flux provide comprehensive querying capabilities tailored for time-based datasets.
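For comparison, a write to InfluxDB is a push of points in line protocol, each with an explicit timestamp. Below is a minimal sketch of how a single benchmark result could be encoded. The measurement name `benchmark`, the tag and field names, and the helper `to_line_protocol` are assumptions made up for this example, not something the RFC defines; tag values are assumed to be strings.

```python
def _escape(text: str, chars: str) -> str:
    """Backslash-escape each character of `chars` found in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value) -> str:
    """Format a field value: booleans, integers (`i` suffix), floats, strings."""
    if isinstance(value, bool):  # check bool first, since bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line_protocol(measurement, tags, fields, timestamp_ns) -> str:
    """Encode one point as `measurement,tags fields timestamp`."""
    if not fields:
        raise ValueError("a point needs at least one field")
    tag_part = "".join(
        "," + _escape(key, ", =") + "=" + _escape(val, ", =")
        for key, val in sorted(tags.items())
    )
    field_part = ",".join(
        _escape(key, ", =") + "=" + _format_field(val)
        for key, val in fields.items()
    )
    return f"{_escape(measurement, ', ')}{tag_part} {field_part} {timestamp_ns}"


# Hypothetical benchmark result, timestamped in nanoseconds.
line = to_line_protocol(
    "benchmark",
    {"suite": "snark", "host": "ci 1"},
    {"duration_s": 12.5, "iterations": 3},
    1695000000000000000,
)
print(line)
```

Because the client sets the timestamp, results can be written once at the end of a run, which suits short-lived experiment and benchmark jobs.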

Data Storage and Retention

Prometheus

While Prometheus can retain data for longer periods, it's not its primary use case. Over extended periods, this can lead to challenges in storage management and efficiency.
It's more suited for shorter retention periods (typically hours to weeks).

InfluxDB

Designed for efficient storage and querying of time series data over long periods, making it suitable for our requirement of long-term storage.
Provides more flexibility and efficiency in data retention policies, down-sampling, and data lifecycle management.
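To illustrate what down-sampling and retention mean here, the sketch below averages raw points into fixed time windows and drops points older than a cutoff. It only shows the concept in plain Python; it is not how InfluxDB implements these features (there they are configured through retention policies and continuous queries or tasks), and the function names are hypothetical.

```python
from collections import defaultdict


def downsample_mean(points, window_s):
    """Average (timestamp_s, value) points into windows of `window_s` seconds.

    Each output point is stamped with the start of its window.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % window_s].append(value)
    return [
        (start, sum(values) / len(values))
        for start, values in sorted(buckets.items())
    ]


def apply_retention(points, now_s, max_age_s):
    """Keep only points that are at most `max_age_s` seconds old."""
    return [(ts, value) for ts, value in points if now_s - ts <= max_age_s]


raw = [(0, 1.0), (30, 3.0), (60, 5.0)]
print(downsample_mean(raw, 60))
print(apply_retention(raw, now_s=90, max_age_s=60))
```

A typical setup keeps raw points for a short period and the down-sampled series for much longer, so old data stays queryable at lower resolution.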

Integration with Grafana

Both Prometheus and InfluxDB seamlessly integrate with Grafana, but when using InfluxDB, you get the benefit of a database built for the specific needs of time series visualization.

Scalability

Prometheus

Scaling Prometheus typically involves federation, where multiple Prometheus servers scrape their own targets and a higher-level server scrapes selected series from them.
For long-term storage, you might integrate it with external solutions, adding complexity.

InfluxDB

InfluxDB offers native clustering for horizontal scalability and high availability in its Enterprise and Cloud editions (the open-source edition runs as a single node), which makes it easier to scale as the dataset grows.

In conclusion

Prometheus excels in real-time metrics collection and alerting, especially when you need to actively monitor and respond to system behaviors. However, for our needs – which revolve around storing metrics for extended periods and analyzing historical data – InfluxDB is a better fit. Paired with Grafana, it provides a powerful, scalable, and efficient solution for long-term metrics storage and visualization.

Member

@mrmr1993 mrmr1993 left a comment


Can you add a section on developer experience? I think it's important to have a concrete idea of how easy/hard it will be to integrate with this system for our engineers, and it would be good to re-analyse the alternatives with that aspect in mind too.

Member Author

shimkiv commented Sep 20, 2023

I've updated the PR description to make sure we're all on the same page about what this RFC covers.
