Metrics Collection and Reporting

Refiner has metrics collection enabled by default, with metrics exported in Prometheus format.

Config

Install Prometheus: https://prometheus.io/download/

  • Edit /opt/homebrew/etc/prometheus.yml on macOS (M1), or
  • Edit /usr/local/etc/prometheus.yml on Linux (x86).

Add a scrape config so that Prometheus picks up the telemetry metrics exported by Refiner.

Restart your Prometheus server:
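As a minimal sketch of what such a scrape entry could look like, the helper below renders a `scrape_configs` fragment as text. The job name `refiner` and the 15s interval are assumptions made for illustration; the target `localhost:9568` is taken from the metrics endpoint listed in the API section. Check the result against your own prometheus.yml before using it.

```python
def render_scrape_config(job_name: str = "refiner",
                         target: str = "localhost:9568",
                         scrape_interval: str = "15s") -> str:
    """Render one scrape_configs entry for prometheus.yml.

    job_name and scrape_interval are illustrative assumptions;
    target should point at the host:port serving /metrics.
    """
    return (
        f'  - job_name: "{job_name}"\n'
        f"    scrape_interval: {scrape_interval}\n"
        f"    metrics_path: /metrics\n"
        f"    static_configs:\n"
        f'      - targets: ["{target}"]\n'
    )


# Print a fragment that can be merged into the existing scrape_configs section.
print("scrape_configs:")
print(render_scrape_config(), end="")
```

If prometheus.yml already has a `scrape_configs:` section, only the indented entry needs to be appended to it.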

brew services restart prometheus

Monitoring can be set up, for example, by plugging the endpoint that serves Prometheus-format metrics into a Grafana plugin. The metrics can then be viewed in Grafana and sliced further per metric as needed.

Metrics

The following metrics captured from Refiner are exported in Prometheus format at the /metrics endpoint.

# TYPE refiner_events_refiner_pipeline_success_duration gauge
refiner_events_refiner_pipeline_success_duration{operation="pipeline_success",table="refiner_metrics"} 0.004265
# TYPE refiner_events_refiner_pipeline_success_count counter
refiner_events_refiner_pipeline_success_count{operation="pipeline_success",table="refiner_metrics"} 4
# TYPE refiner_events_journal_fetch_items_duration gauge
refiner_events_journal_fetch_items_duration{operation="fetch_items",table="journal_metrics"} 1.2e-5
# TYPE refiner_events_journal_fetch_items_count counter
refiner_events_journal_fetch_items_count{operation="fetch_items",table="journal_metrics"} 1
# TYPE refiner_events_journal_fetch_last_duration gauge
refiner_events_journal_fetch_last_duration{operation="fetch_last",table="journal_metrics"} 3.6e-5
# TYPE refiner_events_journal_fetch_last_count counter
refiner_events_journal_fetch_last_count{operation="fetch_last",table="journal_metrics"} 1
# TYPE refiner_events_brp_proof_duration gauge
refiner_events_brp_proof_duration{operation="proof",table="brp_metrics"} 6.259999999999999e-4
# TYPE refiner_events_brp_proof_count counter
refiner_events_brp_proof_count{operation="proof",table="brp_metrics"} 4
# TYPE refiner_events_brp_upload_success_duration gauge
refiner_events_brp_upload_success_duration{operation="upload_success",table="brp_metrics"} 0.0023769999999999998
# TYPE refiner_events_brp_upload_success_count counter
refiner_events_brp_upload_success_count{operation="upload_success",table="brp_metrics"} 4
# TYPE refiner_events_bsp_execute_duration gauge
refiner_events_bsp_execute_duration{operation="execute",table="bsp_metrics"} 2.1799999999999999e-4
# TYPE refiner_events_bsp_execute_count counter
refiner_events_bsp_execute_count{operation="execute",table="bsp_metrics"} 4
# TYPE refiner_events_bsp_decode_duration gauge
refiner_events_bsp_decode_duration{operation="decode",table="bsp_metrics"} 0.0
# TYPE refiner_events_bsp_decode_count counter
refiner_events_bsp_decode_count{operation="decode",table="bsp_metrics"} 4
# TYPE refiner_events_ipfs_fetch_duration gauge
refiner_events_ipfs_fetch_duration{operation="fetch",table="ipfs_metrics"} 0.001588
# TYPE refiner_events_ipfs_fetch_count counter
refiner_events_ipfs_fetch_count{operation="fetch",table="ipfs_metrics"} 4
# TYPE refiner_events_ipfs_pin_duration gauge
refiner_events_ipfs_pin_duration{operation="pin",table="ipfs_metrics"} 0.00174
# TYPE refiner_events_ipfs_pin_count counter
refiner_events_ipfs_pin_count{operation="pin",table="ipfs_metrics"} 4
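The output above follows the Prometheus text exposition format: `# TYPE` comment lines followed by `name{labels} value` samples. As a rough illustration of how those lines break down, here is a small parser sketch. The function name `parse_samples` is hypothetical, and the regular expressions only cover the simple shape shown above (no timestamps, no escaped quotes in unusual positions).

```python
import re

# name{label="value",...} value   (the label block is optional)
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, float_value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # TYPE / # HELP comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simple sketch does not understand
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


example = """# TYPE refiner_events_ipfs_pin_count counter
refiner_events_ipfs_pin_count{operation="pin",table="ipfs_metrics"} 4
"""
for name, labels, value in parse_samples(example):
    print(name, labels, value)
```

Grouping the parsed samples by the `table` label gives one bucket per subsystem (refiner, journal, brp, bsp, ipfs), with a `_duration` gauge and a `_count` counter for each operation.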

API

View the exported gauges and counters in Prometheus format at the endpoint -> http://localhost:9568/metrics.

Create graphs using Prometheus at the endpoint -> http://localhost:9090/graph.

View time series and add alerting with Grafana at the endpoint -> http://localhost:3000/explore.

Docker containers automatically export to this endpoint as well via exposed ports and port forwarding.
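Besides the browser endpoints above, a scraped metric can also be read programmatically through the Prometheus HTTP API (`/api/v1/query`). The sketch below assumes Prometheus is reachable at the default `http://localhost:9090`; the helper names `build_query_url` and `instant_query` are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(expr, base_url="http://localhost:9090"):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def instant_query(expr, base_url="http://localhost:9090", timeout=5.0):
    """Run an instant query and return the list of result series.

    Requires a running Prometheus server; raises on a non-success status.
    """
    with urllib.request.urlopen(build_query_url(expr, base_url), timeout=timeout) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return payload["data"]["result"]


print(build_query_url("refiner_events_refiner_pipeline_success_count"))
```

With Prometheus running and scraping Refiner, `instant_query("refiner_events_refiner_pipeline_success_count")` should return one entry per label set of that counter.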

Graph

Observe the gauge time-series graphs live, for example with plots of the pipeline_success and ipfs_fetch metrics -> http://localhost:9090/graph?g0.expr=refiner_events_refiner_pipeline_success_duration&g0.tab=0&g0.stacked=1&g0.show_exemplars=0&g0.range_input=15m&g0.step_input=1&g1.expr=refiner_events_ipfs_fetch_duration&g1.tab=0&g1.stacked=1&g1.show_exemplars=1&g1.range_input=15m&g1.step_input=1
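To build a similar link for other metrics, the query string can be generated rather than edited by hand. This sketch simply mirrors the `gN.*` parameters that appear in the link above; the helper name `build_graph_url` and the panel dictionary keys are hypothetical.

```python
from urllib.parse import urlencode


def build_graph_url(panels, base_url="http://localhost:9090/graph"):
    """Build a Prometheus graph URL with one panel per entry in `panels`.

    Each panel is a dict with an "expr" key and optional "exemplars"
    and "range" keys; the other parameters copy the link in this document.
    """
    params = []
    for i, panel in enumerate(panels):
        params += [
            (f"g{i}.expr", panel["expr"]),
            (f"g{i}.tab", "0"),
            (f"g{i}.stacked", "1"),
            (f"g{i}.show_exemplars", "1" if panel.get("exemplars") else "0"),
            (f"g{i}.range_input", panel.get("range", "15m")),
            (f"g{i}.step_input", "1"),
        ]
    return base_url + "?" + urlencode(params)


url = build_graph_url([
    {"expr": "refiner_events_refiner_pipeline_success_duration"},
    {"expr": "refiner_events_ipfs_fetch_duration", "exemplars": True},
])
print(url)
```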

Observe

Monitor & Alert

For monitoring and alerting we advise using Grafana (in conjunction with the aggregated Prometheus metrics). Import the prebuilt dashboard for Refiner into Grafana here.

Dashboard

Install and start Grafana

brew install grafana
brew services start grafana

Ensure Grafana (default port 3000) and Prometheus (default port 9090) have started.

$ brew services list
Name          Status  User   File
grafana       started user ~/Library/LaunchAgents/homebrew.mxcl.grafana.plist
prometheus    started user ~/Library/LaunchAgents/homebrew.mxcl.prometheus.plist

Log in to your Grafana dashboard -> http://localhost:3000/.

Make sure Prometheus is added as a data source, with the default values for Prometheus -> http://localhost:3000/datasources. Then click on Explore.

Select the metrics and time-series data to view from the "Select Metric" dropdown. Below is an example with three selections: refiner_events_brp_upload_success_duration, refiner_events_refiner_pipeline_success_duration, and refiner_events_ipfs_fetch_duration.

This can be viewed directly here. You can also apply operations to the exported data, such as aggregations like sum and range functions like delta, as seen below.

grafana
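For reference, the sum and delta operations mentioned above correspond to PromQL expressions that can be typed into Grafana's Explore view or the Prometheus graph page. The sketch below only builds the expression strings; the helper names, the 5m window, and grouping by the `table` label are assumptions chosen for illustration.

```python
def delta_over(metric, window="5m"):
    """PromQL: change of a gauge over the given range window."""
    return f"delta({metric}[{window}])"


def sum_by(metric, label="table"):
    """PromQL: sum of a metric across series, grouped by one label."""
    return f"sum by ({label}) ({metric})"


# delta() is intended for gauges, so it is applied to a *_duration metric here.
print(delta_over("refiner_events_ipfs_fetch_duration"))
print(sum_by("refiner_events_brp_proof_count"))
```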