This repository has been archived by the owner on Apr 15, 2024. It is now read-only.

docs: metrics documentation (#675)
* docs: metrics documentation

* docs: metrics documentation
rach-id authored Jan 3, 2024
1 parent 014a658 commit 754f500
Showing 2 changed files with 43 additions and 0 deletions.
22 changes: 22 additions & 0 deletions docs/orchestrator.md
@@ -255,6 +255,28 @@ If the validator still has access to the previously running orchestrator, it wou
Running a second orchestrator in the same machine would require using different P2P listening ports, i.e. changing the `listen-addr` value in the `<orchestrator_home>/config/config.toml` file and using different ports between the two instances.
### Telemetry
The orchestrator supports metrics that describe its runtime and give more information on its health. The supported metrics are:
- `orchestrator_processed_nonces_counter`: The total number of nonces that the orchestrator has processed. Under normal conditions, this number is incremented by 1 every hour, i.e. every 400 blocks, which is the current data commitment window. The health of the orchestrator can be determined from this metric by checking that it keeps signing nonces. If the counter hasn't been incremented for more than an hour, the orchestrator might be failing.
- `orchestrator_failed_nonces_counter`: The number of nonces that the orchestrator tried to process but failed. These nonces might be re-enqueued to be reprocessed later. If the orchestrator then manages to process them correctly, `orchestrator_processed_nonces_counter` is incremented. Otherwise, they might be re-enqueued again.
- `orchestrator_reprocessed_nonces_counter`: The number of nonces that the orchestrator failed to process and that were re-enqueued.
- `orchestrator_processing_time`: The time it takes for a nonce to be processed, or to fail, after it was picked up by the orchestrator processor.
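
The health rule described for `orchestrator_processed_nonces_counter` can be sketched as a small check over two counter readings. This is only an illustration, not part of the orchestrator: the `CounterSample` type, the `is_stalled` helper, and the one-hour threshold are assumptions, and how the readings are obtained (for example, from the metrics backend behind the collector) is left open.

```python
from dataclasses import dataclass

# Assumption: one nonce is expected per data commitment window, i.e. roughly one hour.
EXPECTED_INTERVAL_SECONDS = 3600


@dataclass
class CounterSample:
    """One reading of orchestrator_processed_nonces_counter (hypothetical type)."""
    timestamp: float  # Unix time of the reading, in seconds
    value: int        # counter value at that time


def is_stalled(previous: CounterSample, current: CounterSample,
               max_idle_seconds: float = EXPECTED_INTERVAL_SECONDS) -> bool:
    """Return True if the counter did not increase over more than max_idle_seconds.

    A counter that went down (e.g. after a restart) is also treated as
    "not increased", so it is flagged once the idle window has elapsed.
    """
    elapsed = current.timestamp - previous.timestamp
    increased = current.value > previous.value
    return elapsed > max_idle_seconds and not increased


if __name__ == "__main__":
    earlier = CounterSample(timestamp=0, value=41)
    later = CounterSample(timestamp=2 * 3600, value=41)
    print(is_stalled(earlier, later))  # True: two hours without a new nonce
```

A real alert would more likely live in the monitoring system itself; the function only makes the "no increment for more than an hour" condition explicit.
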
To enable these metrics, set `metrics` to `true` in the orchestrator configuration file:
```toml
# Enables OTLP metrics with HTTP exporter.
metrics = "true"
```
Then set up a correct endpoint to connect to an [otel collector](https://opentelemetry.io/docs/collector/installation/). By default, the orchestrator targets the `"localhost:4318"` endpoint. Both settings can also be set using the command line flags.
The orchestrator also provides the native LibP2P metrics. These are enabled when the above parameter is set to `true` and are served, by default, at `"localhost:30001/metrics"`, which can be changed using the orchestrator config file or the command line flags.
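
To inspect the LibP2P metrics by hand, a minimal sketch like the following could fetch the endpoint and parse the response. It assumes the endpoint serves the Prometheus text exposition format over plain HTTP at the default address; the metric names in `SAMPLE` are illustrative only and are not taken from the orchestrator's actual output.

```python
import urllib.request


def parse_prometheus_text(body: str) -> dict[str, float]:
    """Parse Prometheus text exposition lines into {series: value}."""
    samples: dict[str, float] = {}
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            # Series with labels: everything up to the last "}" is the series name.
            series, _, rest = line.rpartition("}")
            series += "}"
        else:
            series, _, rest = line.partition(" ")
        fields = rest.split()
        if fields:
            # The first field is the value; an optional timestamp may follow.
            samples[series] = float(fields[0])
    return samples


def fetch_metrics(url: str = "http://localhost:30001/metrics") -> dict[str, float]:
    """Fetch and parse the metrics page (assumes plain HTTP on the default address)."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return parse_prometheus_text(response.read().decode("utf-8"))


# Illustrative input only; real metric names may differ.
SAMPLE = """\
# HELP libp2p_swarm_connections_opened_total Connections Opened
# TYPE libp2p_swarm_connections_opened_total counter
libp2p_swarm_connections_opened_total{dir="outbound",transport="tcp"} 12
example_gauge 3.5
"""

if __name__ == "__main__":
    for series, value in parse_prometheus_text(SAMPLE).items():
        print(series, value)
```

In practice a Prometheus server or the collector would scrape this endpoint; the parser is only meant for quick manual checks.
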
An example configuration is provided in the `e2e/telemetry` folder along with the corresponding docker-compose file.
#### Systemd service
If you want to start the orchestrator as a `systemd` service, you could use the following:
21 changes: 21 additions & 0 deletions docs/relayer.md
@@ -121,3 +121,24 @@ To start the relayer using the default home directory, run the following:
> **_NOTE:_** The above command assumes that the necessary configuration is specified in the `<relayer_home>/config/config.toml` file.
Then, you will be prompted to enter your EVM key passphrase for the EVM address passed using the `--evm.account` flag, so that the relayer can use it to send transactions to the target Blobstream smart contract. Make sure that it's funded.

### Telemetry

The relayer supports metrics that describe its runtime and give more information on its health. The supported metrics are:

- `relayer_processed_nonces_counter`: The total number of nonces that the relayer has processed. Under normal conditions, this number is incremented by 1 every hour, i.e. every 400 blocks, which is the current data commitment window. The health of the relayer can be determined from this metric by checking that it keeps processing nonces. If the counter hasn't been incremented for more than an hour, the relayer might be failing.
- `relayer_number_of_failures`: The number of times the relayer failed to relay a nonce.
- `relayer_processing_time`: The time it takes for a nonce to be processed, or to fail, after it was picked up by the relayer.
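
One way to combine the two relayer counters is a failure ratio over an observation window. The sketch below is hypothetical: it assumes each increment of `relayer_number_of_failures` corresponds to one failed relay attempt, and that the inputs are the increases of both counters over the same window.

```python
def failure_ratio(processed_delta: int, failures_delta: int) -> float:
    """Share of relay attempts that failed during one observation window.

    processed_delta: increase of relayer_processed_nonces_counter in the window.
    failures_delta:  increase of relayer_number_of_failures in the window.
    """
    attempts = processed_delta + failures_delta
    if attempts == 0:
        return 0.0  # nothing was attempted, so there is nothing to report
    return failures_delta / attempts


if __name__ == "__main__":
    # Three nonces relayed and one failure in the window.
    print(failure_ratio(processed_delta=3, failures_delta=1))  # 0.25
```
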

To enable these metrics, set `metrics` to `true` in the relayer configuration file:

```toml
# Enables OTLP metrics with HTTP exporter.
metrics = "true"
```

Then set up a correct endpoint to connect to an [otel collector](https://opentelemetry.io/docs/collector/installation/). By default, the relayer targets the `"localhost:4318"` endpoint. Both settings can also be set using the command line flags.
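
Before enabling metrics, it can help to confirm that something is listening on the collector endpoint. The following sketch only tests TCP reachability of a `host:port` endpoint; the `collector_reachable` helper is an assumption for illustration and does not verify that the listener is actually an OTLP collector.

```python
import socket


def collector_reachable(endpoint: str = "localhost:4318", timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the host:port endpoint succeeds."""
    host, _, port = endpoint.rpartition(":")
    try:
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, or name resolution failure.
        return False


if __name__ == "__main__":
    print(collector_reachable())  # result depends on whether a collector is running locally
```
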

The relayer also provides the native LibP2P metrics. These are enabled when the above parameter is set to `true` and are served, by default, at `"localhost:30001/metrics"`, which can be changed using the relayer config file or the command line flags.

An example configuration is provided in the `e2e/telemetry` folder along with the corresponding docker-compose file `e2e/docker-compose.yml`.
