Vouch provides comprehensive metrics to check the health and performance of its activities. This document describes the metrics available for Prometheus and similar monitoring systems.
The metrics server listens on the address provided by the metrics.address
configuration value, and makes metrics available at the /metrics
endpoint.
There are a number of metrics that provide general information about Vouch. Specifically:
vouch_release
contains the version of Vouch, in theversion
labelvouch_ready
is set to1
when Vouch is ready to start attesting, and0
otherwise. If this number stays at 0 it implies a configuration or connection issue that should be addressedvouch_epochs_processed_total
is set to the number of epochs for which Vouch has been attesting. This number resets to 0 when Vouch restarts, and increments every time Vouch starts to process an epoch; if it fails to increment it implies that Vouch has stopped processingvouch_start_time_secs
is the unix timestamp of the time that Vouch started. This value will remain the same throughout a run of Vouch; if it increments it implies that Vouch has restarted.
In addition, high level metrics track the latest slot for which Vouch carried out a successful operation:
vouch_attestation_process_latest_slot
the latest slot for which Vouch carried out an attestation. As long as Vouch has validators, this is expected to increment approximately every 6.5 minutes, although it is possible to have to wait up to 13 minutes due to the random assignment of slots to validatorsvouch_attestationaggregation_process_latest_slot
the latest slot for which Vouch carried out an attestation aggregation. This is a relatively infrequent occurrencevouch_beaconblockproposal_process_latest_slot
the latest slot for which Vouch carried out a proposal. This is a relatively infrequent occurrencevouch_synccommitteeaggregation_process_latest_slot
the latest slot for which Vouch carried out a sync committee aggregation. This is a very infrequent occurrencevouch_synccommitteemessage_process_latest_slot
the latest slot for which Vouch generated a sync committee message. This is a very infrequent occurrence
There are also counts for each process. The specific metrics are:
vouch_beaconblockproposal_process_requests_total
number of beacon block proposal processes;vouch_attestation_process_requests_total
number of attestation processes;vouch_beaconcommitteesubscription_process_requests_total
number of beacon committee subscription processes; andvouch_attestationaggregation_process_requests_total
number of attestation aggregation processes.
All of the metrics have the label "result" with the value either "succeeded" or "failed". Any increase in the latter values implies the validator is not completing all of its activities, and should be investigated.
Vouch keeps track of the number of accounts for which it is validating in the vouch_accountmanager_accounts_total
metric. This metric has one label, state
, which can take one of the following values:
unknown
the validator is not known to the Ethereum 2 networkpending_initialized
the validator is known to the Ethereum 2 network but not yet in the queue to be activatedpending_queued
the validator is in the queue to be activatedactive_ongoing
the validator is activeactive_exiting
the validator is active but stopping its dutiesexited_unslashed
the validator has exited without being slashedexited_slashed
the validator has exited after being slashedwithdrawal_possible
the validator's funds are applicable for withdrawal (although withdrawal is not possible in phase 0)
Vouch will attest for accounts that are either active_ongoing
or active_exiting
. Any increase in active_exiting
should be matched with valid exit requests. Any increase in active_slashed
suggests a problem with the validator setup that should be investigated as a matter of urgency.
Vouch uses marks to show the point in time within a slot at which it completes its various operations. The mark is made after the operation has submitted any results of its work to its beacon nodes, and so can be used to confirm that Vouch is acting in a timely fashion. Each mark is a histogram from 0 to 12 seconds, in 0.1 second increments. The marks are as follows:
vouch_attestation_mark_seconds
is the time in the slot at which the attestation(s) for the slot have been submitted to the beacon nodes. In a healthy network it would be expected that the majority of these would be before the 6 second mark. Any significant number of these after the 7 second mark suggests that part of the validating infrastructure may be slow, and should be investigatedvouch_beaconblockproposal_mark_seconds
is the time in the slot at which the block for the slot has been submitted to the beacon nodes. In a healthy network it would be expected that the majority of these would be before the 1 second mark. Any significant number of these after the 2 second mark suggests that part of the validating infrastructure may be slow, and should be investigatedvouch_synccommitteemessage_mark_seconds
is the time in the slot at which the sync committee message(s) for the slot has been submitted to the beacon nodes. In a healthy network it would be expected that the majority of these would be before the 6 second mark. Any significant number of these after the 7 second mark suggests that part of the validating infrastructure may be slow, and should be investigatedvouch_attestationaggregation_mark_seconds
is the time in the slot at which attestation aggregation message(s) for the slot has been submitted to the beacon nodes. In a healthy network it would be expected that the majority of these would be before the 9 second mark. Any significant number of these after the 10 second mark suggests that part of the validating infrastructure may be slow, and should be investigated. It would also be expected that all of these message would be after the 8 second mark. Any number of these before the 8 second mark suggests a problem with system time, and should be investigatedvouch_synccommitteeaggregation_mark_seconds
is the time in the slot at which sync committee aggregation message(s) for the slot has been submitted to the beacon nodes. In a healthy network it would be expected that the majority of these would be before the 9 second mark. Any significant number of these after the 10 second mark suggests that part of the validating infrastructure may be slow, and should be investigated. It would also be expected that all of these message would be after the 8 second mark. Any number of these before the 8 second mark suggests a problem with system time, and should be investigated
Performance metrics provide a mechanism to understand how quickly Vouch is carrying out its activities. The following information is provided:
vouch_attestation_process_duration_seconds
time taken to carry out the attestation processvouch_attestationaggregation_process_duration_seconds
time taken to carry out the attestation aggregation processvouch_beaconblockproposal_process_duration_seconds
time taken to carry out the beacon block proposal processvouch_beaconcommitteesubscription_process_duration_seconds
time taken to carry out the beacon committee subscription processvouch_synccommitteeaggregation_process_duration_seconds
time taken to carry out the sync committee aggregation processvouch_synccommitteemessage_process_duration_seconds
time taken to carry out the sync committee message processvouch_synccommitteesubscription_process_duration_seconds
time taken to carry out the sync committee subscription process
These metrics are provided as histograms, with buckets in increments of 0.1 seconds up to 2 seconds.
A major part of Vouch's work is in the strategy section, where it selects the appropriate data to sign. Data that combines the provider of the data along with the time taken to obtain and evaluate it contained in the vouch_strategy_operation_duration_seconds
metric. This is a histogram with buckets in increments of 0.1 seconds up to 4 seconds. It has three labels:
strategy
is the strategy for the operationprovider
is the provider for the operationoperation
is the operation that took place (e.g. "beacon block proposal")
Operations metrics provide information about Vouch's internal operations. These are generally lower-level information that can be useful to monitor activities for fine-tuning of server parameters, comparing one instance to another, etc.
Vouch's job scheduler provides a number of metrics. The specific metrics are:
vouch_scheduler_jobs_scheduled_total
number of jobs scheduled. This is expected to increment periodically throughout Vouch's runtimevouch_scheduler_jobs_cancelled_total
number of jobs cancelled. This increments when chain reorganizations occur, and pre-scheduled jobs are no longer validvouch_scheduler_jobs_started_total
number of jobs started. This has a labeltrigger
which can be "timer" if the job ran due to reaching its designated start time or "signal" if the job ran due to being triggered before its designated start time
Each of the above metrics also has a class
label which defines the general class of the job running. Possible values include:
Aggregate attestations
jobs relating to aggregating attestationsAggregate sync committee messages
jobs relating to aggregating sync committee messagesAttest
jobs relating to attestingEpoch
jobs relating to operations run in preparation for or at the start of epochsGenerate sync committee messages
jobs relating to generating sync committee messagesPrepare for sync committee messages
jobs relating to preparation of sync committee message generationPropose
jobs relating to proposing blocksRefresh accounts
jobs relating to updating internal account information
Client operations metrics provide information about the response time of beacon nodes, as well as if the request to them succeeded or failed. This can be used to understand how quickly and how well beacon nodes are responding to requests, for example if Vouch using multiple beacon nodes in different data centres this can be used to obtain data about their response times due to network latency.
vouch_client_operation_duration_seconds
is provided as a histogram, with buckets in increments of 0.1 seconds up to 4 seconds. It has two labels:
proposer
is the endpoint for the operationoperation
is the operation that took place (e.g. "beacon block proposal")
There is also a companion metric vouch_client_operation_requests_total
, which is a simple count of the number of operations that have taken place. It has three labels:
proposer
is the endpoint for the operationoperation
is the operation that took place (e.g. "beacon block proposal")result
is the result of the operation, either "succeeded" or "failed"
vouch_strategy_operation_used
provides details of the outcome of strategies, where one piece of data is obtained from a number of providers. It has three labels:
operation
is the operation that took place (e.g. "beacon block proposal")provider
is the provider of the information selected by the strategystrategy
is the strategy used to select the outcome
Network metrics provide information about the network from Vouch's point of view. Although these are not under Vouch's control, they have an impact on the performance of the validator. The specific metrics are:
vouch_block_receipt_delay_seconds
the delay between the start of a slot and the arrival of the block for that slot. This metric is provided as a histogram, with buckets in increments of 0.1 seconds up to 12 seconds. This has a labelepoch_slot
which is the position of the slot in the epoch (0 through 31, inclusive)vouch_attestationaggregation_coverage_ratio
the ratio of the number of attestations included in the aggregate to the total number of attestations for the aggregate. This metric is provided as a histogram, with buckets in increments of 0.1 up to 1.vouch_synccommitteeaggregation_coverage_ratio
the ratio of the number of sync committee messages included in the aggregate to the total number of members of the sync committee for the aggregate. This metric is provided as a histogram, with buckets in increments of 0.1 up to 1.
Relay metrics provide information about the performance, both individually and comparatively, of the block relays configured for use.
vouch_relay_auction_block_duration_seconds
is provided as a histogram, with buckets in increments of 0.1 seconds up to 4 seconds. It provides details of the total time taken for Vouch to obtain the best bid from competing relays. There is also a companion metric vouch_relay_auction_block_duration_seconds_count
, which is a simple count of the number of operations that have taken place.
vouch_relay_auction_block_used_total
provides the number of blocks used. It has two labels:
provider
is the address of the relay used from which the winning bid comescategory
is the categorization of the builder from which the winning bid comes. This is free-form text, and supplied by the user in the builder confguration (defaults to "standard" if no category is supplied)
vouch_relay_builder_bid_delta_meth_bucket
is provided as a histogram, with buckets in increments of 10 milliEther up to 1 Ether. It provides details of the difference in value between the winning bid and the bid from the given provider. It has a single label:
provider
is the address of the relay used from which a losing bid comes
There is also a companion metric vouch_relay_auction_block_duration_seconds_count
, which is a simple count of the number of operations that have taken place.
vouch_relay_builder_bid_duration_seconds_bucket
is provided as a histogram, with buckets in increments of 0.1 seconds up to 4 seconds. It provides details of the total time taken for Vouch to serve builder bid requests from beacon nodes. There is also a companion metric vouch_relay_builder_bid_duration_seconds_count
, which is a simple count of the number of operations that have taken place.
vouch_relay_execution_config_duration_seconds_bucket
is provided as a histogram, with buckets in increments of 0.1 seconds up to 4 seconds. It provides details of the total time taken for Vouch to obtain the execution configuration from the local or remote source. There is also a companion metric vouch_relay_execution_config_duration_seconds_count
, which is a simple count of the number of operations that have taken place.
vouch_relay_validator_registrations_duration_seconds_bucket
is provided as a histogram, with buckets in increments of 0.1 seconds up to 4 seconds. It provides details of the total time taken for Vouch to serve validator registration requests from beacon nodes. There is also a companion metric vouch_relay_validator_registrations_duration_seconds_count
, which is a simple count of the number of operations that have taken place.
Sync Committee Verification metrics can be enabled using the controller.verify-sync-committee-inclusion
flag in the configuration. This gives more insight in to the participation of Sync Committee duties:
vouch_synccommitteeverification_current_assigned
is a gauge that is set to the current number of vouch validators that are participating in Sync Committee duty.vouch_synccommitteeverification_mismatches_total
is a counter that increments for each Sync Committee participating validator when vouch receives a head event where the parent block root does not match the root vouch broadcast in the Sync Committee messages.vouch_synccommitteeverification_found_total
is a counter that increments for each vouch validator that has been included in the SyncAggregate. This is not incremented if we already detected a root mismatch or if we didn't record the Sync Committee head (expected after a restart)vouch_synccommitteeverification_missing_total
is a counter that increments for each vouch validator that has NOT been included in the SyncAggregate. This is not incremented if we already detected a root mismatch or if we didn't record the Sync Committee head (expected after a restart)vouch_synccommitteeverification_get_head_failures_total
is a counter that increments if we fail to get the head block to verify validator inclusion in the SyncAggregate. This is incremented once per slot we fail.
Note: It is expected that the sum of (found_total
+ missing_total
+ mismatches_total
) == number of Sync Committee participating validators for a given slot.