diff --git a/cardano-tracer/CHANGELOG.md b/cardano-tracer/CHANGELOG.md
index f055f08eaec..16439b90084 100644
--- a/cardano-tracer/CHANGELOG.md
+++ b/cardano-tracer/CHANGELOG.md
@@ -1,14 +1,17 @@
 # ChangeLog
 
-## 0.3 (September 20, 2024)
+## 0.3 (September 26, 2024)
 
 * Abondon `snap` webserver in favour of `wai`/`warp` for Prometheus and EKG Monitoring.
 * Add dynamic routing to EKG stores of all connected nodes.
 * Derive URL compliant routes from connected node names (instead of plain node names).
 * Remove the requirement of two distinct ports for the EKG backend (changing `hasEKG` config type).
+* Improved OpenMetrics compliance of Prometheus exposition; also addresses [issue#5140][i5140].
+* Prometheus help annotations can be provided via the new optional config value `metricsHelp`.
 * For optional RTView component only: Disable SSL/https connections.
   Force `snap-server` dependency to build with `-flag -openssl`.
 * Add JSON responses when listing connected nodes for both Prometheus and EKG Monitoring.
+* Fix: actually send `forHuman` rendering output to journald when specified.
 * Add consistency check for redundant port values in the config.
 
 ## 0.2.4 (August 13, 2024)
@@ -48,3 +51,7 @@
 ## 0.1.0
 
 Initial version.
+
+
+
+[i5140]: https://github.com/IntersectMBO/cardano-node/issues/5140
\ No newline at end of file
diff --git a/cardano-tracer/docs/cardano-tracer.md b/cardano-tracer/docs/cardano-tracer.md
index 7331223c1a4..37bfab9bc8c 100644
--- a/cardano-tracer/docs/cardano-tracer.md
+++ b/cardano-tracer/docs/cardano-tracer.md
@@ -4,21 +4,24 @@
 
 # Contents
 
-1. [Introduction](#Introduction)
-   1. [Motivation](#Motivation)
-   3. [Overview](#Overview)
-2. [Build and run](#Build-and-run)
-3. [Configuration](#Configuration)
-   1. [Distributed Scenario](#Distributed-scenario)
-   2. [Local Scenario](#Local-scenario)
-   3. [Network Magic](#Network-magic)
-   4. [Requests](#Requests)
-   5. [Logging](#Logging)
-   6. [Logs Rotation](#Logs-rotation)
-   7. [Prometheus](#Prometheus)
-   8. [EKG Monitoring](#EKG-monitoring)
-   9. [Verbosity](#Verbosity)
-   10. [RTView](#RTView)
+- [Cardano Tracer](#cardano-tracer)
+- [Contents](#contents)
+- [Introduction](#introduction)
+  - [Motivation](#motivation)
+  - [Overview](#overview)
+- [Build and run](#build-and-run)
+- [Configuration](#configuration)
+  - [Distributed Scenario](#distributed-scenario)
+  - [Important](#important)
+  - [Local Scenario](#local-scenario)
+  - [Network Magic](#network-magic)
+  - [Requests](#requests)
+  - [Logging](#logging)
+  - [Logs Rotation](#logs-rotation)
+  - [Prometheus](#prometheus)
+  - [EKG Monitoring](#ekg-monitoring)
+  - [Verbosity](#verbosity)
+  - [RTView](#rtview)
 
 # Introduction
 
@@ -390,20 +393,51 @@ $ curl --silent -H "Accept: application/json" '127.0.0.1:3200' | jq '.'
 }
 ```
 
-The Promethus output is a map from Prometheus metric to value:
+Prometheus uses the text-based exposition format, complete with `# TYPE` and `# HELP` annotations. The latter ones have to be provided by the `metricsHelp` config value (see below).
+
+The output should be [OpenMetrics](https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#text-format) compliant. Example snippet:
 
 ```
 $ curl '127.0.0.1:3200/12700130004'
-blockNum_int 35
-rts_gc_init_cpu_ms 5
-rts_gc_par_tot_bytes_copied 0
-served_block_counter 31
-submissions_accepted_counter 2771
-density_real 5.7692307692307696e-2
-blocksForged_int 6
-
+# TYPE Mem_resident_int gauge
+# HELP Mem_resident_int Kernel-reported RSS (resident set size)
+Mem_resident_int 103792640
+# TYPE rts_gc_max_bytes_used gauge
+rts_gc_max_bytes_used 5811512
+# TYPE rts_gc_gc_cpu_ms counter
+rts_gc_gc_cpu_ms 50
+# TYPE RTS_gcMajorNum_int gauge
+# HELP RTS_gcMajorNum_int Major GCs
+RTS_gcMajorNum_int 4
+# TYPE rts_gc_par_avg_bytes_copied gauge
+rts_gc_par_avg_bytes_copied 0
+# TYPE rts_gc_num_bytes_usage_samples counter
+rts_gc_num_bytes_usage_samples 4
+# TYPE remainingKESPeriods_int gauge
+remainingKESPeriods_int 62
+# TYPE rts_gc_bytes_copied counter
+rts_gc_bytes_copied 17114384
+# TYPE nodeCannotForge_int gauge
+# HELP nodeCannotForge_int How many times was this node unable to forge [a block]?
+# EOF
+```
+
+Passing metric help annotations to the service can be done in the config file, either as a key-value map from metric name to help text, or as a separate JSON file containing such a map.
+The system's internal metric names have to be used as keys (cf. [metrics documentation](https://github.com/input-output-hk/cardano-node-wiki/blob/main/docs/new-tracing/tracers_doc_generated.md#metrics)).
+```
+"metricsHelp": "path/to/key-value-map.json"
+```
+or
+```
+"metricsHelp": {
+  "Mem.resident": "Kernel-reported RSS (resident set size)",
+  "RTS.gcMajorNum": "Major GCs",
+  "nodeCannotForge": "How many times was this node unable to forge [a block]?"
+}
 ```
 
+
+
 ## EKG Monitoring
 
 At top-level route `/` EKG gives a list of connected nodes.
diff --git a/cardano-tracer/src/Cardano/Tracer/Handlers/Metrics/Prometheus.hs b/cardano-tracer/src/Cardano/Tracer/Handlers/Metrics/Prometheus.hs
index 2dee1a8d081..402e3ee6f57 100644
--- a/cardano-tracer/src/Cardano/Tracer/Handlers/Metrics/Prometheus.hs
+++ b/cardano-tracer/src/Cardano/Tracer/Handlers/Metrics/Prometheus.hs
@@ -35,24 +35,34 @@ import qualified System.Metrics as EKG
 import System.Metrics (Sample, Value (..), sampleAll)
 import System.Time.Extra (sleep)
 
--- | Runs simple HTTP server that listens host and port and returns
--- the list of currently connected nodes in such a format:
+-- | Runs a simple HTTP server that listens on @endpoint@.
 --
--- * relay-1
--- * relay-2
--- * core-1
+-- At the root, it lists the connected nodes, either as HTML or JSON, depending
+-- on the request's 'Accept: ' header.
 --
--- where 'relay-1', 'relay-2' and 'core-1' are nodes' names.
+-- Routing is dynamic, depending on the connected nodes. A valid URL is derived
+-- from the nodeName configured for the connecting node. E.g. a node name
+-- of `127.0.0.1:30004` will result in the route `/12700130004` which
+-- renders that node's Prometheus / OpenMetrics text exposition:
 --
--- Each of list items is a href. By clicking on it, the user will be
--- redirected to the page with the list of metrics received from that node,
--- in such a format:
---
--- rts_gc_par_tot_bytes_copied 0
--- rts_gc_num_gcs 17
--- rts_gc_max_bytes_slop 15888
--- rts_gc_bytes_copied 165952
--- ekg_server_timestamp_ms 1639569439623
+-- # TYPE Mem_resident_int gauge
+-- # HELP Mem_resident_int Kernel-reported RSS (resident set size)
+-- Mem_resident_int 103792640
+-- # TYPE rts_gc_max_bytes_used gauge
+-- rts_gc_max_bytes_used 5811512
+-- # TYPE rts_gc_gc_cpu_ms counter
+-- rts_gc_gc_cpu_ms 50
+-- # TYPE RTS_gcMajorNum_int gauge
+-- # HELP RTS_gcMajorNum_int Major GCs
+-- RTS_gcMajorNum_int 4
+-- # TYPE rts_gc_num_bytes_usage_samples counter
+-- rts_gc_num_bytes_usage_samples 4
+-- # TYPE remainingKESPeriods_int gauge
+-- remainingKESPeriods_int 62
+-- # TYPE rts_gc_bytes_copied counter
+-- rts_gc_bytes_copied 17114384
+-- # TYPE nodeCannotForge_int gauge
+-- # HELP nodeCannotForge_int How many times was this node unable to forge [a block]?
 --
 runPrometheusServer
   :: TracerEnv