feat(otel-node): add node runtime metrics (#416)
david-luna authored Nov 19, 2024
1 parent 0a03f53 commit de21ab9
Showing 8 changed files with 246 additions and 160 deletions.
168 changes: 85 additions & 83 deletions examples/package-lock.json


42 changes: 38 additions & 4 deletions packages/opentelemetry-node/docs/metrics.md
@@ -1,4 +1,13 @@
<!-- Goal of this doc: ??? -->
<!--
Goal of this doc:
The user understands which metrics are collected by default in EDOT and gets
insight on metrics export configurations.
Assumptions we're comfortable making about the reader:
* They are familiar with Elastic
* They are familiar with OpenTelemetry
* They are familiar with Node.js runtime metrics
-->

# Metrics

@@ -20,7 +29,32 @@ node -r @elastic/opentelemetry-node/start.js my-app.js
To tune how often metrics data is exported to the endpoint, and the maximum time
allowed for each export, use the env vars already defined in [the spec](https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/#periodic-exporting-metricreader).
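
As a rough illustration, the two spec-defined variables for periodic export are `OTEL_METRIC_EXPORT_INTERVAL` and `OTEL_METRIC_EXPORT_TIMEOUT`, both in milliseconds, with spec defaults of 60000 and 30000. The sketch below only shows how such values could be resolved. The helper name `resolveMetricExportSettings` is hypothetical and not part of EDOT Node.js, which relies on the OpenTelemetry SDK for this.

```javascript
// Hypothetical helper (not an EDOT Node.js API): resolves the periodic
// metric reader settings from the spec-defined environment variables.
function resolveMetricExportSettings(env = process.env) {
  // Parse a millisecond value; fall back when unset or not a positive number.
  const parseMs = (value, fallback) => {
    const parsed = Number(value);
    return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
  };
  return {
    // How often metrics are exported (spec default: 60000 ms).
    exportIntervalMillis: parseMs(env.OTEL_METRIC_EXPORT_INTERVAL, 60000),
    // Maximum time allowed for one export (spec default: 30000 ms).
    exportTimeoutMillis: parseMs(env.OTEL_METRIC_EXPORT_TIMEOUT, 30000),
  };
}

console.log(resolveMetricExportSettings({ OTEL_METRIC_EXPORT_INTERVAL: '10000' }));
```

In practice you would set the variables when starting the app, for example `OTEL_METRIC_EXPORT_INTERVAL=10000 node -r @elastic/opentelemetry-node/start.js my-app.js`, which should export metrics roughly every 10 seconds.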

## Host metrics
## Process & runtime metrics

EDOT Node.js gathers metrics from the Node.js process your application is
running in. To do so, it uses the following packages:

- `@opentelemetry/host-metrics` to gather `process.cpu.*` and `process.memory.*` metrics ([ref](https://github.com/open-telemetry/semantic-conventions/blob/80988c54712ee336cb3a6240b8845e9dfa8c9f49/docs/system/process-metrics.md?plain=1#L22))
- `@opentelemetry/instrumentation-runtime-node` to gather `nodejs.eventloop.*` ([ref](https://github.com/open-telemetry/semantic-conventions/blob/80988c54712ee336cb3a6240b8845e9dfa8c9f49/model/nodejs/metrics.yaml)) and `v8js.*` ([ref](https://github.com/open-telemetry/semantic-conventions/blob/80988c54712ee336cb3a6240b8845e9dfa8c9f49/model/v8js/metrics.yaml)) metrics

These metrics help you assess the performance of your instrumented
service. The following subset is especially useful for spotting possible
issues when getting an overview of the service:

- `nodejs.eventloop.delay.p50` and `nodejs.eventloop.delay.p90` are the
50th and 90th [percentiles](https://en.wikipedia.org/wiki/Percentile) of
the event loop delay. The event loop delay measures the time span between
the scheduling of a callback and its execution. The higher the value,
the more synchronous work in your service is blocking the event loop.
- `nodejs.eventloop.utilization` is the utilization of the event loop as reported
by [`performance.eventLoopUtilization([utilization1[, utilization2]])`](https://nodejs.org/api/perf_hooks.html#performanceeventlooputilizationutilization1-utilization2). It gives
the proportion of time the event loop is being used (not idle).
- `process.cpu.utilization` is the percentage of time the CPU spends running
the service code. High values in this metric suggest your service is doing
compute-intensive tasks.
- `process.memory.usage` is the value of [Resident Set Size](https://nodejs.org/api/process.html#processmemoryusagerss) in bytes. It
measures how much main memory (RAM) the process occupies.
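
To get a feel for the raw values behind these metrics, they can be read directly with Node.js built-ins. This is a minimal sketch, assuming a Node.js version that provides `monitorEventLoopDelay`, `performance.eventLoopUtilization()` and `process.memoryUsage.rss()`. It is not how the instrumentation packages are implemented, and `readRuntimeSnapshot` is a hypothetical name. Note that the delay histogram reports nanoseconds and the utilization is a ratio between 0 and 1.

```javascript
const { monitorEventLoopDelay, performance } = require('node:perf_hooks');

// Sample the event loop delay every 10 ms into a histogram.
const histogram = monitorEventLoopDelay({ resolution: 10 });
histogram.enable();

// Hypothetical helper: collects values comparable to the metrics listed above.
function readRuntimeSnapshot() {
  const elu = performance.eventLoopUtilization();
  return {
    // Histogram values are in nanoseconds; convert them to milliseconds.
    eventLoopDelayP50Ms: histogram.percentile(50) / 1e6,
    eventLoopDelayP90Ms: histogram.percentile(90) / 1e6,
    // Ratio of time the event loop was active (not idle), from 0 to 1.
    eventLoopUtilization: elu.utilization,
    // Resident Set Size in bytes.
    rssBytes: process.memoryUsage.rss(),
  };
}

// Let the event loop run for a second, then print one snapshot and stop sampling.
setTimeout(() => {
  console.log(readRuntimeSnapshot());
  histogram.disable();
}, 1000);
```

Adding a long synchronous loop before the timer fires should push the reported delay percentiles and the utilization up, which is the signal these metrics are meant to surface.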


EDOT Node.js also gathers metrics from the
host machine with `@opentelemetry/host-metrics` package.
If your service is instrumented by EDOT Node.js, or by custom instrumentation that includes the packages mentioned above, Kibana
displays these metrics as part of the [service metrics](https://www.elastic.co/guide/en/observability/current/apm-metrics.html) in its UI.
13 changes: 13 additions & 0 deletions packages/opentelemetry-node/lib/instrumentations.js
@@ -50,6 +50,7 @@
* "@opentelemetry/instrumentation-redis-4": import('@opentelemetry/instrumentation-redis-4').RedisInstrumentationConfig | InstrumentationFactory,
* "@opentelemetry/instrumentation-restify": import('@opentelemetry/instrumentation-restify').RestifyInstrumentationConfig | InstrumentationFactory,
* "@opentelemetry/instrumentation-router": import('@opentelemetry/instrumentation').InstrumentationConfig | InstrumentationFactory,
* "@opentelemetry/instrumentation-runtime-node": import('@opentelemetry/instrumentation-runtime-node').RuntimeNodeInstrumentationConfig | InstrumentationFactory,
* "@opentelemetry/instrumentation-socket.io": import('@opentelemetry/instrumentation-socket.io').SocketIoInstrumentationConfig | InstrumentationFactory,
* "@opentelemetry/instrumentation-tedious": import('@opentelemetry/instrumentation-tedious').TediousInstrumentationConfig | InstrumentationFactory,
* "@opentelemetry/instrumentation-undici": import('@opentelemetry/instrumentation-undici').UndiciInstrumentationConfig | InstrumentationFactory,
@@ -84,6 +85,7 @@ const {RedisInstrumentation} = require('@opentelemetry/instrumentation-redis');
const {RedisInstrumentation: RedisFourInstrumentation} = require('@opentelemetry/instrumentation-redis-4');
const {RestifyInstrumentation} = require('@opentelemetry/instrumentation-restify');
const {RouterInstrumentation} = require('@opentelemetry/instrumentation-router');
const {RuntimeNodeInstrumentation} = require('@opentelemetry/instrumentation-runtime-node');
const {SocketIoInstrumentation} = require('@opentelemetry/instrumentation-socket.io');
const {TediousInstrumentation} = require('@opentelemetry/instrumentation-tedious');
const {UndiciInstrumentation} = require('@opentelemetry/instrumentation-undici');
@@ -126,6 +128,7 @@ const INSTRUMENTATIONS = {
'@opentelemetry/instrumentation-redis-4': (cfg) => new RedisFourInstrumentation(cfg),
'@opentelemetry/instrumentation-restify': (cfg) => new RestifyInstrumentation(cfg),
'@opentelemetry/instrumentation-router': (cfg) => new RouterInstrumentation(cfg),
'@opentelemetry/instrumentation-runtime-node': (cfg) => new RuntimeNodeInstrumentation(cfg),
'@opentelemetry/instrumentation-socket.io': (cfg) => new SocketIoInstrumentation(cfg),
'@opentelemetry/instrumentation-tedious': (cfg) => new TediousInstrumentation(cfg),
'@opentelemetry/instrumentation-undici': (cfg) => new UndiciInstrumentation(cfg),
@@ -232,6 +235,16 @@ function getInstrumentations(opts = {}) {
return;
}

// Skip if metrics are disabled by env var
const isMetricsDisabled =
process.env.ELASTIC_OTEL_METRICS_DISABLED === 'true';
if (
isMetricsDisabled &&
name === '@opentelemetry/instrumentation-runtime-node'
) {
return;
}

const isFactory = typeof opts[name] === 'function';
const isObject = typeof opts[name] === 'object';
const instrFactory = isFactory ? opts[name] : INSTRUMENTATIONS[name];
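
The effect of the new check can be sketched in isolation. The helper below mirrors the condition added in this hunk; its name, `isSkippedBecauseMetricsDisabled`, is hypothetical and does not exist in the package, and passing `env` as a parameter is an assumption made here to keep the example testable.

```javascript
const RUNTIME_NODE_INSTR = '@opentelemetry/instrumentation-runtime-node';

// Hypothetical helper mirroring the condition added to getInstrumentations().
function isSkippedBecauseMetricsDisabled(name, env = process.env) {
  // Only the exact string 'true' disables metrics; 'TRUE' or '1' do not.
  const isMetricsDisabled = env.ELASTIC_OTEL_METRICS_DISABLED === 'true';
  // Only the runtime metrics instrumentation is affected by this check.
  return isMetricsDisabled && name === RUNTIME_NODE_INSTR;
}

const env = { ELASTIC_OTEL_METRICS_DISABLED: 'true' };
console.log(isSkippedBecauseMetricsDisabled(RUNTIME_NODE_INSTR, env));
console.log(isSkippedBecauseMetricsDisabled('@opentelemetry/instrumentation-undici', env));
```

The comparison is a strict string match, so under this reading a value such as `1` or `TRUE` would leave the runtime metrics instrumentation enabled.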
22 changes: 5 additions & 17 deletions packages/opentelemetry-node/lib/metrics/host.js
@@ -29,26 +29,14 @@ function enableHostMetrics() {
hostMetricsInstance.start();
}

// It is known that host metrics sends a lot of data so for now we drop some
// instruments that are not handled by Kibana and doing aggregations
// for others that we want to include shorly (CPU metrics)
// Ref (data amount issue): https://github.com/elastic/elastic-otel-node/issues/51
// Ref (metrics in Kibana): https://github.com/elastic/kibana/pull/174700
// Dropping system metrics because:
// - sends a lot of data. Ref: https://github.com/elastic/elastic-otel-node/issues/51
// - not displayed by Kibana in metrics dashboard. Ref: https://github.com/elastic/kibana/pull/199353
// - recommendation is to use OTEL collector to get and export them
/** @type {metrics.View[]} */
const HOST_METRICS_VIEWS = [
// drop `system.network.*` (not in Kibana)
new View({
instrumentName: 'system.network.*',
aggregation: Aggregation.Drop(),
}),
// drop `system.cpu.time` (not in Kibana)
new View({
instrumentName: 'system.cpu.time',
aggregation: Aggregation.Drop(),
}),
// drop `process.*` (not in Kibana)
new View({
instrumentName: 'process.*',
instrumentName: 'system.*',
aggregation: Aggregation.Drop(),
}),
];
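
Going by the change counts for this file (5 additions, 17 deletions) and the new comment, the single remaining view targets `system.*` in place of the earlier `system.network.*`, `system.cpu.time` and `process.*` views, so `process.*` metrics are no longer dropped. The sketch below illustrates which instrument names such a wildcard would select. `matchesInstrumentPattern` is a hypothetical stand-in written for this example, not the matcher used by `@opentelemetry/sdk-metrics`, and `system.network.io` is only an illustrative instrument name.

```javascript
// Hypothetical matcher, for illustration only; the SDK has its own implementation.
function matchesInstrumentPattern(pattern, instrumentName) {
  // Escape regex metacharacters, then turn each `*` into "any characters".
  const source = pattern
    .replace(/[.+?^${}()|[\]\\]/g, '\\$&')
    .replace(/\*/g, '.*');
  return new RegExp(`^${source}$`).test(instrumentName);
}

const instrumentNames = [
  'system.cpu.time',
  'system.network.io',
  'process.cpu.utilization',
  'process.memory.usage',
];

// Instruments matching the pattern would be dropped; the rest are exported.
const exported = instrumentNames.filter(
  (name) => !matchesInstrumentPattern('system.*', name)
);
console.log(exported);
```

With a single `system.*` pattern, only the `process.*` names from the list remain, which lines up with the process metrics described in `docs/metrics.md`.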