[chore][exporter/elasticsearch] Add more detail to version_conflict_engine_exception known issue #37150

Open
wants to merge 3 commits into
base: main
33 changes: 30 additions & 3 deletions exporter/elasticsearchexporter/README.md
@@ -357,8 +357,35 @@ In case the record contains `timestamp`, this value is used. Otherwise, the `obs

### version_conflict_engine_exception

When sending a high volume of metrics to a TSDB metrics data stream, e.g. using OTel mapping mode to an 8.16 Elasticsearch, it is possible to get error logs "failed to index document" with `error.type` "version_conflict_engine_exception" and `error.reason` containing "version conflict, document already exists". This is due to Elasticsearch grouping metrics with the same dimensions, whether the metric name is the same or different, using `@timestamp` in millisecond precision as opposed to nanosecond precision in elasticsearchexporter.
Symptom: elasticsearchexporter logs an error "failed to index document" with `error.type` "version_conflict_engine_exception" and `error.reason` containing "version conflict, document already exists".

This will be fixed in a future version of Elasticsearch. A possible workaround would be to use a transform processor to truncate the timestamp, but this will cause duplicate data to be dropped silently.
This happens when the target data stream is a TSDB metrics data stream (e.g. when using OTel mapping mode to send to Elasticsearch 8.16+). The scenarios below cover the likely causes.

However, if `@timestamp` precision is not the problem, check your metrics pipeline setup for misconfiguration that causes an actual violation of the [single writer principle](https://opentelemetry.io/docs/specs/otel/metrics/data-model/#single-writer).
1. When different metrics share the same set of dimensions (mostly made up of resource attributes, scope attributes, and data point attributes),
Elasticsearch returns a `version_conflict_engine_exception` if these metrics are not grouped into the same document.
Because elasticsearchexporter groups metrics per batch, such metrics must also be in the same exporter batch to be grouped.
To work around the issue, use a transform processor to ensure that different metrics never share the same set of dimensions. This comes at the expense of storage efficiency.

```yaml
processors:
  transform/unique_dimensions:
    metric_statements:
      - context: datapoint
        statements:
          - set(attributes["metric_name"], metric.name)
```

2. If the problem persists, the error may be caused by data points whose timestamps fall in the same millisecond but differ at nanosecond precision: elasticsearchexporter groups metrics at nanosecond precision, while Elasticsearch checks for duplicates at millisecond precision.

This will be fixed in a future version of Elasticsearch. To work around the issue, use a transform processor to truncate the timestamp to milliseconds. Note that this causes duplicate data within the same millisecond to be dropped silently.

```yaml
processors:
  transform/truncate_timestamp:
    metric_statements:
      - context: datapoint
        statements:
          - set(time, TruncateTime(time, Duration("1ms")))
```

3. If none of the above applies, check your metrics pipeline setup for a misconfiguration that causes an actual violation of the [single writer principle](https://opentelemetry.io/docs/specs/otel/metrics/data-model/#single-writer).
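
As a rough sketch of where the workaround processors above would sit in a Collector configuration, the fragment below wires one of them into a metrics pipeline. The `otlp` receiver, the endpoint URL, and the choice of which processor to enable are assumptions for illustration only and are not part of this change. Enable only the processor that matches your scenario.

```yaml
receivers:
  otlp:            # assumed receiver; use whatever your pipeline already has
    protocols:
      grpc:

processors:
  # Workaround for scenario 1: make dimensions unique per metric name.
  transform/unique_dimensions:
    metric_statements:
      - context: datapoint
        statements:
          - set(attributes["metric_name"], metric.name)
  # Workaround for scenario 2: truncate timestamps to millisecond precision.
  transform/truncate_timestamp:
    metric_statements:
      - context: datapoint
        statements:
          - set(time, TruncateTime(time, Duration("1ms")))

exporters:
  elasticsearch:
    endpoints: [https://elasticsearch.example.com:9200]  # placeholder URL
    mapping:
      mode: otel

service:
  pipelines:
    metrics:
      receivers: [otlp]
      # Only scenario 1's workaround is enabled here; swap in
      # transform/truncate_timestamp if scenario 2 applies instead.
      processors: [transform/unique_dimensions]
      exporters: [elasticsearch]
```

A processor that is defined but not listed under `service.pipelines` has no effect, so both workarounds can stay in the file while only one is active.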