docs: Benchmark and document tail-based sampling performance

We have a good benchmarking setup for general apm-server ingest performance, but tail-based sampling (TBS) is a bit of a blind spot. We have benchmarked TBS manually in the past, but we don't have a framework for testing it repeatably.

Once we have established a performance baseline, we should add it to the public documentation. This should cover the disks used in the benchmarks and which kinds of disks are recommended, the expected disk and memory usage in relation to ingest rate and sampling rate, and specific guidance on setting the tail sampling storage limit. Documentation on TBS performance should probably follow on from https://github.com/elastic/apm-server/issues/7842.

This depends on https://github.com/elastic/apm-server/issues/7845. Assuming we use `apmbench`, we will need to enable `-rewrite-ids` so that `trace.id` values and per-trace events are not repeated; repeated IDs would skew the TBS results.

Notes for whoever works on this:
- We should look at how both disk reads and disk writes grow with event ingest rate and with sampling rate. The write rate is generally proportional to the ingest rate, but the read rate is expected to be proportional to the product of ingest rate and sampling rate.
- We should compare the performance of Badger v2 (in use at the time of writing) with Badger v4 (proposed).
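
The expected scaling in the first note can be sketched as a rough first-order model. This is only an illustration of the proportionality, not a measurement: `estimateDiskIO`, its parameters, and the numbers in `main` are all hypothetical, and the model ignores compression, write amplification, compaction, and caching, which the benchmarks would need to capture.

```go
package main

import "fmt"

// diskIOEstimate holds rough, first-order estimates in bytes per second.
type diskIOEstimate struct {
	WriteBytesPerSec float64
	ReadBytesPerSec  float64
}

// estimateDiskIO is a hypothetical helper that models the expectation from
// the issue: writes scale with ingest rate, and reads scale with
// ingest rate * sampling rate. avgEventBytes is an assumed average size of a
// stored event; it is not a value taken from apm-server.
func estimateDiskIO(eventsPerSec, avgEventBytes, sampleRate float64) diskIOEstimate {
	writes := eventsPerSec * avgEventBytes
	return diskIOEstimate{
		WriteBytesPerSec: writes,
		ReadBytesPerSec:  writes * sampleRate,
	}
}

func main() {
	// Illustrative inputs only: 10,000 events/s at an assumed 1,000 bytes each.
	for _, rate := range []float64{0.01, 0.1, 0.5} {
		est := estimateDiskIO(10000, 1000, rate)
		fmt.Printf("sample rate %.2f: writes %.0f B/s, reads %.0f B/s\n",
			rate, est.WriteBytesPerSec, est.ReadBytesPerSec)
	}
}
```

Comparing measured read and write throughput against a model like this, across several ingest rates and sampling rates, would show where the real behaviour departs from simple proportionality.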

docs: Benchmark and document tail-based sampling performance #11346
