Benchmark and document tail-based sampling performance #11346
Comments
Adding links for posterity.
To be more exact, in a multi-apm-server setup the expectation is that "rate of writes is generally proportional to local ingest rate", i.e. the ingest rate local to the apm-server under observation. On the read side, the expectation is that "rate of reads is proportional to local ingest * sampling rate". However, before the fix in #13464, apm-server suffered from a read rate proportional to global ingest * sampling rate, which means disk IO and memory usage that do not scale.
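The difference between the two read-rate behaviors can be illustrated with some simple arithmetic. This is an illustrative sketch only; the node count, ingest rates, and sampling rate below are hypothetical, not measurements from apm-server:

```python
def reads_per_sec(local_ingest, global_ingest, sampling_rate, fixed):
    """Expected TBS storage read rate for one apm-server node (events/s)."""
    if fixed:
        # After the fix: reads scale with this node's own ingest.
        return local_ingest * sampling_rate
    # Before the fix: reads scale with cluster-wide ingest.
    return global_ingest * sampling_rate

# Hypothetical cluster: 10 nodes, each ingesting 5,000 events/s, 10% sampling.
nodes, local = 10, 5_000
global_ingest = nodes * local

print(reads_per_sec(local, global_ingest, 0.1, fixed=False))  # 5000.0
print(reads_per_sec(local, global_ingest, 0.1, fixed=True))   # 500.0
```

In this hypothetical 10-node cluster, the per-node read rate drops by a factor of the node count, which is what makes the fixed behavior scale horizontally.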
Related to #11546
Moving it from it-107 to it-108.
9.0, 8 GB RAM, gp2, TBS: https://github.com/elastic/apm-server/actions/runs/13698906719
WIP: will summarize and document the differences once all numbers are ready.
We have a good benchmarking setup for general apm-server ingest performance, but tail-based sampling is a bit of a blind spot. We have benchmarked it manually in the past, but we don't have a framework for repeatable testing of TBS.
Once we have a baseline performance established, we should add it to the public documentation. This should include details about the disks used and what kinds of disks are recommended; expectations about disk and memory usage in relation to ingest rate and sampling rate; and specifically some guidance on setting the tail sampling storage limit. Documentation on TBS performance should probably follow on from #7842.
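The storage-limit guidance mentioned above could take the form of a sizing rule of thumb. The formula and numbers below are assumptions for illustration, not official apm-server guidance: the premise is that every event ingested within one TTL window may be buffered on disk while awaiting a sampling decision:

```python
def estimate_storage_bytes(events_per_sec, avg_event_bytes, ttl_seconds):
    """Rough local TBS storage need: assume everything ingested within one
    TTL window is buffered on disk awaiting a sampling decision."""
    return events_per_sec * avg_event_bytes * ttl_seconds

# Hypothetical inputs: 2,000 events/s, ~1 KiB per stored event, 30 min TTL.
gib = estimate_storage_bytes(2_000, 1_024, 30 * 60) / 2**30
print(f"{gib:.1f} GiB")  # 3.4 GiB
```

A real recommendation would need benchmark data to pin down the average stored event size and any compression or overhead factors, which is exactly what this issue proposes to measure.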
Note to whoever works on this: we will need #7845. Assuming we use `apmbench`, we will need to enable `-rewrite-ids` to ensure `trace.id` and per-trace events are not repeated, which would otherwise affect TBS.