
[chore] introduce prometheus load test #305

Open · bacherfl wants to merge 22 commits into main
Conversation

@bacherfl bacherfl (Contributor) commented Sep 23, 2024

This scenario works a bit differently from the previously added load tests: the Prometheus data sender exposes a metrics endpoint from which the collector scrapes the metrics, rather than actively sending the metrics to the collector.

For this reason, the concept of using the load generator as the sender had to be changed to instead generate a fixed number of metrics and hand them over to the Prometheus data sender (i.e. the Prometheus server). With the previous approach, the load generator would keep adding metrics to be served at the Prometheus endpoint, which quickly led to the endpoint becoming unresponsive.
The exposed metrics are then read during each scrape iteration of the collector's Prometheus receiver, and the total number of metrics received by the mock backend (to which the scraped metrics are forwarded) increases by the number of exposed metrics after each scrape interval. It therefore proved best to configure the data items per second via the scrape interval: for example, with a fixed number of 1000 metrics exposed on the Prometheus endpoint and a scrape interval of 10ms, this corresponds to a potential throughput of 100k metrics per second (see the sketch below).
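To make the relation between the fixed metric count, the scrape interval, and the resulting load explicit, here is a small self-contained sketch of the arithmetic described above; it is purely illustrative and not part of the test code:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Fixed number of metrics served on the Prometheus endpoint per scrape.
	const exposedMetrics = 1000

	// Scrape interval configured for the collector's prometheus receiver.
	scrapeInterval := 10 * time.Millisecond

	// Every scrape forwards all exposed metrics to the mock backend, so the
	// theoretical throughput is metrics-per-scrape times scrapes-per-second.
	scrapesPerSecond := float64(time.Second) / float64(scrapeInterval)
	throughput := float64(exposedMetrics) * scrapesPerSecond

	fmt.Printf("potential throughput: %.0f metrics/s\n", throughput) // prints 100000
}
```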
The bottleneck here seems to be the Prometheus servers started by the Prometheus data senders, so decreasing the scrape interval below a certain point actually leads to lower throughput.
With 10 parallel scrape jobs fetching 1000 metrics every second, the agent gets killed (e.g. https://github.com/Dynatrace/dynatrace-otel-collector/actions/runs/11026146904/job/30622216193#step:10:272), so I have removed that scenario for now.

Signed-off-by: Florian Bacher <[email protected]>
@bacherfl bacherfl marked this pull request as ready for review September 25, 2024 04:28
@bacherfl bacherfl requested a review from a team as a code owner September 25, 2024 04:28
"testing"

"github.com/Dynatrace/dynatrace-otel-collector/internal/testcommon/testutil"
"github.com/open-telemetry/opentelemetry-collector-contrib/testbed/datasenders"
"github.com/open-telemetry/opentelemetry-collector-contrib/testbed/testbed"
)

var (
metricProcessors = map[string]string{
@bacherfl (Contributor, author) commented on the diff context above:
changing the order of the processors, as it is recommended to have the memory limiter before the batch processor: https://github.com/open-telemetry/opentelemetry-collector/tree/main/processor/batchprocessor
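For context, here is a minimal illustrative sketch (not taken from this PR) of the pipeline ordering the linked recommendation describes, with memory_limiter placed before batch; the receiver and exporter names and the limiter settings are assumptions:

```go
// Illustrative collector pipeline fragment with the recommended processor
// order: the memory limiter runs first so it can refuse data under memory
// pressure before any batching happens. Names and settings are assumptions.
const examplePipeline = `
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 2000
  batch:

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [memory_limiter, batch]
      exporters: [otlp]
`
```

The batch processor README linked above recommends this ordering so that batching only happens after the memory limiter has had a chance to apply backpressure.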
