
[chore] introduce prometheus load test #305

Open · bacherfl wants to merge 22 commits into main
Conversation

@bacherfl bacherfl (Contributor) commented Sep 23, 2024

This scenario works a bit differently from the previously added load tests: the Prometheus data sender exposes a metrics endpoint from which the collector scrapes the metrics, rather than actively sending the metrics to the collector.

For this reason, the concept of using the load generator as the sender had to be changed to instead generate a fixed number of metrics and hand them over to the Prometheus data sender (i.e. the Prometheus server). With the previous approach, the load generator would keep adding metrics to be served at the Prometheus endpoint, which quickly led to the endpoint becoming unresponsive.
The exposed metrics are then read during each scrape iteration of the collector's Prometheus receiver, and the total number of metrics received by the mock backend (to which the scraped metrics are forwarded) increases by the number of exposed metrics after each scrape interval. It therefore proved best to configure the data items per second via the scrape interval: for example, with a fixed number of 1000 metrics exposed on the Prometheus endpoint and a scrape interval of 10ms, this corresponds to a potential throughput of 100k metrics per second (see the sketch below).
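To make the relation between the fixed metric count, the scrape interval, and the resulting load explicit, here is a small self-contained sketch of the arithmetic described above; it is purely illustrative and not part of the test code:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Fixed number of metrics served on the Prometheus endpoint per scrape.
	const exposedMetrics = 1000

	// Scrape interval configured for the collector's prometheus receiver.
	scrapeInterval := 10 * time.Millisecond

	// Every scrape forwards all exposed metrics to the mock backend, so the
	// theoretical throughput is metrics-per-scrape times scrapes-per-second.
	scrapesPerSecond := float64(time.Second) / float64(scrapeInterval)
	throughput := float64(exposedMetrics) * scrapesPerSecond

	fmt.Printf("potential throughput: %.0f metrics/s\n", throughput) // prints 100000
}
```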
The bottleneck here seems to be the Prometheus servers started by the Prometheus data senders, so decreasing the scrape interval below a certain point actually leads to lower throughput.
With 10 parallel scrape jobs fetching 1000 metrics every second, the agent gets killed (e.g. https://github.com/Dynatrace/dynatrace-otel-collector/actions/runs/11026146904/job/30622216193#step:10:272), so I have removed that scenario for now.

Signed-off-by: Florian Bacher <[email protected]>
@bacherfl bacherfl marked this pull request as ready for review September 25, 2024 04:28
@bacherfl bacherfl requested a review from a team as a code owner September 25, 2024 04:28
"testing"

"github.com/Dynatrace/dynatrace-otel-collector/internal/testcommon/testutil"
"github.com/open-telemetry/opentelemetry-collector-contrib/testbed/datasenders"
"github.com/open-telemetry/opentelemetry-collector-contrib/testbed/testbed"
)

var (
metricProcessors = map[string]string{
@bacherfl (Contributor, author) commented on the diff context above:
changing the order of the processors, as it is recommended to have the memory limiter before the batch processor: https://github.com/open-telemetry/opentelemetry-collector/tree/main/processor/batchprocessor
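For context, here is a minimal illustrative sketch (not taken from this PR) of the pipeline ordering the linked recommendation describes, with memory_limiter placed before batch; the receiver and exporter names and the limiter settings are assumptions:

```go
// Illustrative collector pipeline fragment with the recommended processor
// order: the memory limiter runs first so it can refuse data under memory
// pressure before any batching happens. Names and settings are assumptions.
const examplePipeline = `
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 2000
  batch:

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [memory_limiter, batch]
      exporters: [otlp]
`
```

The batch processor README linked above recommends this ordering so that batching only happens after the memory limiter has had a chance to apply backpressure.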
