# Telemetry KPIs and Limit Test

This document describes a reproducible test setup to determine the limits and KPIs of the Kyma TracePipeline and MetricPipeline.

## Prerequisites

- Kyma as the target deployment environment, two nodes with 4 CPUs and 16 GB memory each (n1-standard-4 on GCP)
- Telemetry Module installed
- Istio Module installed
- kubectl > 1.22.x
- Helm 3.x
- curl 8.4.x
- jq 1.6


## Traces Test

### Assumptions

The tests are executed for 20 minutes for each test case to produce stabilized output and reliable KPIs. Generated traces contain at least 2 spans, and each span has 40 attributes to simulate an average trace span size.
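
For illustration, such a load shape could be produced with the OpenTelemetry `telemetrygen` CLI. The following is only a sketch: the flag names and the gateway endpoint `telemetry-otlp-traces.kyma-system:4317` are assumptions and are not taken from the actual test scripts.

```shell
# Sketch only: flags and endpoint are assumptions, not the scripts' actual setup.
# Generates traces with child spans (at least 2 spans per trace) for the
# 20-minute test window.
telemetrygen traces \
  --otlp-endpoint telemetry-otlp-traces.kyma-system:4317 \
  --otlp-insecure \
  --child-spans 2 \
  --workers 10 \
  --duration 20m
```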

The following test cases are identified:

1. Test average throughput end-to-end.
2. Test queuing and retry capabilities of TracePipeline with simulated backend outages.
3. Test average throughput with 3 TracePipelines simultaneously end-to-end.
4. Test queuing and retry capabilities of 3 TracePipelines with simulated backend outages.

Backend outages are simulated with Istio fault injection: 70% of the traffic to the test backend returns `HTTP 503`, simulating service outages.
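
A fault of this kind can be injected with an Istio VirtualService, as in the following sketch. The resource, namespace, and host names (`test-backend-fault`, `trace-load-test`, `test-backend`) are placeholders, not the names used by the test scripts:

```shell
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: test-backend-fault    # placeholder name
  namespace: trace-load-test  # placeholder namespace
spec:
  hosts:
    - test-backend            # placeholder backend service
  http:
    - fault:
        abort:
          percentage:
            value: 70         # abort 70% of requests
          httpStatus: 503     # with HTTP 503
      route:
        - destination:
            host: test-backend
EOF
```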

### Setup

The following diagram shows the test setup used for all test cases.

![Trace Gateway Test Setup](./assets/trace_perf_test_setup.drawio.svg)

In all test scenarios, a preconfigured trace load generator is deployed on the test cluster. To ensure that all trace gateway instances are loaded with test data, the trace load generator feeds the test TracePipeline through a pipeline service instance.

A Prometheus instance is deployed on the test cluster to collect relevant metrics from the trace gateway instances and to fetch them at the end of the test as the test scenario result.
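
For illustration, a KPI such as the accepted-spans rate could be fetched from the Prometheus HTTP API as in the following sketch; the service URL and the exact query are assumptions, not the queries used by the test script:

```shell
# Sketch: fetch the average accepted-spans rate over the 20-minute test window.
# The Prometheus URL is a placeholder; otelcol_receiver_accepted_spans is a
# standard OpenTelemetry Collector metric.
curl -s 'http://prometheus.prometheus.svc:9090/api/v1/query' \
  --data-urlencode 'query=round(sum(rate(otelcol_receiver_accepted_spans[20m])))' \
  | jq -r '.data.result[0].value[1]'
```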

All test scenarios also have a test backend deployed to simulate end-to-end behaviour.

Each test scenario has its own test scripts responsible for preparing the test scenario and deploying it on the test cluster, running the scenario, and fetching the relevant metrics and KPIs at the end of the test run. After the test, the test results are printed out.

A typical test result output looks like the following example:

```shell
Receiver accepted spans, 12867
Exporter exported spans, 38585
Exporter queue size, 0
Pod memory (MB), 147
Pod memory (MB), 160
Pod CPU, 1.4
Pod CPU, 1.4
```

### Test Script

All test scenarios use a single test script [run-load-test.sh](assets/run-load-test.sh), which provides two parameters: `-m` for multi-pipeline scenarios, and `-b` for backpressure scenarios.

1. To test the average throughput end-to-end, run:

```shell
./run-load-test.sh traces
```
2. To test the queuing and retry capabilities of TracePipeline with simulated backend outages, run:

```shell
./run-load-test.sh traces -b true
```

3. To test the average throughput with 3 TracePipelines simultaneously end-to-end, run:

```shell
./run-load-test.sh traces -m true
```

4. To test the queuing and retry capabilities of 3 TracePipelines with simulated backend outages, run:

```shell
./run-load-test.sh traces -m true -b true
```

### Test Results



<div class="table-wrapper" markdown="block">

| Version/Test | Single Pipeline | | | | | Multi Pipeline | | | | | Single Pipeline Backpressure | | | | | Multi Pipeline Backpressure | | | | |
|-------------:|:---------------------------:|:---------------------------:|:-------------------:|:--------------------:|:-------------:|:---------------------------:|:---------------------------:|:-------------------:|:--------------------:|:-------------:|:----------------------------:|:---------------------------:|:-------------------:|:--------------------:|:-------------:|:---------------------------:|:---------------------------:|:-------------------:|:--------------------:|:-------------:|
| | Receiver Accepted Spans/sec | Exporter Exported Spans/sec | Exporter Queue Size | Pod Memory Usage(MB) | Pod CPU Usage | Receiver Accepted Spans/sec | Exporter Exported Spans/sec | Exporter Queue Size | Pod Memory Usage(MB) | Pod CPU Usage | Receiver Accepted Spans/sec | Exporter Exported Spans/sec | Exporter Queue Size | Pod Memory Usage(MB) | Pod CPU Usage | Receiver Accepted Spans/sec | Exporter Exported Spans/sec | Exporter Queue Size | Pod Memory Usage(MB) | Pod CPU Usage |
| 0.91 | 19815.05 | 19815.05 | 0 | 137, 139.92 | 0.979, 0.921 | 13158.4 | 38929.06 | 0 | 117, 98.5 | 1.307, 1.351 | 9574.4 | 1280 | 509 | 1929.4, 1726 | 0.723, 0.702 | 9663.8 | 1331.2 | 510 | 2029.8, 1686 | 0.733, 0.696 |
| 0.92 | 21146.3 | 21146.3 | 0 | 72.37, 50.95 | 1.038, 0.926 | 12757.6 | 38212.2 | 0 | 90.3, 111.28 | 1.36, 1.19 | 3293.6 | 2918.4 | 204 | 866.07, 873.4 | 0.58, 0.61 | 9694.6 | 1399.5 | 510 | 1730.6, 1796.6 | 0.736, 0.728 |

</div>


## Metrics Test

The Metrics test consists of two main test scenarios: the first one tests the Metric Gateway KPIs, and the second one tests the Metric Agent KPIs.

### Metric Gateway Test and Assumptions

The tests are executed for 20 minutes for each test case to produce stabilized output and reliable KPIs. Generated metrics contain 10 attributes to simulate an average metric size. The test simulates `2000` individual metric producers, and each one pushes metrics every `30 seconds` to the Metric Gateway.
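
For illustration, a single such push could look like the following OTLP/HTTP request. This is a minimal sketch under assumptions: the gateway endpoint `telemetry-otlp-metrics.kyma-system:4318` and the payload are illustrative, not the generator's actual traffic:

```shell
# Sketch of one producer's push: a single gauge metric sent over OTLP/HTTP.
# Endpoint and payload are illustrative assumptions.
curl -s -X POST http://telemetry-otlp-metrics.kyma-system:4318/v1/metrics \
  -H 'Content-Type: application/json' \
  -d '{
    "resourceMetrics": [{
      "resource": {
        "attributes": [
          { "key": "service.name", "value": { "stringValue": "metric-producer-1" } }
        ]
      },
      "scopeMetrics": [{
        "metrics": [{
          "name": "test.gauge",
          "gauge": {
            "dataPoints": [{ "asDouble": 42, "timeUnixNano": "'"$(date +%s%N)"'" }]
          }
        }]
      }]
    }]
  }'
```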


The following test cases are identified:

1. Test average throughput end-to-end.
2. Test queuing and retry capabilities of Metric Gateway with simulated backend outages.
3. Test average throughput with 3 MetricPipelines simultaneously end-to-end.
4. Test queuing and retry capabilities of 3 MetricPipelines with simulated backend outages.

Backend outages are simulated with Istio fault injection: 70% of the traffic to the test backend returns `HTTP 503`, simulating service outages.

### Metric Agent Test and Assumptions

The tests are executed for 20 minutes for each test case to produce stabilized output and reliable KPIs.
In contrast to the Metric Gateway test, the Metric Agent test deploys a passive metric producer ([Avalanche Prometheus metric load generator](https://blog.freshtracks.io/load-testing-prometheus-metric-ingestion-5b878711711c)), and the metrics are scraped by the Metric Agent from the producer.
The test setup deploys 20 individual metric producer Pods; each producer exposes 1000 metrics with 20 metric series. The Metric Agent collects the metrics via Pod scraping as well as Service scraping to test both Metric Agent receiver configurations.
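
Such a producer could be deployed roughly as follows. This is a sketch under assumptions: the image, the Avalanche flags, and the scrape annotations are illustrative and may differ from the actual test setup:

```shell
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metric-producer              # placeholder name
spec:
  replicas: 20                       # 20 individual producer Pods
  selector:
    matchLabels:
      app: metric-producer
  template:
    metadata:
      labels:
        app: metric-producer
      annotations:
        prometheus.io/scrape: "true" # enable Pod scraping by the Metric Agent
        prometheus.io/port: "9090"
        prometheus.io/path: "/metrics"
    spec:
      containers:
        - name: avalanche
          image: quay.io/prometheuscommunity/avalanche:main  # assumed image
          args:
            - --metric-count=1000    # 1000 metrics per producer
            - --series-count=20      # 20 series per metric
            - --port=9090
          ports:
            - containerPort: 9090
              name: metrics
EOF
```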


The following test cases are identified:

1. Test average throughput end-to-end.
2. Test queuing and retry capabilities of Metric Agent with simulated backend outages.
3. Test average throughput with 3 MetricPipelines simultaneously end-to-end.
4. Test queuing and retry capabilities of 3 MetricPipeline with simulated backend outages.

Backend outages simulated with Istio Fault Injection, 70% of traffic to the Test Backend will return `HTTP 503` to simulate service outages

### Setup

The following diagram shows the test setup used for all Metric test cases.

![Metric Test Setup](./assets/metric_perf_test_setup.drawio.svg)


In all test scenarios, a preconfigured metric load generator is deployed on the test cluster. To ensure that all Metric Gateway instances are loaded with test data, the metric load generator feeds the test MetricPipeline through a pipeline service instance. In the Metric Agent test, the test data is instead scraped from the test data producers and pushed to the Metric Gateway.

A Prometheus instance is deployed on the test cluster to collect relevant metrics from the Metric Gateway instances and to fetch them at the end of the test as the test scenario result.

All test scenarios also have a test backend deployed to simulate end-to-end behaviour.

Each test scenario has its own test scripts responsible for preparing the test scenario and deploying it on the test cluster, running the scenario, and fetching the relevant metrics and KPIs at the end of the test run. After the test, the test results are printed out.

### Test Script

All test scenarios use a single test script [run-load-test.sh](assets/run-load-test.sh), which provides two parameters: `-m` for multi-pipeline scenarios, and `-b` for backpressure scenarios.

#### Metric Gateway

1. To test the average throughput end-to-end, run:

```shell
./run-load-test.sh metrics
```
2. To test the queuing and retry capabilities of Metric Gateway with simulated backend outages, run:

```shell
./run-load-test.sh metrics -b true
```

3. To test the average throughput with 3 MetricPipelines simultaneously end-to-end, run:

```shell
./run-load-test.sh metrics -m true
```

4. To test the queuing and retry capabilities of 3 MetricPipelines with simulated backend outages, run:

```shell
./run-load-test.sh metrics -m true -b true
```

#### Test Results



<div class="table-wrapper" markdown="block">

| Version/Test | Single Pipeline | | | | | Multi Pipeline | | | | | Single Pipeline Backpressure | | | | | Multi Pipeline Backpressure | | | | |
|-------------:|:----------------------------:|:----------------------------:|:-------------------:|:--------------------:|:-------------:|:----------------------------:|:----------------------------:|:-------------------:|:--------------------:|:-------------:|:-----------------------------:|:----------------------------:|:-------------------:|:--------------------:|:-------------:|:----------------------------:|:----------------------------:|:-------------------:|:--------------------:|:-------------:|
| | Receiver Accepted Metric/sec | Exporter Exported Metric/sec | Exporter Queue Size | Pod Memory Usage(MB) | Pod CPU Usage | Receiver Accepted Metric/sec | Exporter Exported Metric/sec | Exporter Queue Size | Pod Memory Usage(MB) | Pod CPU Usage | Receiver Accepted Metric/sec | Exporter Exported Metric/sec | Exporter Queue Size | Pod Memory Usage(MB) | Pod CPU Usage | Receiver Accepted Metric/sec | Exporter Exported Metric/sec | Exporter Queue Size | Pod Memory Usage(MB) | Pod CPU Usage |
| 0.92 | 21146.3 | 21146.3 | 0 | 72.37, 50.95 | 1.038, 0.926 | 12757.6 | 38212.2 | 0 | 90.3, 111.28 | 1.36, 1.19 | 3293.6 | 2918.4 | 204 | 866.07, 873.4 | 0.58, 0.61 | 9694.6 | 1399.5 | 510 | 1730.6, 1796.6 | 0.736, 0.728 |

</div>

#### Metric Agent

1. To test the average throughput end-to-end, run:

```shell
./run-load-test.sh metricagent
```
2. To test the queuing and retry capabilities of Metric Agent with simulated backend outages, run:

```shell
./run-load-test.sh metricagent -b true
```

3. To test the average throughput with 3 MetricPipelines simultaneously end-to-end, run:

```shell
./run-load-test.sh metricagent -m true
```

4. To test the queuing and retry capabilities of 3 MetricPipelines with simulated backend outages, run:

```shell
./run-load-test.sh metricagent -m true -b true
```
