# Telemetry KPIs and Limit Test

This document describes a reproducible test setup to determine the limits and KPIs of the Kyma TracePipeline and MetricPipeline.

## Prerequisites

- Kyma as the target deployment environment, two nodes with 4 CPUs and 16 GB memory each (n1-standard-4 on GCP)
- Telemetry Module installed
- Istio Module installed
- kubectl > 1.22.x
- Helm 3.x
- curl 8.4.x
- jq 1.6


## Traces Test

### Assumptions

The tests are executed for 20 minutes for each test case to produce stabilized output and reliable KPIs. Generated traces contain at least 2 spans, and each span has 40 attributes to simulate an average trace span size.
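
For illustration, such a load shape could be produced with the OpenTelemetry `telemetrygen` CLI. The following is only a sketch: the flag names and the gateway endpoint `telemetry-otlp-traces.kyma-system:4317` are assumptions and are not taken from the actual test scripts.

```shell
# Sketch only: flags and endpoint are assumptions, not the scripts' actual setup.
# Generates traces with child spans (at least 2 spans per trace) for the
# 20-minute test window.
telemetrygen traces \
  --otlp-endpoint telemetry-otlp-traces.kyma-system:4317 \
  --otlp-insecure \
  --child-spans 2 \
  --workers 10 \
  --duration 20m
```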

The following test cases are identified:

1. Test average throughput end-to-end.
2. Test queuing and retry capabilities of TracePipeline with simulated backend outages.
3. Test average throughput with 3 TracePipelines simultaneously end-to-end.
4. Test queuing and retry capabilities of 3 TracePipelines with simulated backend outages.

Backend outages are simulated with Istio fault injection: 70% of the traffic to the test backend returns `HTTP 503`, simulating service outages.
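
A fault of this kind can be injected with an Istio VirtualService, as in the following sketch. The resource, namespace, and host names (`test-backend-fault`, `trace-load-test`, `test-backend`) are placeholders, not the names used by the test scripts:

```shell
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: test-backend-fault    # placeholder name
  namespace: trace-load-test  # placeholder namespace
spec:
  hosts:
    - test-backend            # placeholder backend service
  http:
    - fault:
        abort:
          percentage:
            value: 70         # abort 70% of requests
          httpStatus: 503     # with HTTP 503
      route:
        - destination:
            host: test-backend
EOF
```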

### Setup

The following diagram shows the test setup used for all test cases.

![Trace Gateway Test Setup](./assets/trace_perf_test_setup.drawio.svg)

In all test scenarios, a preconfigured trace load generator is deployed on the test cluster. To ensure that all trace gateway instances are loaded with test data, the trace load generator feeds the test TracePipeline through a pipeline service instance.

A Prometheus instance is deployed on the test cluster to collect relevant metrics from the trace gateway instances and to fetch them at the end of the test as the test scenario result.
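
For illustration, a KPI such as the accepted-spans rate could be fetched from the Prometheus HTTP API as in the following sketch; the service URL and the exact query are assumptions, not the queries used by the test script:

```shell
# Sketch: fetch the average accepted-spans rate over the 20-minute test window.
# The Prometheus URL is a placeholder; otelcol_receiver_accepted_spans is a
# standard OpenTelemetry Collector metric.
curl -s 'http://prometheus.prometheus.svc:9090/api/v1/query' \
  --data-urlencode 'query=round(sum(rate(otelcol_receiver_accepted_spans[20m])))' \
  | jq -r '.data.result[0].value[1]'
```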

All test scenarios also have a test backend deployed to simulate end-to-end behaviour.

Each test scenario has its own test scripts responsible for preparing the test scenario and deploying it on the test cluster, running the scenario, and fetching the relevant metrics and KPIs at the end of the test run. After the test, the test results are printed out.

A typical test result output looks like the following example:

```shell
Receiver accepted spans, 12867
Exporter exported spans, 38585
Exporter queue size, 0
Pod memory (MB), 147
Pod memory (MB), 160
Pod CPU, 1.4
Pod CPU, 1.4
```

### Test Script

All test scenarios use a single test script [run-load-test.sh](assets/run-load-test.sh), which provides two parameters: `-m` for multi-pipeline scenarios, and `-b` for backpressure scenarios.

1. To test the average throughput end-to-end, run:

```shell
./run-load-test.sh traces
```
2. To test the queuing and retry capabilities of TracePipeline with simulated backend outages, run:

```shell
./run-load-test.sh traces -b true
```

3. To test the average throughput with 3 TracePipelines simultaneously end-to-end, run:

```shell
./run-load-test.sh traces -m true
```

4. To test the queuing and retry capabilities of 3 TracePipelines with simulated backend outages, run:

```shell
./run-load-test.sh traces -m true -b true
```

### Test Results



<div class="table-wrapper" markdown="block">

| Version/Test | Single Pipeline | | | | | Multi Pipeline | | | | | Single Pipeline Backpressure | | | | | Multi Pipeline Backpressure | | | | |
|-------------:|:---------------------------:|:---------------------------:|:-------------------:|:--------------------:|:-------------:|:---------------------------:|:---------------------------:|:-------------------:|:--------------------:|:-------------:|:----------------------------:|:---------------------------:|:-------------------:|:--------------------:|:-------------:|:---------------------------:|:---------------------------:|:-------------------:|:--------------------:|:-------------:|
| | Receiver Accepted Spans/sec | Exporter Exported Spans/sec | Exporter Queue Size | Pod Memory Usage(MB) | Pod CPU Usage | Receiver Accepted Spans/sec | Exporter Exported Spans/sec | Exporter Queue Size | Pod Memory Usage(MB) | Pod CPU Usage | Receiver Accepted Spans/sec | Exporter Exported Spans/sec | Exporter Queue Size | Pod Memory Usage(MB) | Pod CPU Usage | Receiver Accepted Spans/sec | Exporter Exported Spans/sec | Exporter Queue Size | Pod Memory Usage(MB) | Pod CPU Usage |
| 0.91 | 19815.05 | 19815.05 | 0 | 137, 139.92 | 0.979, 0.921 | 13158.4 | 38929.06 | 0 | 117, 98.5 | 1.307, 1.351 | 9574.4 | 1280 | 509 | 1929.4, 1726 | 0.723, 0.702 | 9663.8 | 1331.2 | 510 | 2029.8, 1686 | 0.733, 0.696 |
| 0.92 | 21146.3 | 21146.3 | 0 | 72.37, 50.95 | 1.038, 0.926 | 12757.6 | 38212.2 | 0 | 90.3, 111.28 | 1.36, 1.19 | 3293.6 | 2918.4 | 204 | 866.07, 873.4 | 0.58, 0.61 | 9694.6 | 1399.5 | 510 | 1730.6, 1796.6 | 0.736, 0.728 |

</div>


## Metrics Test

The Metrics test consists of two main test scenarios: the first one tests the Metric Gateway KPIs, and the second one tests the Metric Agent KPIs.

### Metric Gateway Test and Assumptions

The tests are executed for 20 minutes for each test case to produce stabilized output and reliable KPIs. Generated metrics contain 10 attributes to simulate an average metric size. The test simulates `2000` individual metric producers, and each one pushes metrics every `30 seconds` to the Metric Gateway.
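
For illustration, a single such push could look like the following OTLP/HTTP request. This is a minimal sketch under assumptions: the gateway endpoint `telemetry-otlp-metrics.kyma-system:4318` and the payload are illustrative, not the generator's actual traffic:

```shell
# Sketch of one producer's push: a single gauge metric sent over OTLP/HTTP.
# Endpoint and payload are illustrative assumptions.
curl -s -X POST http://telemetry-otlp-metrics.kyma-system:4318/v1/metrics \
  -H 'Content-Type: application/json' \
  -d '{
    "resourceMetrics": [{
      "resource": {
        "attributes": [
          { "key": "service.name", "value": { "stringValue": "metric-producer-1" } }
        ]
      },
      "scopeMetrics": [{
        "metrics": [{
          "name": "test.gauge",
          "gauge": {
            "dataPoints": [{ "asDouble": 42, "timeUnixNano": "'"$(date +%s%N)"'" }]
          }
        }]
      }]
    }]
  }'
```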


The following test cases are identified:

1. Test average throughput end-to-end.
2. Test queuing and retry capabilities of Metric Gateway with simulated backend outages.
3. Test average throughput with 3 MetricPipelines simultaneously end-to-end.
4. Test queuing and retry capabilities of 3 MetricPipelines with simulated backend outages.

Backend outages are simulated with Istio fault injection: 70% of the traffic to the test backend returns `HTTP 503`, simulating service outages.

### Metric Agent Test and Assumptions

The tests are executed for 20 minutes for each test case to produce stabilized output and reliable KPIs.
In contrast to the Metric Gateway test, the Metric Agent test deploys a passive metric producer ([Avalanche Prometheus metric load generator](https://blog.freshtracks.io/load-testing-prometheus-metric-ingestion-5b878711711c)), and the metrics are scraped by the Metric Agent from the producer.
The test setup deploys 20 individual metric producer Pods; each producer exposes 1000 metrics with 20 metric series. The Metric Agent collects the metrics via Pod scraping as well as Service scraping to test both Metric Agent receiver configurations.
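
Such a producer could be deployed roughly as follows. This is a sketch under assumptions: the image, the Avalanche flags, and the scrape annotations are illustrative and may differ from the actual test setup:

```shell
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: metric-producer              # placeholder name
spec:
  replicas: 20                       # 20 individual producer Pods
  selector:
    matchLabels:
      app: metric-producer
  template:
    metadata:
      labels:
        app: metric-producer
      annotations:
        prometheus.io/scrape: "true" # enable Pod scraping by the Metric Agent
        prometheus.io/port: "9090"
        prometheus.io/path: "/metrics"
    spec:
      containers:
        - name: avalanche
          image: quay.io/prometheuscommunity/avalanche:main  # assumed image
          args:
            - --metric-count=1000    # 1000 metrics per producer
            - --series-count=20      # 20 series per metric
            - --port=9090
          ports:
            - containerPort: 9090
              name: metrics
EOF
```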


The following test cases are identified:

1. Test average throughput end-to-end.
2. Test queuing and retry capabilities of Metric Agent with simulated backend outages.
3. Test average throughput with 3 MetricPipelines simultaneously end-to-end.
4. Test queuing and retry capabilities of 3 MetricPipeline with simulated backend outages.

Backend outages simulated with Istio Fault Injection, 70% of traffic to the Test Backend will return `HTTP 503` to simulate service outages

### Setup

The following diagram shows the test setup used for all Metric test cases.

![Metric Test Setup](./assets/metric_perf_test_setup.drawio.svg)


In all test scenarios, a preconfigured metric load generator is deployed on the test cluster. To ensure that all Metric Gateway instances are loaded with test data, the metric load generator feeds the test MetricPipeline through a pipeline service instance. In the Metric Agent test, the test data is instead scraped from the test data producers and pushed to the Metric Gateway.

A Prometheus instance is deployed on the test cluster to collect relevant metrics from the Metric Gateway instances and to fetch them at the end of the test as the test scenario result.

All test scenarios also have a test backend deployed to simulate end-to-end behaviour.

Each test scenario has its own test scripts responsible for preparing the test scenario and deploying it on the test cluster, running the scenario, and fetching the relevant metrics and KPIs at the end of the test run. After the test, the test results are printed out.

### Test Script

All test scenarios use a single test script [run-load-test.sh](assets/run-load-test.sh), which provides two parameters: `-m` for multi-pipeline scenarios, and `-b` for backpressure scenarios.

#### Metric Gateway

1. To test the average throughput end-to-end, run:

```shell
./run-load-test.sh metrics
```
2. To test the queuing and retry capabilities of Metric Gateway with simulated backend outages, run:

```shell
./run-load-test.sh metrics -b true
```

3. To test the average throughput with 3 MetricPipelines simultaneously end-to-end, run:

```shell
./run-load-test.sh metrics -m true
```

4. To test the queuing and retry capabilities of 3 MetricPipelines with simulated backend outages, run:

```shell
./run-load-test.sh metrics -m true -b true
```

#### Test Results



<div class="table-wrapper" markdown="block">

| Version/Test | Single Pipeline | | | | | Multi Pipeline | | | | | Single Pipeline Backpressure | | | | | Multi Pipeline Backpressure | | | | |
|-------------:|:----------------------------:|:----------------------------:|:-------------------:|:--------------------:|:-------------:|:----------------------------:|:----------------------------:|:-------------------:|:--------------------:|:-------------:|:-----------------------------:|:----------------------------:|:-------------------:|:--------------------:|:-------------:|:----------------------------:|:----------------------------:|:-------------------:|:--------------------:|:-------------:|
| | Receiver Accepted Metric/sec | Exporter Exported Metric/sec | Exporter Queue Size | Pod Memory Usage(MB) | Pod CPU Usage | Receiver Accepted Metric/sec | Exporter Exported Metric/sec | Exporter Queue Size | Pod Memory Usage(MB) | Pod CPU Usage | Receiver Accepted Metric/sec | Exporter Exported Metric/sec | Exporter Queue Size | Pod Memory Usage(MB) | Pod CPU Usage | Receiver Accepted Metric/sec | Exporter Exported Metric/sec | Exporter Queue Size | Pod Memory Usage(MB) | Pod CPU Usage |
| 0.92 | 21146.3 | 21146.3 | 0 | 72.37, 50.95 | 1.038, 0.926 | 12757.6 | 38212.2 | 0 | 90.3, 111.28 | 1.36, 1.19 | 3293.6 | 2918.4 | 204 | 866.07, 873.4 | 0.58, 0.61 | 9694.6 | 1399.5 | 510 | 1730.6, 1796.6 | 0.736, 0.728 |

</div>

#### Metric Agent

1. To test the average throughput end-to-end, run:

```shell
./run-load-test.sh metricagent
```
2. To test the queuing and retry capabilities of Metric Agent with simulated backend outages, run:

```shell
./run-load-test.sh metricagent -b true
```

3. To test the average throughput with 3 MetricPipelines simultaneously end-to-end, run:

```shell
./run-load-test.sh metricagent -m true
```

4. To test the queuing and retry capabilities of 3 MetricPipelines with simulated backend outages, run:

```shell
./run-load-test.sh metricagent -m true -b true
```
