update traffic replay
djshow832 committed Nov 5, 2024
1 parent 049365d commit c291589
Showing 2 changed files with 75 additions and 2 deletions.
33 changes: 33 additions & 0 deletions tiproxy/tiproxy-performance-test.md
@@ -14,6 +14,7 @@ The results are as follows:
- The row number of the query result set has a significant impact on the QPS of TiProxy, and the impact is the same as that of HAProxy.
- The performance of TiProxy increases almost linearly with the number of vCPUs. Therefore, increasing the number of vCPUs can effectively improve the QPS upper limit.
- The number of long connections and the frequency of creating short connections have minimal impact on the QPS of TiProxy.
- The higher the CPU usage of TiProxy, the greater the impact of enabling [traffic capture](/tiproxy/tiproxy-traffic-replay.md) on QPS. When TiProxy's CPU usage is about 70%, enabling traffic capture reduces the average QPS by about 3% and the minimum QPS by about 7%. The larger drop in minimum QPS is caused by the periodic compression of traffic files.

## Test environment

@@ -312,3 +313,35 @@ sysbench oltp_point_select \
| 100 | 95597 | 0.52 | 0.65 | 330% | 1800% |
| 200 | 94692 | 0.53 | 0.67 | 330% | 1800% |
| 300 | 94102 | 0.53 | 0.68 | 330% | 1900% |

## Traffic capture test

### Test plan

This test measures the impact of [traffic capture](/tiproxy/tiproxy-traffic-replay.md) on TiProxy performance. It uses TiProxy v1.3.0. `sysbench` is run with traffic capture disabled and then enabled at several concurrency levels, and the QPS and TiProxy's CPU usage are compared. Because periodic compression of traffic files causes QPS fluctuations, this test compares both the average QPS and the minimum QPS.

Use the following command to perform the test:

```bash
sysbench oltp_read_write \
--threads=$threads \
--time=1200 \
--report-interval=5 \
--rand-type=uniform \
--db-driver=mysql \
--mysql-db=sbtest \
--mysql-host=$host \
--mysql-port=$port \
run --tables=32 --table-size=1000000
```
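
The same command can be generated for each concurrency level with a small helper. The following is a minimal Python sketch, not part of the original test tooling: the function name `build_sysbench_cmd` and the host and port values are placeholders chosen for illustration, and the thread counts 20, 50, and 100 are taken from the connection counts reported in the test results. The sketch only prints the commands and does not run `sysbench`.

```python
import shlex


def build_sysbench_cmd(threads, host, port):
    """Return the sysbench argument list from the test plan for one concurrency level."""
    return [
        "sysbench", "oltp_read_write",
        f"--threads={threads}",
        "--time=1200",
        "--report-interval=5",
        "--rand-type=uniform",
        "--db-driver=mysql",
        "--mysql-db=sbtest",
        f"--mysql-host={host}",
        f"--mysql-port={port}",
        "run", "--tables=32", "--table-size=1000000",
    ]


# Placeholder endpoint (assumption); replace with the address of your TiProxy instance.
HOST, PORT = "10.0.1.10", 6000

# Connection counts taken from the test results table.
for threads in (20, 50, 100):
    print(shlex.join(build_sysbench_cmd(threads, HOST, PORT)))
```

Run each printed command once with traffic capture disabled and once with it enabled to reproduce the comparison.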

### Test results

| Connection count | Traffic capture | Avg QPS | Min QPS | Avg latency (ms) | P95 latency (ms) | TiProxy CPU usage |
| - |-----| --- | --- |-----------|-------------|-----------------|
| 20 | Disabled | 27653 | 26999 | 14.46 | 16.12 | 140% |
| 20 | Enabled | 27519 | 26922 | 14.53 | 16.41 | 170% |
| 50 | Disabled | 58014 | 56416 | 17.23 | 20.74 | 270% |
| 50 | Enabled | 56211 | 52236 | 17.79 | 21.89 | 280% |
| 100 | Disabled | 85107 | 84369 | 23.48 | 30.26 | 370% |
| 100 | Enabled | 79819 | 69503 | 25.04 | 31.94 | 380% |
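
To relate these rows to the percentages quoted in the summary, the relative QPS drop can be computed directly from the table. The following is a minimal Python sketch for illustration. The names `RESULTS`, `drop_pct`, and `qps_drops` are hypothetical, and the figures are copied from the table above.

```python
# (connections, avg QPS disabled, avg QPS enabled, min QPS disabled, min QPS enabled)
RESULTS = [
    (20, 27653, 27519, 26999, 26922),
    (50, 58014, 56211, 56416, 52236),
    (100, 85107, 79819, 84369, 69503),
]


def drop_pct(disabled, enabled):
    """Relative QPS drop, in percent, caused by enabling traffic capture."""
    return (disabled - enabled) / disabled * 100


def qps_drops(results):
    """Map connection count to (avg QPS drop %, min QPS drop %)."""
    return {
        conns: (drop_pct(avg_off, avg_on), drop_pct(min_off, min_on))
        for conns, avg_off, avg_on, min_off, min_on in results
    }


for conns, (avg_drop, min_drop) in qps_drops(RESULTS).items():
    print(f"{conns} connections: avg QPS -{avg_drop:.1f}%, min QPS -{min_drop:.1f}%")
```

For the 50-connection rows, this gives an average QPS drop of roughly 3.1% and a minimum QPS drop of roughly 7.4%, which is consistent with the "about 3%" and "about 7%" figures in the summary.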
44 changes: 42 additions & 2 deletions tiproxy/tiproxy-traffic-replay.md
@@ -43,6 +43,8 @@ Traffic replay is not suitable for the following scenarios:
> - TiProxy captures traffic on all connections, including existing and newly created ones.
> - In TiProxy primary-secondary mode, connect to the primary TiProxy instance.
> - If TiProxy is configured with a virtual IP, it is recommended to connect to the virtual IP address.
> - The higher the CPU usage of TiProxy, the greater the impact of traffic capture on QPS. To reduce the impact on the production cluster, it is recommended to reserve at least 30% of the CPU capacity. At about 70% CPU usage, enabling traffic capture reduces the average QPS by about 3%. For detailed performance data, see [traffic capture test](/tiproxy/tiproxy-performance-test.md#traffic-capture-test).
> - When you capture traffic again, the traffic files from the previous capture are not deleted automatically. You need to delete them manually.

For example, the following command connects to the TiProxy instance at `10.0.1.10:3080`, captures traffic for one hour, and saves it to the `/tmp/traffic` directory on the TiProxy instance:

@@ -76,7 +78,7 @@ Traffic replay is not suitable for the following scenarios:

5. View the replay report.

After replay completion, the report is stored in the `tiproxy_traffic_report` database on the test cluster. This database contains two tables: `fail` and `other_errors`.
After replay completion, the report is stored in the `tiproxy_traffic_replay` database on the test cluster. This database contains two tables: `fail` and `other_errors`.

The `fail` table stores failed SQL statements, with the following fields:

@@ -89,16 +91,50 @@ Traffic replay is not suitable for the following scenarios:
- `sample_replay_time`: the time when the SQL statement failed during replay. You can use this to view error information in the TiDB log file.
- `count`: the number of times the SQL statement failed.

The following is example output from the `fail` table:

```sql
SELECT * FROM tiproxy_traffic_replay.fail LIMIT 1\G
```

```
*************************** 1. row ***************************
cmd_type: StmtExecute
digest: 89c5c505772b8b7e8d5d1eb49f4d47ed914daa2663ed24a85f762daa3cdff43c
sample_stmt: INSERT INTO new_order (no_o_id, no_d_id, no_w_id) VALUES (?, ?, ?) params=[3077 6 1]
sample_err_msg: ERROR 1062 (23000): Duplicate entry '1-6-3077' for key 'new_order.PRIMARY'
sample_conn_id: 1356
sample_capture_time: 2024-10-17 12:59:15
sample_replay_time: 2024-10-17 13:05:05
count: 4
```
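
For ad hoc inspection, saved `\G` output such as the example above can be turned into dictionaries with a few lines of Python. The following is a minimal sketch, assuming each field fits on a single line. The function name `parse_vertical` and the shortened `SAMPLE` text are illustrative only. Because the report schema might change in future versions, treat this as a one-off helper rather than something to build tooling on.

```python
def parse_vertical(text):
    """Parse mysql client \\G output into a list of dicts, one per row.

    Assumes each field occupies a single line (no embedded newlines in values).
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("***"):
            # Row separator, for example "*** 1. row ***": start a new record.
            rows.append({})
            continue
        if not rows:
            continue
        # Split on the first colon only, because values can contain colons.
        key, sep, value = line.partition(":")
        if sep:
            rows[-1][key.strip()] = value.strip()
    return rows


# Shortened copy of the example output from the `fail` table.
SAMPLE = """\
*************************** 1. row ***************************
cmd_type: StmtExecute
sample_err_msg: ERROR 1062 (23000): Duplicate entry '1-6-3077' for key 'new_order.PRIMARY'
sample_replay_time: 2024-10-17 13:05:05
count: 4
"""

for row in parse_vertical(SAMPLE):
    print(row["cmd_type"], row["count"], row["sample_err_msg"])
```

All values are kept as strings. Convert `count` or the timestamps explicitly if you need to sort or filter on them.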

The `other_errors` table stores unexpected errors, such as network errors or database connection errors, with the following fields:

- `err_type`: the type of error, presented as a brief error message. For example, `i/o timeout`.
- `sample_err_msg`: the complete error message when the error first occurred.
- `sample_replay_time`: the time when the error occurred during replay. You can use this to view error information in the TiDB log file.
- `count`: the number of occurrences for this error.

The following is example output from the `other_errors` table:

```sql
SELECT * FROM tiproxy_traffic_replay.other_errors LIMIT 1\G
```

```
*************************** 1. row ***************************
err_type: failed to read the connection: EOF
sample_err_msg: this is an error from the backend connection: failed to read the connection: EOF
sample_replay_time: 2024-10-17 12:57:39
count: 1
```

> **Note:**
>
> The table schema of `tiproxy_traffic_report` might change in future versions. It is not recommended to directly read data from `tiproxy_traffic_report` in your application or tool development.
> - The table schema of `tiproxy_traffic_replay` might change in future versions. It is not recommended to directly read data from `tiproxy_traffic_replay` in your application or tool development.
> - Replay does not guarantee that the transaction execution order between connections is exactly the same as during capture, so it might report false errors.
> - When you replay traffic again, the previous replay report is not deleted automatically. You need to delete it manually.

## Test throughput

@@ -151,3 +187,7 @@ For more information, see [`tiproxyctl traffic cancel`](/tiproxy/tiproxy-command
- TiProxy traffic replay does not support filtering by SQL type, so both DML and DDL statements are replayed. Therefore, you need to restore the cluster data to its pre-replay state before replaying again.
- TiProxy traffic replay does not support testing [Resource Control](/tidb-resource-control.md) and [privilege management](/privilege-management.md) because TiProxy uses the same username to replay traffic.
- TiProxy does not support replaying [`LOAD DATA`](/sql-statements/sql-statement-load-data.md) statements.

## More resources

For more information about the traffic replay of TiProxy, see the [design document](https://github.com/pingcap/tiproxy/blob/main/docs/design/2024-08-27-traffic-replay.md).
