From c29158915ff4213c242cb493bccfdfb06be665c2 Mon Sep 17 00:00:00 2001
From: djshow832
Date: Tue, 5 Nov 2024 16:32:53 +0800
Subject: [PATCH] update traffic replay

---
 tiproxy/tiproxy-performance-test.md | 33 ++++++++++++++++++++++
 tiproxy/tiproxy-traffic-replay.md   | 44 +++++++++++++++++++++++++++--
 2 files changed, 75 insertions(+), 2 deletions(-)

diff --git a/tiproxy/tiproxy-performance-test.md b/tiproxy/tiproxy-performance-test.md
index 7cc0992afd228..b62b08ff59b9d 100644
--- a/tiproxy/tiproxy-performance-test.md
+++ b/tiproxy/tiproxy-performance-test.md
@@ -14,6 +14,7 @@ The results are as follows:
 - The row number of the query result set has a significant impact on the QPS of TiProxy, and the impact is the same as that of HAProxy.
 - The performance of TiProxy increases almost linearly with the number of vCPUs. Therefore, increasing the number of vCPUs can effectively improve the QPS upper limit.
 - The number of long connections and the frequency of creating short connections have minimal impact on the QPS of TiProxy.
+- The higher the CPU usage of TiProxy, the greater the impact of enabling [traffic capture](/tiproxy/tiproxy-traffic-replay.md) on QPS. When the CPU usage of TiProxy is about 70%, enabling traffic capture causes the average QPS to drop by about 3% and the minimum QPS to drop by about 7%. The drop in minimum QPS comes from periodic QPS dips that occur when traffic files are compressed.
 
 ## Test environment
 
@@ -312,3 +313,35 @@ sysbench oltp_point_select \
 | 100 | 95597 | 0.52 | 0.65 | 330% | 1800% |
 | 200 | 94692 | 0.53 | 0.67 | 330% | 1800% |
 | 300 | 94102 | 0.53 | 0.68 | 330% | 1900% |
+
+## Traffic capture test
+
+### Test plan
+
+This test evaluates the impact of [traffic capture](/tiproxy/tiproxy-traffic-replay.md) on TiProxy performance. The test uses TiProxy v1.3.0 and runs `sysbench` with traffic capture disabled and enabled at different concurrency levels, comparing the QPS and CPU usage of TiProxy.
Because periodic compression of traffic files can cause QPS fluctuations, this test compares both the average QPS and the minimum QPS.
+
+Use the following command to perform the test:
+
+```bash
+sysbench oltp_read_write \
+  --threads=$threads \
+  --time=1200 \
+  --report-interval=5 \
+  --rand-type=uniform \
+  --db-driver=mysql \
+  --mysql-db=sbtest \
+  --mysql-host=$host \
+  --mysql-port=$port \
+  run --tables=32 --table-size=1000000
+```
+
+### Test results
+
+| Connection count | Traffic capture | Avg QPS | Min QPS | Avg latency (ms) | P95 latency (ms) | TiProxy CPU usage |
+| --- | --- | --- | --- | --- | --- | --- |
+| 20 | Disabled | 27653 | 26999 | 14.46 | 16.12 | 140% |
+| 20 | Enabled | 27519 | 26922 | 14.53 | 16.41 | 170% |
+| 50 | Disabled | 58014 | 56416 | 17.23 | 20.74 | 270% |
+| 50 | Enabled | 56211 | 52236 | 17.79 | 21.89 | 280% |
+| 100 | Disabled | 85107 | 84369 | 23.48 | 30.26 | 370% |
+| 100 | Enabled | 79819 | 69503 | 25.04 | 31.94 | 380% |
\ No newline at end of file
diff --git a/tiproxy/tiproxy-traffic-replay.md b/tiproxy/tiproxy-traffic-replay.md
index 06b9f25c6c3af..69218a5f4b2b7 100644
--- a/tiproxy/tiproxy-traffic-replay.md
+++ b/tiproxy/tiproxy-traffic-replay.md
@@ -43,6 +43,8 @@ Traffic replay is not suitable for the following scenarios:
     > - TiProxy captures traffic on all connections, including existing and newly created ones.
     > - In TiProxy primary-secondary mode, connect to the primary TiProxy instance.
     > - If TiProxy is configured with a virtual IP, it is recommended to connect to the virtual IP address.
+    > - The higher the CPU usage of TiProxy, the greater the impact of traffic capture on QPS. To reduce the impact on the production cluster, it is recommended to reserve at least 30% of CPU capacity for TiProxy, in which case enabling traffic capture reduces the average QPS by about 3%. For detailed performance data, see [traffic capture test](/tiproxy/tiproxy-performance-test.md#traffic-capture-test).
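The roughly 3% average and 7% minimum QPS drops cited for traffic capture can be reproduced from the 50-connection row of the test results table. A quick sketch of the arithmetic, using only values copied from the table:

```shell
# QPS drop caused by enabling traffic capture, 50-connection row:
# avg 58014 -> 56211, min 56416 -> 52236 (values from the table above).
avg_drop=$(awk 'BEGIN { printf "%.1f", (58014 - 56211) / 58014 * 100 }')
min_drop=$(awk 'BEGIN { printf "%.1f", (56416 - 52236) / 56416 * 100 }')
echo "average QPS drop: ${avg_drop}%"
echo "minimum QPS drop: ${min_drop}%"
```

This prints a drop of about 3.1% for average QPS and 7.4% for minimum QPS, consistent with the summary.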
+    > - When you capture traffic again, the traffic files from the previous capture are not deleted automatically. You need to delete them manually.
 
     For example, the following command connects to the TiProxy instance at `10.0.1.10:3080`, captures traffic for one hour, and saves it to the `/tmp/traffic` directory on the TiProxy instance:
@@ -76,7 +78,7 @@ Traffic replay is not suitable for the following scenarios:
 5. View the replay report.
 
-    After replay completion, the report is stored in the `tiproxy_traffic_report` database on the test cluster. This database contains two tables: `fail` and `other_errors`.
+    After replay completion, the report is stored in the `tiproxy_traffic_replay` database on the test cluster. This database contains two tables: `fail` and `other_errors`.
 
     The `fail` table stores failed SQL statements, with the following fields:
@@ -89,6 +91,24 @@ Traffic replay is not suitable for the following scenarios:
     - `sample_replay_time`: the time when the SQL statement failed during replay. You can use this to view error information in the TiDB log file.
     - `count`: the number of times the SQL statement failed.
+
+    The following is example output from the `fail` table:
+
+    ```sql
+    SELECT * FROM tiproxy_traffic_replay.fail LIMIT 1\G
+    ```
+
+    ```
+    *************************** 1. row ***************************
+               cmd_type: StmtExecute
+                 digest: 89c5c505772b8b7e8d5d1eb49f4d47ed914daa2663ed24a85f762daa3cdff43c
+            sample_stmt: INSERT INTO new_order (no_o_id, no_d_id, no_w_id) VALUES (?, ?, ?) params=[3077 6 1]
+         sample_err_msg: ERROR 1062 (23000): Duplicate entry '1-6-3077' for key 'new_order.PRIMARY'
+         sample_conn_id: 1356
+    sample_capture_time: 2024-10-17 12:59:15
+     sample_replay_time: 2024-10-17 13:05:05
+                  count: 4
+    ```
+
     The `other_errors` table stores unexpected errors, such as network errors or database connection errors, with the following fields:
 
     - `err_type`: the type of error, presented as a brief error message. For example, `i/o timeout`.
@@ -96,9 +116,25 @@ Traffic replay is not suitable for the following scenarios:
     - `sample_replay_time`: the time when the error occurred during replay. You can use this to view error information in the TiDB log file.
     - `count`: the number of occurrences for this error.
+
+    The following is example output from the `other_errors` table:
+
+    ```sql
+    SELECT * FROM tiproxy_traffic_replay.other_errors LIMIT 1\G
+    ```
+
+    ```
+    *************************** 1. row ***************************
+              err_type: failed to read the connection: EOF
+        sample_err_msg: this is an error from the backend connection: failed to read the connection: EOF
+    sample_replay_time: 2024-10-17 12:57:39
+                 count: 1
+    ```
+
     > **Note:**
     >
-    > The table schema of `tiproxy_traffic_report` might change in future versions. It is not recommended to directly read data from `tiproxy_traffic_report` in your application or tool development.
+    > - The table schema of `tiproxy_traffic_replay` might change in future versions. It is not recommended to directly read data from `tiproxy_traffic_replay` in your application or tool development.
+    > - Replay does not guarantee that the transaction execution order between connections is exactly the same as when the traffic was captured, so errors might be reported by mistake.
+    > - When you replay traffic again, the previous replay report is not deleted automatically. You need to delete it manually.
 
 ## Test throughput
 
@@ -151,3 +187,7 @@ For more information, see [`tiproxyctl traffic cancel`](/tiproxy/tiproxy-command
 - TiProxy traffic replay does not support filtering SQL types and DML and DDL statements are replayed. Therefore, you need to restore the cluster data to its pre-replay state before replaying again.
 - TiProxy traffic replay does not support testing [Resource Control](/tidb-resource-control.md) and [privilege management](/privilege-management.md) because TiProxy uses the same username to replay traffic.
 - TiProxy does not support replaying [`LOAD DATA`](/sql-statements/sql-statement-load-data.md) statements.
+
+## More resources
+
+For more information about TiProxy traffic replay, see the [design document](https://github.com/pingcap/tiproxy/blob/main/docs/design/2024-08-27-traffic-replay.md).
\ No newline at end of file
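As the notes and limitations above mention, neither old traffic files nor old replay reports are cleaned up automatically. A minimal cleanup sketch before a re-run, assuming the `/tmp/traffic` directory from the capture example; the `mysql` host, port, and user in the comment are placeholders for your test cluster:

```shell
# Remove the previous capture output before capturing again
# (/tmp/traffic is the directory used in the capture example).
TRAFFIC_DIR=/tmp/traffic
rm -rf "${TRAFFIC_DIR:?}"

# Drop the previous replay report before replaying again.
# Host, port, and user are placeholders; point this at the test cluster's TiDB.
# mysql --host 10.0.1.5 --port 4000 -u root \
#   -e "DROP DATABASE IF EXISTS tiproxy_traffic_replay;"
echo "removed ${TRAFFIC_DIR}"
```

The `:?` expansion guards against `rm -rf` running with an empty variable.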