diff --git a/tidb-cloud/v8.5-performance-highlights.md b/tidb-cloud/v8.5-performance-highlights.md
index f3b301f7fcaad..8f9de5e1b0321 100644
--- a/tidb-cloud/v8.5-performance-highlights.md
+++ b/tidb-cloud/v8.5-performance-highlights.md
@@ -83,31 +83,83 @@ TiDB v8.5.0 introduces multiple enhancements to mitigate the impact of cloud dis

 ### Test results

-The failover time of the IO latency jitter is 30% shorter, and P99/999 latency is reduced by 70% or more.
-
-- Test results without IO latency jitter improvement
-
-    | Workload | Failover time | QPS drop rate | Maximum latency (P999) during failover | Maximum latency (P99) during failover |
-    | --- | --- | --- | --- | --- |
-    | IO delay of 1 s lasts for 10 mins | 4 mins | 99% | 1 min | 56 s |
-    | IO delay of 500 ms lasts for 10 mins | 4 mins | 99% | 54 s | 7.8 s |
-    | IO delay of 100 ms lasts for 10 mins | Failover not achieved | 99% | 32 s | 26 s |
-    | IO delay of 50 ms lasts for 10 mins | Failover not achieved | 97% | 13.2 s | 6.7 s |
-    | IO delay of 10 ms lasts for 10 mins | Failover not achieved | 94% | 3 s | 1.45 s |
-    | IO delay of 5 ms lasts for 10 mins | Failover not achieved | 81% | 462 ms | 246 ms |
-    | IO delay of 2 ms lasts for 10 mins | Failover not achieved | 38% | 232 ms | 22.9 ms |
-
-- Test results with IO latency jitter improvement
-
-    | Workload | Failover time | QPS drop rate | Maximum latency (P999) during failover | Maximum latency (P99) during failover |
-    | --- | --- | --- | --- | --- |
-    | IO delay of 1 s lasts for 10 mins | 3 mins | 93% | 4.66 s | 929 ms |
-    | IO delay of 500 ms lasts for 10 mins | 2 mins | 92% | 7.22 s | 894 ms |
-    | IO delay of 100 ms lasts for 10 mins | 3 mins | 80% | 7.53 s | 1.7 s |
-    | IO delay of 50 ms lasts for 10 mins | 3 mins | 53% | 1.36 s | 238 ms |
-    | IO delay of 10 ms lasts for 10 mins | 3 mins | 18% | 69 ms | 25 ms |
-    | IO delay of 5 ms lasts for 10 mins | 2 mins | 29% | 37.9 ms | 10 ms |
-    | IO delay of 2 ms lasts for 10 mins | Almost no effect | 1% | 14 ms | 7.9 ms |
+Failovers are now available in multiple IO delay scenarios, and P99/999 latency during impacts is reduced by up to 98%.
+
+In the following table of test results, the **Current** column shows the results with improvements to reduce IO latency jitter, while the **Original** column shows the results without these improvements:
+
+| Workload description | Failover time (Current) | Failover time (Original) | Maximum latency during impacts (P999), Current | Maximum latency during impacts (P999), Original | Maximum latency during impacts (P99), Current | Maximum latency during impacts (P99), Original |
+| --- | --- | --- | --- | --- | --- | --- |
+| IO delay of 2 ms lasts for 10 mins | Almost no effect | Failover not available | 14 ms | 232 ms | 7.9 ms | 22.9 ms |
+| IO delay of 5 ms lasts for 10 mins | 2 mins | Failover not available | 37.9 ms | 462 ms | 10 ms | 246 ms |
+| IO delay of 10 ms lasts for 10 mins | 3 mins | Failover not available | 69 ms | 3 s | 25 ms | 1.45 s |
+| IO delay of 50 ms lasts for 10 mins | 3 mins | Failover not available | 1.36 s | 13.2 s | 238 ms | 6.7 s |
+| IO delay of 100 ms lasts for 10 mins | 3 mins | Failover not available | 7.53 s | 32 s | 1.7 s | 26 s |
+
+### Further improvements
+
+Due to the inherent risk of physical disk damage, the cloud disk jitter issue is unavoidable. To mitigate its impact, TiKV introduces a [slow node detection mechanism](https://docs.pingcap.com/tidb/v8.5/pd-scheduling-best-practices#troubleshoot-tikv-node). This mechanism uses [evict-slow-store-scheduler](https://docs.pingcap.com/tidb/v8.5/pd-control#scheduler-show--add--remove--pause--resume--config--describe) to detect and manage slow nodes, reducing the effects of cloud disk jitter.
+
+The severity of disk jitter might also be highly related to users' workload profiles. In latency-sensitive scenarios, designing applications in conjunction with TiDB features can further minimize the impact of IO jitter on applications. For example, in read-heavy and latency-sensitive environments, adjusting the [`tikv_client_read_timeout`](/system-variables.md#tikv_client_read_timeout-new-in-v740) system variable according to latency requirements and using stale reads or follower reads can enable faster failover retries to other replica peers for KV requests sent from TiDB. This reduces the impact of IO jitter on a single TiKV node and helps improve query latency. Note that the effectiveness of this feature depends on the workload profile, which should be evaluated before implementation.
+
+Additionally, users [deploying TiDB on public cloud](https://docs.pingcap.com/tidb/dev/best-practices-on-public-cloud) can reduce the probability of jitter by choosing cloud disks with higher performance.

 ## Batch processing
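Review note: the new summary sentence claims that latency during impacts is "reduced by up to 98%". A minimal sketch for checking that figure against the Current/Original pairs in the added table follows. The helper names (`to_ms`, `reduction_pct`, `reductions`) and the `ROWS` layout are illustrative assumptions, not part of the PR; only the latency values are taken from the diff.

```python
# Sketch: recompute per-workload latency reductions from the table in the diff.
# All function and variable names here are hypothetical helpers for this check.

def to_ms(text: str) -> float:
    """Convert a latency string such as '1.45 s' or '69 ms' to milliseconds."""
    value, unit = text.split()
    return float(value) * {"ms": 1.0, "s": 1000.0}[unit]


def reduction_pct(current: str, original: str) -> float:
    """Percentage by which `current` is lower than `original`."""
    return (1.0 - to_ms(current) / to_ms(original)) * 100.0


# (current, original) pairs copied from the added table, keyed by IO delay.
ROWS = {
    "2 ms": {"P999": ("14 ms", "232 ms"), "P99": ("7.9 ms", "22.9 ms")},
    "5 ms": {"P999": ("37.9 ms", "462 ms"), "P99": ("10 ms", "246 ms")},
    "10 ms": {"P999": ("69 ms", "3 s"), "P99": ("25 ms", "1.45 s")},
    "50 ms": {"P999": ("1.36 s", "13.2 s"), "P99": ("238 ms", "6.7 s")},
    "100 ms": {"P999": ("7.53 s", "32 s"), "P99": ("1.7 s", "26 s")},
}


def reductions(rows):
    """Return {workload: {percentile: reduction in percent, rounded to 0.1}}."""
    return {
        workload: {
            pct: round(reduction_pct(cur, orig), 1)
            for pct, (cur, orig) in metrics.items()
        }
        for workload, metrics in rows.items()
    }


if __name__ == "__main__":
    for workload, metrics in reductions(ROWS).items():
        print(f"IO delay of {workload}: {metrics}")
```

Under these inputs the largest reduction is the P99 value for the 10 ms workload (25 ms versus 1.45 s, about 98.3%), which is consistent with the "up to 98%" wording; the smallest is the P99 value for the 2 ms workload (about 65.5%).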