add Titan for TiDB Best Practices on Public Cloud (#18432) #18536

Merged
46 changes: 34 additions & 12 deletions best-practices-on-public-cloud.md
@@ -7,7 +7,40 @@

Public cloud infrastructure has become an increasingly popular choice for deploying and managing TiDB. However, deploying TiDB on public cloud requires careful consideration of several critical factors, including performance tuning, cost optimization, reliability, and scalability.

This document covers various essential best practices for deploying TiDB on public cloud, such as using a dedicated disk for Raft Engine, reducing compaction I/O flow in KV RocksDB, optimizing costs for cross-AZ traffic, mitigating Google Cloud live migration events, and fine-tuning the PD server in large clusters. By following these best practices, you can maximize the performance, cost efficiency, reliability, and scalability of your TiDB deployment on public cloud.
This document covers various essential best practices for deploying TiDB on public cloud, such as reducing compaction I/O flow in KV RocksDB, using a dedicated disk for Raft Engine, optimizing costs for cross-AZ traffic, mitigating Google Cloud live migration events, and fine-tuning the PD server in large clusters. By following these best practices, you can maximize the performance, cost efficiency, reliability, and scalability of your TiDB deployment on public cloud.

## Reduce compaction I/O flow in KV RocksDB

As the storage engine of TiKV, [RocksDB](https://rocksdb.org/) is used to store user data. The provisioned I/O throughput on cloud EBS is usually limited due to cost considerations, and RocksDB might exhibit high write amplification, so the disk throughput might become the bottleneck for the workload. As a result, the total number of pending compaction bytes grows over time and triggers flow control, which indicates that TiKV lacks sufficient disk bandwidth to keep up with the foreground write flow.
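
For orientation, the flow control described above is governed by the `[storage.flow-control]` section of the TiKV configuration file. The following is a sketch only: the values shown are assumed to be the defaults and might differ between TiKV versions, so check the TiKV configuration reference for your release before relying on them.

```toml
[storage.flow-control]
# Assumption: flow control is enabled by default.
enable = true
# Assumed default: when pending compaction bytes reach this threshold,
# TiKV starts rejecting some write requests.
soft-pending-compaction-bytes-limit = "192GB"
# Assumed default: when pending compaction bytes reach this threshold,
# TiKV rejects all write requests.
hard-pending-compaction-bytes-limit = "1024GB"
```

Raising these thresholds only delays flow control. It does not reduce the compaction I/O itself, which is what the following sections address.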


To alleviate the bottleneck caused by limited disk throughput, you can [enable Titan](#enable-titan). If your average row size is smaller than 512 bytes, Titan is not applicable. In this case, you can [increase all the compression levels](#increase-all-the-compression-levels) instead.

### Enable Titan

[Titan](/storage-engine/titan-overview.md) is a high-performance [RocksDB](https://github.com/facebook/rocksdb) plugin for key-value separation, which can reduce write amplification in RocksDB when large values are used.

If your average row size is larger than 512 bytes, you can enable Titan to reduce the compaction I/O flow as follows, with `min-blob-size` set to `"512B"` or `"1KB"` and `blob-file-compression` set to `"zstd"`:

```toml
[rocksdb.titan]
enabled = true
[rocksdb.defaultcf.titan]
min-blob-size = "1KB"
blob-file-compression = "zstd"
```

> **Note:**
>
> When Titan is enabled, there might be a slight performance degradation for range scans on the primary key. For more information, see [Impact of `min-blob-size` on performance](/storage-engine/titan-overview.md#impact-of-min-blob-size-on-performance).
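
The `[rocksdb.defaultcf.titan]` example in this section uses `"1KB"` for `min-blob-size`. If you choose the other suggested threshold, `"512B"`, the configuration might look like the following sketch. Which of the two values suits a given workload is an assumption to validate by testing, because a lower threshold moves more values into Titan blob files.

```toml
[rocksdb.titan]
enabled = true
[rocksdb.defaultcf.titan]
# Alternative threshold from the text: values of 512 bytes or more
# are stored in Titan blob files.
min-blob-size = "512B"
blob-file-compression = "zstd"
```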

### Increase all the compression levels

If your average row size is smaller than 512 bytes, you can set the compression algorithm for every level of the default column family to `"zstd"` as follows:

```toml
[rocksdb.defaultcf]
compression-per-level = ["zstd", "zstd", "zstd", "zstd", "zstd", "zstd", "zstd"]
```

## Use a dedicated disk for Raft Engine

@@ -97,17 +130,6 @@
storageSize: 512Gi
```

## Reduce compaction I/O flow in KV RocksDB

As the storage engine of TiKV, [RocksDB](https://rocksdb.org/) is used to store user data. Because the provisioned IO throughput on cloud EBS is usually limited due to cost considerations, RocksDB might exhibit high write amplification, and the disk throughput might become the bottleneck for the workload. As a result, the total number of pending compaction bytes grows over time and triggers flow control, which indicates that TiKV lacks sufficient disk bandwidth to keep up with the foreground write flow.

To alleviate the bottleneck caused by limited disk throughput, you can improve performance by increasing the compression level for RocksDB and reducing the disk throughput. For example, you can refer to the following example to increase all the compression levels of the default column family to `zstd`.

```
[rocksdb.defaultcf]
compression-per-level = ["zstd", "zstd", "zstd", "zstd", "zstd", "zstd", "zstd"]
```

## Optimize cost for cross-AZ network traffic

Deploying TiDB across multiple availability zones (AZs) can lead to increased costs due to cross-AZ data transfer fees. To optimize costs, it is important to reduce cross-AZ network traffic.