add Titan for TiDB Best Practices on Public Cloud #18432

Merged · 9 commits · Aug 8, 2024

best-practices-on-public-cloud.md: 39 changes (27 additions & 12 deletions)

This document covers various essential best practices for deploying TiDB on public cloud, such as using a dedicated disk for Raft Engine, reducing compaction I/O flow in KV RocksDB, optimizing costs for cross-AZ traffic, mitigating Google Cloud live migration events, and fine-tuning the PD server in large clusters. By following these best practices, you can maximize the performance, cost efficiency, reliability, and scalability of your TiDB deployment on public cloud.

## Reduce compaction I/O flow in KV RocksDB

As the storage engine of TiKV, [RocksDB](https://rocksdb.org/) is used to store user data. Because the provisioned I/O throughput on cloud EBS is usually limited due to cost considerations, RocksDB might exhibit high write amplification, and the disk throughput might become the bottleneck for the workload. As a result, the total number of pending compaction bytes grows over time and triggers flow control, which indicates that TiKV lacks sufficient disk bandwidth to keep up with the foreground write flow.

To alleviate the bottleneck caused by limited disk throughput, you can improve performance by reducing the disk throughput that RocksDB consumes, either by enabling Titan or by increasing the compression level of RocksDB.

### Enable Titan

If the average row size is larger than 512 bytes, you can enable [Titan](https://docs.pingcap.com/tidb/stable/titan-overview) to reduce the compaction I/O flow. Set `min-blob-size` to `"512B"` or `"1KB"`, and set `blob-file-compression` to `"zstd"`:

```
[rocksdb.titan]
enabled = true
[rocksdb.defaultcf.titan]
min-blob-size = "1KB"
blob-file-compression = "zstd"
```
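
If you manage the cluster with TiUP, the same settings can also be expressed in the cluster topology. The following is a minimal sketch assuming TiUP's `server_configs` syntax; after editing, reload the TiKV nodes, for example with `tiup cluster reload <cluster-name> -R tikv`:

```
server_configs:
  tikv:
    rocksdb.titan.enabled: true
    rocksdb.defaultcf.titan.min-blob-size: "1KB"
    rocksdb.defaultcf.titan.blob-file-compression: "zstd"
```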

### Increase all the compression levels

If the average row size is smaller than 512 bytes and Titan is not applicable, you can increase all the compression levels of the default column family to `zstd`. Use the following configuration:

```
[rocksdb.defaultcf]
compression-per-level = ["zstd", "zstd", "zstd", "zstd", "zstd", "zstd", "zstd"]
```
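
For reference, TiKV's default profile typically leaves the top levels uncompressed and uses `lz4` for the middle levels (for example, `["no", "no", "lz4", "lz4", "lz4", "zstd", "zstd"]`), so moving every level to `zstd` trades additional compression CPU for a lower write flow to disk. Verify that your TiKV nodes have CPU headroom before applying this change.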

## Use a dedicated disk for Raft Engine

The [Raft Engine](/glossary.md#raft-engine) in TiKV plays a critical role similar to that of a write-ahead log (WAL) in traditional databases. To achieve optimal performance and stability, it is crucial to allocate a dedicated disk for the Raft Engine when you deploy TiDB on public cloud. The following `iostat` shows the I/O characteristics on a TiKV node with a write-heavy workload.
…

```
tikv:
    name: raft-pv-ssd
    storageSize: 512Gi
```
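
If you deploy with TiUP rather than TiDB Operator, a dedicated Raft Engine disk can be configured per TiKV host. The following is a minimal sketch assuming a hypothetical host address and a separate volume mounted at `/raft`; `raft-engine.dir` points the Raft Engine to the dedicated disk:

```
tikv_servers:
  - host: 10.0.1.101
    data_dir: /data/tikv
    config:
      raft-engine.dir: /raft/tikv-raft-engine
```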

## Optimize cost for cross-AZ network traffic

Deploying TiDB across multiple availability zones (AZs) can lead to increased costs due to cross-AZ data transfer fees. To optimize costs, it is important to reduce cross-AZ network traffic.
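
For example, one common way to reduce this traffic is to keep reads within the local AZ by using the Follower Read feature, such as setting the `tidb_replica_read` system variable to `closest-replicas` so that TiDB prefers replicas in the same availability zone. Whether this fits your workload depends on your read consistency and latency requirements.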