titan doc update for release 7.6.0
Signed-off-by: tonyxuqqi <[email protected]>
tonyxuqqi committed Jan 5, 2024
1 parent 429fef4 commit 64ecfce
Showing 3 changed files with 66 additions and 10 deletions.
36 changes: 29 additions & 7 deletions storage-engine/titan-configuration.md
@@ -39,12 +39,17 @@ Titan is compatible with RocksDB, so you can directly enable Titan on the existi
enabled = true
```

After Titan is enabled, the existing data stored in RocksDB is not immediately moved to the Titan engine. As new data is written to the TiKV foreground and RocksDB performs compaction, the values are progressively separated from keys and written to Titan. You can view the **TiKV Details** -> **Titan kv** -> **blob file size** panel to confirm the size of the data stored in Titan.
After Titan is enabled, the existing data stored in RocksDB is not immediately moved to the Titan engine. As new data is written to the TiKV foreground and RocksDB performs compaction, the values are progressively separated from keys and written to Titan. The same applies to data imported through snapshot restore, PITR restore, or TiDB Lightning: it is initially in the RocksDB format and is converted to Titan during compaction. You can view the **TiKV Details** -> **Titan kv** -> **blob file size** panel to confirm the size of the data stored in Titan.

If you want to speed up the writing process, compact data of the whole TiKV cluster manually using tikv-ctl. For details, see [manual compaction](/tikv-control.md#compact-data-of-the-whole-tikv-cluster-manually).
If you want to speed up the writing process, compact data of the whole TiKV cluster manually using tikv-ctl. For details, see [manual compaction](/tikv-control.md#compact-data-of-the-whole-tikv-cluster-manually). Because RocksDB has a block cache and the access pattern during compaction is sequential reads, the block cache hit rate can be quite high. In our test, 670 GiB of TiKV data was converted to Titan in less than 1 hour.
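
For reference, the manual full compaction is the same tikv-ctl invocation used in the disable-Titan procedure later in this document; `<PD_ADDR>` is a placeholder for your PD endpoint:

```bash
# Force a bottommost-level compaction across the whole cluster so that
# existing RocksDB values are rewritten and separated into Titan blob files.
tikv-ctl --pd <PD_ADDR> compact-cluster --bottommost force
```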

> **Note:**
>
> Starting from TiDB 7.6.0, Titan is enabled by default for newly created clusters. Existing clusters upgraded to TiDB 7.6.0 keep their original configuration: if Titan is not explicitly enabled, the cluster still uses RocksDB.

> **Warning:**
>
> When Titan is disabled, RocksDB cannot read data that has already been migrated to Titan. If Titan is incorrectly disabled on a TiKV instance with Titan already enabled (for example, `rocksdb.titan.enabled` is mistakenly set to `false`), TiKV fails to start, and the `You have disabled titan when its data directory is not empty` error appears in the TiKV log. To correctly disable Titan, see [Disable Titan](#disable-titan).

## Parameters
@@ -64,20 +69,27 @@ To adjust Titan-related parameters using TiUP, refer to [Modify the configuratio
+ Value size threshold.
When the size of the value written to the foreground is smaller than the threshold, this value is stored in RocksDB; otherwise, this value is stored in the blob file of Titan. Based on the distribution of value sizes, if you increase the threshold, more values are stored in RocksDB and TiKV performs better in reading small values. If you decrease the threshold, more values go to Titan, which further reduces RocksDB compactions.
When the size of the value written to the foreground is smaller than the threshold, this value is stored in RocksDB; otherwise, this value is stored in the blob file of Titan. Based on the distribution of value sizes, if you increase the threshold, more values are stored in RocksDB and TiKV performs better in reading small values. If you decrease the threshold, more values go to Titan, which further reduces RocksDB compactions. In our [test](/storage-engine/titan-overview.md#min-blob-sizes-performance-implications), 1 KB is a balanced threshold: compared with RocksDB, it delivers far better write throughput with only about 10% scan throughput regression.
```toml
[rocksdb.defaultcf.titan]
min-blob-size = "1KB"
```
+ The algorithm used for compressing values in Titan, which takes value as the unit.
+ The algorithm used for compressing values in Titan, which takes value as the unit. Starting from TiDB 7.6.0, the default compression algorithm is `zstd`.
```toml
[rocksdb.defaultcf.titan]
blob-file-compression = "lz4"
blob-file-compression = "zstd"
```
+ By default, `zstd-dict-size` is `0KB`, which means Titan compresses each value individually, while RocksDB compression is based on blocks (32 KB per block by default). So when the average value size in Titan is less than 32 KB, Titan's compression ratio is lower than that of RocksDB. Taking JSON as an example, the Titan store size can be 30% to 50% larger than that of RocksDB. The actual compression ratio depends on the value content and the similarity among different values. You can set `zstd-dict-size` (for example, to 16KB) to enable zstd dictionary compression and boost the compression ratio. Although zstd dictionary compression can achieve a compression ratio similar to that of RocksDB, it leads to about 10% throughput regression in a typical read-write workload.

```toml
[rocksdb.defaultcf.titan]
zstd-dict-size = "16KB"
```

+ The size of value caches in Titan.

Larger cache size means higher read performance of Titan. However, too large a cache size causes Out of Memory (OOM). It is recommended to set the value of `storage.block-cache.capacity` to the store size minus the blob file size and set `blob-cache-size` to `memory size * 50% - block cache size` according to the monitoring metrics when the database is running stably. This maximizes the blob cache size when the block cache is large enough for the whole RocksDB engine.
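
A minimal sketch of that sizing rule, assuming a hypothetical node with 32 GiB of memory whose block cache has been tuned to 8 GiB (both figures are illustrative, not recommendations):

```toml
# blob-cache-size = total memory * 50% - block cache size
#                 = 32 GiB * 50% - 8 GiB = 8 GiB
[storage.block-cache]
capacity = "8GB"

[rocksdb.defaultcf.titan]
blob-cache-size = "8GB"
```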
@@ -118,7 +130,7 @@ To disable Titan, you can configure the `rocksdb.defaultcf.titan.blob-run-mode`
- When the option is set to `read-only`, all newly written values are written into RocksDB, regardless of the value size.
- When the option is set to `fallback`, all newly written values are written into RocksDB, regardless of the value size. Also, all compacted values stored in the Titan blob file are automatically moved back to RocksDB.
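
For example, a sketch of switching a node into the `fallback` mode described above (the key and values match the `blob-run-mode` configuration item documented in the TiKV configuration file reference):

```toml
[rocksdb.defaultcf.titan]
# "fallback" writes new values to RocksDB and moves compacted blob data back.
blob-run-mode = "fallback"
```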

To fully disable Titan for all existing and future data, you can follow these steps:
To fully disable Titan for all existing and future data, you can follow these steps. Note that you can generally skip step 2, because it greatly impacts online traffic performance. In fact, even without step 2, the data conversion consumes extra I/O and CPU, so performance degradation (sometimes as large as 50%) can still be observed when TiKV's I/O or CPU resources are close to their limits.
1. Update the configuration of the TiKV nodes for which you want to disable Titan. You can update the configuration in two ways:
@@ -131,12 +143,18 @@ To fully disable Titan for all existing and future st
discardable-ratio = 1.0
```
2. Perform a full compaction using tikv-ctl. This process will consume large amount of I/O and CPU resources.
> **Note:**
>
> When `discardable-ratio = 1.0`, TiKV recycles a Titan blob file only after all of its data has been moved to RocksDB, which means these blob files are not deleted until the conversion completes. Therefore, if a TiKV node does not have enough disk space to store both the Titan and RocksDB data, keep the parameter at its default value instead of setting it to `1.0`. However, if the disk is large enough, `discardable-ratio = 1.0` helps reduce blob file GC and disk I/O.

2. [Optional] Perform a full compaction using tikv-ctl. This process consumes a large amount of I/O and CPU resources.

```bash
tikv-ctl --pd <PD_ADDR> compact-cluster --bottommost force
```


3. After the compaction is finished, wait for the **Blob file count** metric under **TiKV Details** -> **Titan - kv** to decrease to `0`.

4. Update the configuration of these TiKV nodes to disable Titan.
@@ -146,6 +164,10 @@ To fully disable Titan for all existing and future st
enabled = false
```

### Data conversion speed from Titan to RocksDB

Because the blob cache only helps when a value is accessed more than once, it is of little use in the compaction scenario. As a result, the data conversion from Titan to RocksDB can be 10 times slower than the conversion from RocksDB to Titan. In our test, it takes 12 hours for an 800 GiB TiKV node to completely convert its data back to RocksDB.

## Level Merge (experimental)
In TiKV 4.0, [Level Merge](/storage-engine/titan-overview.md#level-merge), a new algorithm, is introduced to improve the performance of range query and to reduce the impact of Titan GC on the foreground write operations. You can enable Level Merge using the following option:
26 changes: 26 additions & 0 deletions storage-engine/titan-overview.md
@@ -31,6 +31,8 @@ The prerequisites for enabling Titan are as follows:
- No range query will be performed or you do not need a high performance of range query. Because the data stored in Titan is not well-ordered, its performance of range query is poorer than that of RocksDB, especially for the query of a large range. According to PingCAP's internal test, Titan's range query performance is 40% to a few times lower than that of RocksDB.
- Sufficient disk space (consider reserving a space twice of the RocksDB disk consumption with the same data volume). This is because Titan reduces write amplification at the cost of disk space. In addition, Titan compresses values one by one, and its compression rate is lower than that of RocksDB. RocksDB compresses blocks one by one. Therefore, Titan consumes more storage space than RocksDB, which is expected and normal. In some situations, Titan's storage consumption can be twice that of RocksDB.

In TiDB 7.6.0, several optimizations are made to Titan, so it is enabled by default for newly created clusters. Because small key-value pairs are still stored in RocksDB even when Titan is enabled, there is no harm in enabling Titan in the configuration.

If you want to improve the performance of Titan, see the blog post [Titan: A RocksDB Plugin to Reduce Write Amplification](https://pingcap.com/blog/titan-storage-engine-design-and-implementation/).

## Architecture and implementation
Expand Down Expand Up @@ -124,3 +126,27 @@ Range Merge is an optimized approach of GC based on Level Merge. However, the bo
![RangeMerge](/media/titan/titan-7.png)

Therefore, the Range Merge operation is needed to keep the number of sorted runs within a certain level. At the time of OnCompactionComplete, Titan counts the number of sorted runs in a range. If the number is large, Titan marks the corresponding blob file as ToMerge and rewrites it in the next compaction.

### Scale-out and scale-in

For backward compatibility, when TiKV sends a snapshot to another TiKV node during a scale-out or scale-in operation, the snapshot itself is in the RocksDB format. Therefore, the initial data format on a newly created TiKV node is RocksDB, which means the node can have a smaller store size while the write amplification of compaction is larger. Later, during compaction, the RocksDB data is gradually converted to Titan.

### min-blob-size's performance implications

When Titan is enabled, a value is stored in Titan if its size is no less than `min-blob-size`; otherwise, it is stored in RocksDB. A `min-blob-size` that is either too large or too small can lead to poor performance in some workloads. Below are our test results for different `min-blob-size` values under a few workloads.

| Value size (Bytes) | pointget | pointget (Titan) | scan100 | scan100 (Titan) | scan10000 | scan10000 (Titan) | update | update (Titan) |
| ------------------ | -------- | ---------------- | ------- | --------------- | --------- | ----------------- | ------ | -------------- |
| 256   | 156198 | 153487 | 15961 | 6489  | 269  | 119  | 30047 | 35181 |
| 500   | 161142 | 160234 | 16131 | 9267  | 223  | 99.1 | 24162 | 33113 |
| 1K    | 159830 | 160190 | 11290 | 11129 | 115  | 98.2 | 18504 | 31969 |
| 2K    | 132507 | 139460 | 5751  | 5712  | 58.2 | 58.2 | 10538 | 28120 |
| 4K    | 106835 | 120328 | 2851  | 2836  | 29.1 | 29.1 | 5427  | 23453 |
| 512K  | 2331   | 2331   | 23.5  | 23.5  | NA   | NA   | 62.3  | 332   |
| 1024K | 1165   | 1165   | 11.7  | 11.7  | NA   | NA   | 32.3  | 233   |

> **Note:**
>
> scan100 means scanning 100 records, and scan10000 means scanning 10000 records.

When the value size is 2 KB, Titan's performance is better than or comparable to that of RocksDB in all the workloads above. When the size is 1 KB, Titan lags behind only in the scan10000 workload, by about 15%, but gains more than 50% in update. Therefore, the default value of `min-blob-size` is 1 KB. You can choose a proper `min-blob-size` according to your workloads.
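
As a sketch of how you might tune this, a scan-heavy deployment could raise the threshold; `2KB` below is an illustrative value read off the table above, not an official recommendation:

```toml
[rocksdb.defaultcf.titan]
# A larger threshold keeps more values in RocksDB, favoring range scans;
# the default "1KB" favors write/update throughput.
min-blob-size = "2KB"
```
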
14 changes: 11 additions & 3 deletions tikv-configuration-file.md
@@ -1315,8 +1315,8 @@ Configuration items related to Titan.

### `enabled`

+ Enables or disables Titan
+ Default value: `false`
+ Enables or disables Titan.
+ Default value: In TiDB 7.6.0 or later versions, the default value is `true` for newly created clusters; otherwise it is `false`. Therefore, if you upgrade to TiDB 7.6.0 from an earlier TiDB version, the cluster keeps the original setting if it is set explicitly, or uses `false` if it is not set.
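
For example, a minimal sketch that pins the engine choice explicitly so an upgrade cannot change it:

```toml
# Set explicitly so upgrading to TiDB 7.6.0 or later keeps the intended engine.
[rocksdb.titan]
enabled = true
```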

### `dirname`

@@ -1614,11 +1614,19 @@ Configuration items related to `rocksdb.defaultcf.titan`.
>
> The Snappy compressed file must be in the [official Snappy format](https://github.com/google/snappy). Other variants of Snappy compression are not supported.
### `zstd-dict-size`

+ The zstd compression dictionary size. The default value is `0KB`, which means zstd compresses each value individually, while RocksDB's compression unit is a block (32KB by default). So when zstd dictionary compression is off and the average value size is less than 32KB, Titan's compression ratio is lower than that of RocksDB. Using JSON data as an example, the Titan store size can be 30% to 50% larger than the RocksDB size. You can set `zstd-dict-size` (for example, to 16KB) to enable zstd dictionary compression, which can achieve a compression ratio similar to that of RocksDB. However, zstd dictionary compression can lead to about 10% performance regression.
+ Default value: `"0KB"`
+ Unit: KB|MB|GB
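
A short sketch of enabling dictionary compression, matching the example in the Titan configuration guide above:

```toml
[rocksdb.defaultcf.titan]
# "16KB" enables zstd dictionary compression; the default "0KB" disables it.
zstd-dict-size = "16KB"
```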

### `blob-cache-size`

+ The cache size of a Blob file
+ Default value: `"0GB"`
+ Minimum value: `0`
+ Recommended value: After TiKV runs for a while, set the RocksDB block cache (`storage.block-cache.capacity`) so that the block cache hit rate is at least 95%. Then set `blob-cache-size` to `total memory size * 50% - block cache size`. The block cache hit rate is more important than the blob cache hit rate: if `storage.block-cache.capacity` is too small, the overall performance suffers from a low block cache hit rate.
+ Unit: KB|MB|GB
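
A sketch of that arithmetic for a hypothetical node with 64 GiB of memory whose tuned block cache is 20 GiB (illustrative figures only):

```toml
[rocksdb.defaultcf.titan]
# 64 GiB * 50% - 20 GiB (block cache) = 12 GiB
blob-cache-size = "12GB"
```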

### `min-gc-batch-size`
@@ -1661,7 +1669,7 @@ Configuration items related to `rocksdb.defaultcf.titan`.
+ Specifies the running mode of Titan.
+ Optional values:
+ `normal`: Writes data to the blob file when the value size exceeds `min-blob-size`.
+ `read_only`: Refuses to write new data to the blob file, but still reads the original data from the blob file.
+ `read-only`: Refuses to write new data to the blob file, but still reads the original data from the blob file.
+ `fallback`: Writes data in the blob file back to LSM.
+ Default value: `normal`

