Merge remote-tracking branch 'upstream/master'
qiancai committed Jun 4, 2024
2 parents e0be674 + b157567 commit cf0fd75
Showing 11 changed files with 103 additions and 34 deletions.
1 change: 0 additions & 1 deletion basic-features.md
Original file line number Diff line number Diff line change
Expand Up @@ -176,7 +176,6 @@ You can try out TiDB features on [TiDB Playground](https://play.tidbcloud.com/?u
| [Extended statistics](/extended-statistics.md) | E | E | E | E | E | E | E | E | E |
| Statistics feedback | N | N | N | N | Deprecated | Deprecated | E | E | E |
| [Automatically update statistics](/statistics.md#automatic-update) | Y | Y | Y | Y | Y | Y | Y | Y | Y |
| [Fast Analyze](/system-variables.md#tidb_enable_fast_analyze) | Deprecated | Deprecated | E | E | E | E | E | E | E |
| [Dynamic pruning](/partitioned-table.md#dynamic-pruning-mode) | Y | Y | Y | Y | Y | E | E | E | E |
| [Collect statistics for `PREDICATE COLUMNS`](/statistics.md#collect-statistics-on-some-columns) | E | E | E | E | E | E | N | N | N |
| [Control the memory quota for collecting statistics](/statistics.md#the-memory-quota-for-collecting-statistics) | E | E | E | E | N | N | N | N | N |
Expand Down
6 changes: 0 additions & 6 deletions br/br-snapshot-guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -77,12 +77,6 @@ tiup br restore full --pd "${PD_IP}:2379" \
--storage "s3://backup-101/snapshot-202209081330?access-key=${access-key}&secret-access-key=${secret-access-key}"
```

> **Warning:**
>
> The coarse-grained Region scatter algorithm (enabled by setting `--granularity="coarse-grained"`) is experimental. It is recommended that you use this feature to accelerate data recovery in clusters with up to 1,000 tables. Note that this feature does not support checkpoint restore.
To further improve the restore speed of large clusters, starting from v7.6.0, BR supports a coarse-grained Region scatter algorithm (experimental) for faster parallel recovery. You can enable this algorithm by specifying `--granularity="coarse-grained"`. After it is enabled, BR can quickly split the restore task into a large number of small tasks and scatter them to all TiKV nodes in batches, thus fully utilizing the resources of each TiKV node for fast recovery in parallel.

During restore, a progress bar is displayed in the terminal as shown below. When the progress bar advances to 100%, the restore task is completed and statistics such as total restore time, average restore speed, and total data size are displayed.

```shell
Expand Down
16 changes: 11 additions & 5 deletions information-schema/information-schema-tikv-region-status.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,13 +11,13 @@ The `TIKV_REGION_STATUS` table shows some basic information of TiKV Regions via
>
> This table is not available on [TiDB Serverless](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-serverless) clusters.
{{< copyable "sql" >}}

```sql
USE information_schema;
DESC tikv_region_status;
USE INFORMATION_SCHEMA;
DESC TIKV_REGION_STATUS;
```

The output is as follows:

```sql
+---------------------------+-------------+------+------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
Expand All @@ -31,6 +31,9 @@ DESC tikv_region_status;
| IS_INDEX | tinyint(1) | NO | | 0 | |
| INDEX_ID | bigint(21) | YES | | NULL | |
| INDEX_NAME | varchar(64) | YES | | NULL | |
| IS_PARTITION | tinyint(1) | NO | | 0 | |
| PARTITION_ID | bigint(21) | YES | | NULL | |
| PARTITION_NAME | varchar(64) | YES | | NULL | |
| EPOCH_CONF_VER | bigint(21) | YES | | NULL | |
| EPOCH_VERSION | bigint(21) | YES | | NULL | |
| WRITTEN_BYTES | bigint(21) | YES | | NULL | |
Expand All @@ -40,7 +43,7 @@ DESC tikv_region_status;
| REPLICATIONSTATUS_STATE | varchar(64) | YES | | NULL | |
| REPLICATIONSTATUS_STATEID | bigint(21) | YES | | NULL | |
+---------------------------+-------------+------+------+---------+-------+
17 rows in set (0.00 sec)
20 rows in set (0.00 sec)
```

The descriptions of the columns in the `TIKV_REGION_STATUS` table are as follows:
Expand All @@ -54,6 +57,9 @@ The descriptions of the columns in the `TIKV_REGION_STATUS` table are as follows
* `IS_INDEX`: Whether the Region data is an index. 0 means that it is not an index, while 1 means that it is an index. If the current Region contains both table data and index data, there will be multiple rows of records, and `IS_INDEX` is 0 and 1 respectively.
* `INDEX_ID`: The ID of the index to which the Region belongs. If `IS_INDEX` is 0, the value of this column is NULL.
* `INDEX_NAME`: The name of the index to which the Region belongs. If `IS_INDEX` is 0, the value of this column is NULL.
* `IS_PARTITION`: Whether the table to which the Region belongs is partitioned.
* `PARTITION_ID`: If the table to which the Region belongs is partitioned, this column displays the ID of the partition to which the Region belongs.
* `PARTITION_NAME`: If the table to which the Region belongs is partitioned, this column displays the name of the partition to which the Region belongs.
* `EPOCH_CONF_VER`: The version number of the Region configuration. The version number increases when a peer is added or removed.
* `EPOCH_VERSION`: The current version number of the Region. The version number increases when the Region is split or merged.
* `WRITTEN_BYTES`: The amount of data (bytes) written to the Region.
Expand Down
75 changes: 73 additions & 2 deletions optimizer-hints.md
Original file line number Diff line number Diff line change
Expand Up @@ -139,6 +139,10 @@ SELECT /*+ NO_MERGE_JOIN(t1, t2) */ * FROM t1, t2 WHERE t1.id = t2.id;

### INL_JOIN(t1_name [, tl_name ...])

> **Note:**
>
> In some cases, the `INL_JOIN` hint might not take effect. For more information, see [`INL_JOIN` hint does not take effect](#inl_join-hint-does-not-take-effect).

The `INL_JOIN(t1_name [, tl_name ...])` hint tells the optimizer to use the index nested loop join algorithm for the given table(s). This algorithm might consume less system resources and take shorter processing time in some scenarios and might produce an opposite result in other scenarios. If the result set is less than 10,000 rows after the outer table is filtered by the `WHERE` condition, it is recommended to use this hint. For example:

{{< copyable "sql" >}}
Expand Down Expand Up @@ -930,7 +934,74 @@ The warning is as follows:

In this case, you need to place the hint directly after the `SELECT` keyword. For more details, see the [Syntax](#syntax) section.

### INL_JOIN hint does not take effect due to collation incompatibility
### `INL_JOIN` hint does not take effect

#### `INL_JOIN` hint does not take effect when built-in functions are used on columns for joining tables

In some cases, if you use a built-in function on a column that joins tables, the optimizer might fail to choose the `IndexJoin` plan, resulting in the `INL_JOIN` hint not taking effect either.

For example, the following query uses the built-in function `substr` on the column `tname` that joins tables:

```sql
CREATE TABLE t1 (id varchar(10) primary key, tname varchar(10));
CREATE TABLE t2 (id varchar(10) primary key, tname varchar(10));
EXPLAIN SELECT /*+ INL_JOIN(t1, t2) */ * FROM t1, t2 WHERE t1.id=t2.id and SUBSTR(t1.tname,1,2)=SUBSTR(t2.tname,1,2);
```

The execution plan is as follows:

```sql
+------------------------------+----------+-----------+---------------+-----------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+------------------------------+----------+-----------+---------------+-----------------------------------------------------------------------+
| HashJoin_12 | 12500.00 | root | | inner join, equal:[eq(test.t1.id, test.t2.id) eq(Column#5, Column#6)] |
| ├─Projection_17(Build) | 10000.00 | root | | test.t2.id, test.t2.tname, substr(test.t2.tname, 1, 2)->Column#6 |
| │ └─TableReader_19 | 10000.00 | root | | data:TableFullScan_18 |
| │ └─TableFullScan_18 | 10000.00 | cop[tikv] | table:t2 | keep order:false, stats:pseudo |
| └─Projection_14(Probe) | 10000.00 | root | | test.t1.id, test.t1.tname, substr(test.t1.tname, 1, 2)->Column#5 |
| └─TableReader_16 | 10000.00 | root | | data:TableFullScan_15 |
| └─TableFullScan_15 | 10000.00 | cop[tikv] | table:t1 | keep order:false, stats:pseudo |
+------------------------------+----------+-----------+---------------+-----------------------------------------------------------------------+
7 rows in set, 1 warning (0.01 sec)
```

```sql
SHOW WARNINGS;
```

```
+---------+------+------------------------------------------------------------------------------------+
| Level | Code | Message |
+---------+------+------------------------------------------------------------------------------------+
| Warning | 1815 | Optimizer Hint /*+ INL_JOIN(t1, t2) */ or /*+ TIDB_INLJ(t1, t2) */ is inapplicable |
+---------+------+------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
```

As you can see from the preceding example, the `INL_JOIN` hint does not take effect. This is due to a limitation of the optimizer that prevents using the `Projection` or `Selection` operator as the probe side of `IndexJoin`.

Starting from TiDB v8.0.0, you can avoid this issue by setting [`tidb_enable_inl_join_inner_multi_pattern`](/system-variables.md#tidb_enable_inl_join_inner_multi_pattern-new-in-v700) to `ON`.

```sql
SET @@tidb_enable_inl_join_inner_multi_pattern=ON;
Query OK, 0 rows affected (0.00 sec)
EXPLAIN SELECT /*+ INL_JOIN(t1, t2) */ * FROM t1, t2 WHERE t1.id=t2.id AND SUBSTR(t1.tname,1,2)=SUBSTR(t2.tname,1,2);
+------------------------------+--------------+-----------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+------------------------------+--------------+-----------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------+
| IndexJoin_18 | 12500.00 | root | | inner join, inner:Projection_14, outer key:test.t1.id, inner key:test.t2.id, equal cond:eq(Column#5, Column#6), eq(test.t1.id, test.t2.id) |
| ├─Projection_32(Build) | 10000.00 | root | | test.t1.id, test.t1.tname, substr(test.t1.tname, 1, 2)->Column#5 |
| │ └─TableReader_34 | 10000.00 | root | | data:TableFullScan_33 |
| │ └─TableFullScan_33 | 10000.00 | cop[tikv] | table:t1 | keep order:false, stats:pseudo |
| └─Projection_14(Probe) | 100000000.00 | root | | test.t2.id, test.t2.tname, substr(test.t2.tname, 1, 2)->Column#6 |
| └─TableReader_13 | 10000.00 | root | | data:TableRangeScan_12 |
| └─TableRangeScan_12 | 10000.00 | cop[tikv] | table:t2 | range: decided by [eq(test.t2.id, test.t1.id)], keep order:false, stats:pseudo |
+------------------------------+--------------+-----------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------+
7 rows in set (0.00 sec)
```

#### `INL_JOIN` hint does not take effect due to collation incompatibility

When the collation of the join key is incompatible between two tables, the `IndexJoin` operator cannot be utilized to execute the query. In this case, the [`INL_JOIN` hint](#inl_joint1_name--tl_name-) does not take effect. For example:

Expand Down Expand Up @@ -967,7 +1038,7 @@ SHOW WARNINGS;
1 row in set (0.00 sec)
```

### `INL_JOIN` hint does not take effect because of join order
#### `INL_JOIN` hint does not take effect due to join order

The [`INL_JOIN(t1, t2)`](#inl_joint1_name--tl_name-) or `TIDB_INLJ(t1, t2)` hint semantically instructs `t1` and `t2` to act as inner tables in an `IndexJoin` operator to join with other tables, rather than directly joining them using an `IndexJoin` operator. For example:

Expand Down
2 changes: 1 addition & 1 deletion resources/doc-templates/patch_release_note_template_zh.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@ summary: 了解 TiDB x.y.z 版本的兼容性变更、改进提升,以及错

# TiDB x.y.z Release Notes

Release date: month x, day x, 2023
Release date: month x, day x, 2024

TiDB version: x.y.z

Expand Down
4 changes: 3 additions & 1 deletion scripts/release_notes_update_pr_author_info_add_dup.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@
import openpyxl
import os
import shutil
import requests

version = '6.5.3' # Specifies the target TiDB version
release_note_excel = r'/Users/userid/Downloads/download_tirelease_tmp_patch_6.5.3_release_note_2023-06-06.xlsx' # Specifies the path of release note table with PR links and issue links
Expand Down Expand Up @@ -131,7 +132,8 @@ def update_pr_author_and_release_notes(excel_path):
# If pr_author is ti-chi-bot or ti-srebot
current_pr_author = row[pr_author_index]
current_formated_rn= row[pr_formated_rn_index]
if current_pr_author in ['ti-chi-bot', 'ti-srebot']:
pr_response = requests.get(row[pr_link_index])
if (current_pr_author in ['ti-chi-bot', 'ti-srebot']) and (pr_response.status_code == 200):
print ("Replacing the author info for row " + str(row_index) + ".")
actual_pr_author = get_pr_info_from_github(row[pr_link_index], row[pr_title_index], current_pr_author) # Get the PR author according to the cherry-pick PR
pr_author_cell = sheet.cell(row=row_index, column=pr_author_index+1, value = actual_pr_author)#Fill in the pr_author_cell
Expand Down
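
The hunk above calls `requests.get` on every row's PR link before the bot-author check, so the author lookup only runs when the link returns HTTP 200. The following is a minimal sketch of the same guard, not the script's actual code: the helper names `pr_link_is_reachable` and `should_replace_author` and the injectable `http_get` parameter are hypothetical, introduced here for illustration. It checks the author first, so rows with a non-bot author trigger no network request, and it treats a failed request as "not reachable" rather than letting the exception stop the run.

```python
BOT_AUTHORS = ("ti-chi-bot", "ti-srebot")


def pr_link_is_reachable(pr_link, http_get=None, timeout=10):
    """Return True only if fetching pr_link yields HTTP 200.

    http_get is a hypothetical injection point for testing; by default
    the third-party requests.get is used, as in the script.
    """
    if http_get is None:
        import requests  # imported lazily so the sketch loads without it
        http_get = requests.get
    try:
        response = http_get(pr_link, timeout=timeout)
    except Exception:
        # Treat any network or URL error as "not reachable".
        return False
    return response.status_code == 200


def should_replace_author(current_pr_author, pr_link, http_get=None):
    """Check the cheap condition first, then the network-bound one."""
    if current_pr_author not in BOT_AUTHORS:
        return False
    return pr_link_is_reachable(pr_link, http_get=http_get)
```

Whether a timeout of 10 seconds suits the release-note workflow is a judgment call; the value here is only a placeholder.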
14 changes: 1 addition & 13 deletions statistics.md
Original file line number Diff line number Diff line change
Expand Up @@ -241,21 +241,9 @@ Before v5.3.0, TiDB uses the reservoir sampling method to collect statistics. Si

The current sampling rate is calculated based on an adaptive algorithm. When you can observe the number of rows in a table using [`SHOW STATS_META`](/sql-statements/sql-statement-show-stats-meta.md), you can use this number of rows to calculate the sampling rate corresponding to 100,000 rows. If you cannot observe this number, you can use the sum of all the values in the `APPROXIMATE_KEYS` column in the results of [`SHOW TABLE REGIONS`](/sql-statements/sql-statement-show-table-regions.md) of the table as another reference to calculate the sampling rate.

<CustomContent platform="tidb">

> **Note:**
>
> Normally, `STATS_META` is more credible than `APPROXIMATE_KEYS`. However, after importing data through the methods like [TiDB Lightning](https://docs.pingcap.com/tidb/stable/tidb-lightning-overview), the result of `STATS_META` is `0`. To handle this situation, you can use `APPROXIMATE_KEYS` to calculate the sampling rate when the result of `STATS_META` is much smaller than the result of `APPROXIMATE_KEYS`.

</CustomContent>

<CustomContent platform="tidb-cloud">

> **Note:**
>
> Normally, `STATS_META` is more credible than `APPROXIMATE_KEYS`. However, after importing data through TiDB Cloud console (see [Import Sample Data](/tidb-cloud/import-sample-data.md)), the result of `STATS_META` is `0`. To handle this situation, you can use `APPROXIMATE_KEYS` to calculate the sampling rate when the result of `STATS_META` is much smaller than the result of `APPROXIMATE_KEYS`.

</CustomContent>
> Normally, `STATS_META` is more credible than `APPROXIMATE_KEYS`. However, when the result of `STATS_META` is much smaller than the result of `APPROXIMATE_KEYS`, it is recommended that you use `APPROXIMATE_KEYS` to calculate the sampling rate.
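
The calculation described above can be sketched as follows. This is an illustration under stated assumptions rather than TiDB's adaptive algorithm: the function name, the `distrust_ratio` cutoff used to decide that `STATS_META` is "much smaller" than `APPROXIMATE_KEYS`, and the cap at a rate of 1.0 are all hypothetical choices made for the example.

```python
def estimate_sample_rate(stats_meta_rows, approximate_keys,
                         target_rows=100_000, distrust_ratio=0.1):
    """Estimate a sampling rate that yields roughly target_rows samples.

    stats_meta_rows:  row count observed from SHOW STATS_META.
    approximate_keys: sum of APPROXIMATE_KEYS from SHOW TABLE REGIONS.
    distrust_ratio:   hypothetical cutoff; below this fraction of
                      approximate_keys, stats_meta_rows is not trusted.
    """
    if approximate_keys > 0 and stats_meta_rows < approximate_keys * distrust_ratio:
        # STATS_META looks stale (for example, 0 after a bulk import).
        row_count = approximate_keys
    else:
        row_count = stats_meta_rows
    if row_count <= target_rows:
        return 1.0  # small table: sample every row
    return target_rows / row_count
```

For instance, a table with 1,000,000 rows in both sources gives a rate of about 0.1, while a table that reports 0 rows in `STATS_META` but 2,000,000 approximate keys gives about 0.05.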

### The memory quota for collecting statistics

Expand Down
4 changes: 2 additions & 2 deletions ticdc/ticdc-alert-rules.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ For critical alerts, you need to pay close attention to abnormal monitoring metr

- Alert rule:

`(time() - ticdc_owner_checkpoint_ts / 1000) > 600`
`ticdc_owner_checkpoint_ts_lag > 600`

- Description:

Expand All @@ -30,7 +30,7 @@ For critical alerts, you need to pay close attention to abnormal monitoring metr

- Alert rule:

`(time() - ticdc_owner_resolved_ts / 1000) > 300`
`ticdc_owner_resolved_ts_lag > 300`

- Description:

Expand Down
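
Both versions of each rule compare a lag in seconds against a threshold. The earlier expressions derive that lag by converting a millisecond timestamp to seconds and subtracting it from the current time; the replacement metrics presumably expose the lag directly, which is an assumption based on the `_lag` suffix. A minimal sketch of the arithmetic, with hypothetical function and constant names:

```python
import time

CHECKPOINT_LAG_THRESHOLD_S = 600  # threshold from the checkpoint rule
RESOLVED_TS_LAG_THRESHOLD_S = 300  # threshold from the resolved-ts rule


def lag_seconds(ts_ms, now_s=None):
    """Seconds elapsed since a timestamp given in milliseconds."""
    if now_s is None:
        now_s = time.time()
    return now_s - ts_ms / 1000


def is_alerting(lag_s, threshold_s):
    """The rule fires only when the lag strictly exceeds the threshold."""
    return lag_s > threshold_s
```

With a checkpoint timestamp of 1,000,000 ms and a current time of 1,700 s, the lag is 700 s, which exceeds the 600 s checkpoint threshold.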
2 changes: 1 addition & 1 deletion ticdc/ticdc-overview.md
Original file line number Diff line number Diff line change
Expand Up @@ -130,7 +130,7 @@ Update t1 set b = 4 where b = 2;
TiCDC generates the following two SQL statements based on the data change information, and writes them to the downstream:

```sql
INSERT INTO `test.t1` (`A`,`B`) VALUES (1,1),(2,2),(3,3);
INSERT INTO `test.t1` (`A`,`B`) VALUES (1,2),(2,2),(3,3);
UPDATE `test`.`t1`
SET `A` = CASE
WHEN `A` = 1 THEN 1
Expand Down
10 changes: 9 additions & 1 deletion ticdc/ticdc-sink-to-kafka.md
Original file line number Diff line number Diff line change
Expand Up @@ -36,9 +36,17 @@ Info: {"sink-uri":"kafka://127.0.0.1:9092/topic-name?protocol=canal-json&kafka-v
Sink URI is used to specify the connection information of the TiCDC target system. The format is as follows:

```shell
[scheme]://[userinfo@][host]:[port][/path]?[query_parameters]
[scheme]://[host]:[port][/path]?[query_parameters]
```

> **Tip:**
>
> If there are multiple hosts or ports for the downstream Kafka, you can configure multiple `[host]:[port]` in the sink URI. For example:
>
> ```shell
> [scheme]://[host]:[port],[host]:[port],[host]:[port][/path]?[query_parameters]
> ```
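
The multi-broker form in the tip can be assembled programmatically. The sketch below only illustrates the URI layout shown above; `build_sink_uri` is a hypothetical helper, not part of TiCDC or its tooling, and the broker addresses are placeholders.

```python
from urllib.parse import urlencode


def build_sink_uri(scheme, brokers, path, query_parameters):
    """Join (host, port) pairs with commas and append path and query."""
    authority = ",".join(f"{host}:{port}" for host, port in brokers)
    uri = f"{scheme}://{authority}/{path}"
    if query_parameters:
        # urlencode percent-encodes characters that are unsafe in a query.
        uri += "?" + urlencode(query_parameters)
    return uri


uri = build_sink_uri(
    "kafka",
    [("127.0.0.1", 9092), ("127.0.0.2", 9092)],  # placeholder brokers
    "topic-name",
    {"protocol": "canal-json"},
)
print(uri)
```
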

Sample configuration:

```shell
Expand Down
3 changes: 2 additions & 1 deletion tikv-configuration-file.md
Original file line number Diff line number Diff line change
Expand Up @@ -1039,7 +1039,8 @@ Configuration items related to Raftstore.
+ Set the specific times that TiKV initiates periodic full compaction. You can specify multiple time schedules in an array. For example:
+ `periodic-full-compact-start-times = ["03:00", "23:00"]` indicates that TiKV performs full compaction daily at 03:00 AM and 11:00 PM, based on the local time zone of the TiKV node.
+ `periodic-full-compact-start-times = ["03:00 +0000", "23:00 +0000"]` indicates that TiKV performs full compaction daily at 03:00 AM and 11:00 PM in UTC time.
+ `periodic-full-compact-start-times = ["03:00 +0000", "23:00 +0000"]` indicates that TiKV performs full compaction daily at 03:00 AM and 11:00 PM in UTC timezone.
+ `periodic-full-compact-start-times = ["03:00 +0800", "23:00 +0800"]` indicates that TiKV performs full compaction daily at 03:00 AM and 11:00 PM in UTC+08:00 timezone.
+ Default value: `[]`, which means periodic full compaction is disabled by default.
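
To make the offset notation concrete, the following sketch converts an `"HH:MM +ZZZZ"` entry to the equivalent UTC time of day. It is an illustration of what the offsets mean, not TiKV's parser; the function name `to_utc_hhmm` is hypothetical, and entries without an offset return `None` because they follow the TiKV node's local time zone, which this snippet cannot know.

```python
from datetime import datetime


def to_utc_hhmm(entry):
    """Convert 'HH:MM +ZZZZ' to the UTC time of day as 'HH:MM'.

    Returns None for entries without an explicit offset, since those
    are interpreted in the node's local time zone.
    """
    if len(entry.split()) == 1:
        return None
    parsed = datetime.strptime(entry, "%H:%M %z")
    # Subtract the offset from the naive wall-clock time to get UTC.
    utc = parsed.replace(tzinfo=None) - parsed.utcoffset()
    return f"{utc.hour:02d}:{utc.minute:02d}"


for entry in ["03:00 +0000", "03:00 +0800", "23:00 +0800", "03:00"]:
    print(entry, "->", to_utc_hhmm(entry))
```

Under this reading, `"03:00 +0800"` corresponds to 19:00 UTC on the previous day, and `"23:00 +0800"` corresponds to 15:00 UTC.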

### `periodic-full-compact-start-max-cpu` <span class="version-mark">New in v7.6.0</span>
Expand Down
