Merge remote-tracking branch 'upstream/master'

qiancai · Jun 4, 2024 · cf0fd75
2 parents e0be674 + b157567
commit cf0fd75
Showing 11 changed files with 103 additions and 34 deletions.
diff --git a/basic-features.md b/basic-features.md
@@ -176,7 +176,6 @@ You can try out TiDB features on [TiDB Playground](https://play.tidbcloud.com/?u
 | [Extended statistics](/extended-statistics.md) | E | E | E | E | E | E | E | E | E |
 | Statistics feedback | N | N | N | N | Deprecated | Deprecated | E | E | E |
 | [Automatically update statistics](/statistics.md#automatic-update) | Y | Y | Y | Y | Y | Y | Y | Y | Y |
-| [Fast Analyze](/system-variables.md#tidb_enable_fast_analyze) | Deprecated | Deprecated | E | E | E | E | E | E | E |
 | [Dynamic pruning](/partitioned-table.md#dynamic-pruning-mode) | Y | Y | Y | Y | Y | E | E | E | E |
 | [Collect statistics for `PREDICATE COLUMNS`](/statistics.md#collect-statistics-on-some-columns) | E | E | E | E | E | E | N | N | N |
 | [Control the memory quota for collecting statistics](/statistics.md#the-memory-quota-for-collecting-statistics) | E | E | E | E | N | N | N | N | N |

diff --git a/br/br-snapshot-guide.md b/br/br-snapshot-guide.md
@@ -77,12 +77,6 @@ tiup br restore full --pd "${PD_IP}:2379" \
 --storage "s3://backup-101/snapshot-202209081330?access-key=${access-key}&secret-access-key=${secret-access-key}"
 ```
 
-> **Warning:**
-> 
-> The coarse-grained Region scatter algorithm (enabled by setting `--granularity="coarse-grained"`) is experimental. It is recommended that you use this feature to accelerate data recovery in clusters with up to 1,000 tables. Note that this feature does not support checkpoint restore.
-
-To further improve the restore speed of large clusters, starting from v7.6.0, BR supports a coarse-grained Region scatter algorithm (experimental) for faster parallel recovery. You can enable this algorithm by specifying `--granularity="coarse-grained"`. After it is enabled, BR can quickly split the restore task into a large number of small tasks and scatter them to all TiKV nodes in batches, thus fully utilizing the resources of each TiKV node for fast recovery in parallel.
-
 During restore, a progress bar is displayed in the terminal as shown below. When the progress bar advances to 100%, the restore task is completed and statistics such as total restore time, average restore speed, and total data size are displayed.
 
 ```shell

diff --git a/information-schema/information-schema-tikv-region-status.md b/information-schema/information-schema-tikv-region-status.md
@@ -11,13 +11,13 @@ The `TIKV_REGION_STATUS` table shows some basic information of TiKV Regions via
 >
 > This table is not available on [TiDB Serverless](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-serverless) clusters.
 
-{{< copyable "sql" >}}
-
 ```sql
-USE information_schema;
-DESC tikv_region_status;
+USE INFORMATION_SCHEMA;
+DESC TIKV_REGION_STATUS;
 ```
 
+The output is as follows:
+
 ```sql
 +---------------------------+-------------+------+------+---------+-------+
 | Field                     | Type        | Null | Key  | Default | Extra |
@@ -31,6 +31,9 @@ DESC tikv_region_status;
 | IS_INDEX                  | tinyint(1)  | NO   |      | 0       |       |
 | INDEX_ID                  | bigint(21)  | YES  |      | NULL    |       |
 | INDEX_NAME                | varchar(64) | YES  |      | NULL    |       |
+| IS_PARTITION              | tinyint(1)  | NO   |      | 0       |       |
+| PARTITION_ID              | bigint(21)  | YES  |      | NULL    |       |
+| PARTITION_NAME            | varchar(64) | YES  |      | NULL    |       |
 | EPOCH_CONF_VER            | bigint(21)  | YES  |      | NULL    |       |
 | EPOCH_VERSION             | bigint(21)  | YES  |      | NULL    |       |
 | WRITTEN_BYTES             | bigint(21)  | YES  |      | NULL    |       |
@@ -40,7 +43,7 @@ DESC tikv_region_status;
 | REPLICATIONSTATUS_STATE   | varchar(64) | YES  |      | NULL    |       |
 | REPLICATIONSTATUS_STATEID | bigint(21)  | YES  |      | NULL    |       |
 +---------------------------+-------------+------+------+---------+-------+
-17 rows in set (0.00 sec)
+20 rows in set (0.00 sec)
 ```
 
 The descriptions of the columns in the `TIKV_REGION_STATUS` table are as follows:
@@ -54,6 +57,9 @@ The descriptions of the columns in the `TIKV_REGION_STATUS` table are as follows
 * `IS_INDEX`: Whether the Region data is an index. 0 means that it is not an index, while 1 means that it is an index. If the current Region contains both table data and index data, there will be multiple rows of records, and `IS_INDEX` is 0 and 1 respectively.
 * `INDEX_ID`: The ID of the index to which the Region belongs. If `IS_INDEX` is 0, the value of this column is NULL.
 * `INDEX_NAME`: The name of the index to which the Region belongs. If `IS_INDEX` is 0, the value of this column is NULL.
+* `IS_PARTITION`: Whether the table to which the Region belongs is partitioned.
+* `PARTITION_ID`: If the table to which the Region belongs is partitioned, this column displays the ID of the partition to which the Region belongs.
+* `PARTITION_NAME`: If the table to which the Region belongs is partitioned, this column displays the name of the partition to which the Region belongs.
 * `EPOCH_CONF_VER`: The version number of the Region configuration. The version number increases when a peer is added or removed.
 * `EPOCH_VERSION`: The current version number of the Region. The version number increases when the Region is split or merged.
 * `WRITTEN_BYTES`: The amount of data (bytes) written to the Region.

diff --git a/optimizer-hints.md b/optimizer-hints.md
@@ -139,6 +139,10 @@ SELECT /*+ NO_MERGE_JOIN(t1, t2) */ * FROM t1, t2 WHERE t1.id = t2.id;
 
 ### INL_JOIN(t1_name [, tl_name ...])
 
+> **Note:**
+>
+> In some cases, the `INL_JOIN` hint might not take effect. For more information, see [`INL_JOIN` hint does not take effect](#inl_join-hint-does-not-take-effect).
+
 The `INL_JOIN(t1_name [, tl_name ...])` hint tells the optimizer to use the index nested loop join algorithm for the given table(s). This algorithm might consume less system resources and take shorter processing time in some scenarios and might produce an opposite result in other scenarios. If the result set is less than 10,000 rows after the outer table is filtered by the `WHERE` condition, it is recommended to use this hint. For example:
 
 {{< copyable "sql" >}}
@@ -930,7 +934,74 @@ The warning is as follows:
 
 In this case, you need to place the hint directly after the `SELECT` keyword. For more details, see the [Syntax](#syntax) section.
 
-### INL_JOIN hint does not take effect due to collation incompatibility
+### `INL_JOIN` hint does not take effect
+
+#### `INL_JOIN` hint does not take effect when built-in functions are used on columns for joining tables
+
+In some cases, if you use a built-in function on a join column, the optimizer might fail to choose the `IndexJoin` plan, so the `INL_JOIN` hint does not take effect either.
+
+For example, the following query applies the built-in function `SUBSTR` to the join column `tname`:
+
+```sql
+CREATE TABLE t1 (id varchar(10) primary key, tname varchar(10));
+CREATE TABLE t2 (id varchar(10) primary key, tname varchar(10));
+EXPLAIN SELECT /*+ INL_JOIN(t1, t2) */ * FROM t1, t2 WHERE t1.id=t2.id and SUBSTR(t1.tname,1,2)=SUBSTR(t2.tname,1,2);
+```
+
+The execution plan is as follows:
+
+```sql
++------------------------------+----------+-----------+---------------+-----------------------------------------------------------------------+
+| id                           | estRows  | task      | access object | operator info                                                         |
++------------------------------+----------+-----------+---------------+-----------------------------------------------------------------------+
+| HashJoin_12                  | 12500.00 | root      |               | inner join, equal:[eq(test.t1.id, test.t2.id) eq(Column#5, Column#6)] |
+| ├─Projection_17(Build)       | 10000.00 | root      |               | test.t2.id, test.t2.tname, substr(test.t2.tname, 1, 2)->Column#6      |
+| │ └─TableReader_19           | 10000.00 | root      |               | data:TableFullScan_18                                                 |
+| │   └─TableFullScan_18       | 10000.00 | cop[tikv] | table:t2      | keep order:false, stats:pseudo                                        |
+| └─Projection_14(Probe)       | 10000.00 | root      |               | test.t1.id, test.t1.tname, substr(test.t1.tname, 1, 2)->Column#5      |
+|   └─TableReader_16           | 10000.00 | root      |               | data:TableFullScan_15                                                 |
+|     └─TableFullScan_15       | 10000.00 | cop[tikv] | table:t1      | keep order:false, stats:pseudo                                        |
++------------------------------+----------+-----------+---------------+-----------------------------------------------------------------------+
+7 rows in set, 1 warning (0.01 sec)
+```
+
+```sql
+SHOW WARNINGS;
+```
+
+```
++---------+------+------------------------------------------------------------------------------------+
+| Level   | Code | Message                                                                            |
++---------+------+------------------------------------------------------------------------------------+
+| Warning | 1815 | Optimizer Hint /*+ INL_JOIN(t1, t2) */ or /*+ TIDB_INLJ(t1, t2) */ is inapplicable |
++---------+------+------------------------------------------------------------------------------------+
+1 row in set (0.00 sec)
+```
+
+As you can see from the preceding example, the `INL_JOIN` hint does not take effect. This is due to a limitation of the optimizer that prevents using the `Projection` or `Selection` operator as the probe side of `IndexJoin`.
+
+Starting from TiDB v8.0.0, you can avoid this issue by setting [`tidb_enable_inl_join_inner_multi_pattern`](/system-variables.md#tidb_enable_inl_join_inner_multi_pattern-new-in-v700) to `ON`.
+
+```sql
+SET @@tidb_enable_inl_join_inner_multi_pattern=ON;
+Query OK, 0 rows affected (0.00 sec)
+
+EXPLAIN SELECT /*+ INL_JOIN(t1, t2) */ * FROM t1, t2 WHERE t1.id=t2.id AND SUBSTR(t1.tname,1,2)=SUBSTR(t2.tname,1,2);
++------------------------------+--------------+-----------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------+
+| id                           | estRows      | task      | access object | operator info                                                                                                                              |
++------------------------------+--------------+-----------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------+
+| IndexJoin_18                 | 12500.00     | root      |               | inner join, inner:Projection_14, outer key:test.t1.id, inner key:test.t2.id, equal cond:eq(Column#5, Column#6), eq(test.t1.id, test.t2.id) |
+| ├─Projection_32(Build)       | 10000.00     | root      |               | test.t1.id, test.t1.tname, substr(test.t1.tname, 1, 2)->Column#5                                                                           |
+| │ └─TableReader_34           | 10000.00     | root      |               | data:TableFullScan_33                                                                                                                      |
+| │   └─TableFullScan_33       | 10000.00     | cop[tikv] | table:t1      | keep order:false, stats:pseudo                                                                                                             |
+| └─Projection_14(Probe)       | 100000000.00 | root      |               | test.t2.id, test.t2.tname, substr(test.t2.tname, 1, 2)->Column#6                                                                           |
+|   └─TableReader_13           | 10000.00     | root      |               | data:TableRangeScan_12                                                                                                                     |
+|     └─TableRangeScan_12      | 10000.00     | cop[tikv] | table:t2      | range: decided by [eq(test.t2.id, test.t1.id)], keep order:false, stats:pseudo                                                             |
++------------------------------+--------------+-----------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------+
+7 rows in set (0.00 sec)
+```
+
+#### `INL_JOIN` hint does not take effect due to collation incompatibility
 
 When the collation of the join key is incompatible between two tables, the `IndexJoin` operator cannot be utilized to execute the query. In this case, the [`INL_JOIN` hint](#inl_joint1_name--tl_name-) does not take effect. For example:
 
@@ -967,7 +1038,7 @@ SHOW WARNINGS;
 1 row in set (0.00 sec)
 ```
 
-### `INL_JOIN` hint does not take effect because of join order
+#### `INL_JOIN` hint does not take effect due to join order
 
 The [`INL_JOIN(t1, t2)`](#inl_joint1_name--tl_name-) or `TIDB_INLJ(t1, t2)` hint semantically instructs `t1` and `t2` to act as inner tables in an `IndexJoin` operator to join with other tables, rather than directly joining them using an `IndexJoin` operator. For example:
 

diff --git a/resources/doc-templates/patch_release_note_template_zh.md b/resources/doc-templates/patch_release_note_template_zh.md
@@ -5,7 +5,7 @@ summary: 了解 TiDB x.y.z 版本的兼容性变更、改进提升，以及错
 
 # TiDB x.y.z Release Notes
 
-发版日期：2023 年 x 月 x 日
+发版日期：2024 年 x 月 x 日
 
 TiDB 版本：x.y.z
 

diff --git a/scripts/release_notes_update_pr_author_info_add_dup.py b/scripts/release_notes_update_pr_author_info_add_dup.py
@@ -12,6 +12,7 @@
 import openpyxl
 import os
 import shutil
+import requests
 
 version = '6.5.3' # Specifies the target TiDB version
 release_note_excel = r'/Users/userid/Downloads/download_tirelease_tmp_patch_6.5.3_release_note_2023-06-06.xlsx' # Specifies the path of release note table with PR links and issue links
@@ -131,7 +132,8 @@ def update_pr_author_and_release_notes(excel_path):
         # If pr_author is ti-chi-bot or ti-srebot
         current_pr_author = row[pr_author_index]
         current_formated_rn= row[pr_formated_rn_index]
-        if current_pr_author in ['ti-chi-bot', 'ti-srebot']:
+        pr_response = requests.get(row[pr_link_index])
+        if (current_pr_author in ['ti-chi-bot', 'ti-srebot']) and (pr_response.status_code == 200):
            print ("Replacing the author info for row " + str(row_index) + ".")
            actual_pr_author = get_pr_info_from_github(row[pr_link_index], row[pr_title_index], current_pr_author) # Get the PR author according to the cherry-pick PR
            pr_author_cell = sheet.cell(row=row_index, column=pr_author_index+1, value = actual_pr_author)#Fill in the pr_author_cell
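
The changed condition above only replaces the author when the row is bot-authored and the PR link returns HTTP 200. A minimal sketch of that check follows, assuming a hypothetical helper `should_replace_author` and an injected `fetch_status` callable (neither is part of the script). Unlike the diff, this sketch tests the author first, so no request is sent for rows that are not bot-authored.

```python
from typing import Callable

# Bot accounts named in the script's condition.
BOT_AUTHORS = {"ti-chi-bot", "ti-srebot"}


def should_replace_author(pr_author: str, pr_link: str,
                          fetch_status: Callable[[str], int]) -> bool:
    """Return True when the row is bot-authored and its PR link answers 200.

    `fetch_status` is a hypothetical hook that maps a URL to an HTTP status
    code. With the requests library it could be
    `lambda url: requests.get(url, timeout=10).status_code`.
    """
    if pr_author not in BOT_AUTHORS:
        return False  # short-circuit: skip the HTTP request entirely
    return fetch_status(pr_link) == 200


# Offline stand-in for the HTTP call so the sketch runs without network access.
calls = []


def fake_status(url: str) -> int:
    calls.append(url)
    return 404 if url.endswith("/missing") else 200


print(should_replace_author("ti-chi-bot", "https://example.com/pull/1", fake_status))
print(should_replace_author("alice", "https://example.com/pull/2", fake_status))
```

Injecting the status lookup keeps the decision logic testable offline. Whether to set a request timeout, and which value to use, is a choice left to the script's maintainers.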

diff --git a/statistics.md b/statistics.md
@@ -241,21 +241,9 @@ Before v5.3.0, TiDB uses the reservoir sampling method to collect statistics. Si
 
 The current sampling rate is calculated based on an adaptive algorithm. When you can observe the number of rows in a table using [`SHOW STATS_META`](/sql-statements/sql-statement-show-stats-meta.md), you can use this number of rows to calculate the sampling rate corresponding to 100,000 rows. If you cannot observe this number, you can use the sum of all the values in the `APPROXIMATE_KEYS` column in the results of [`SHOW TABLE REGIONS`](/sql-statements/sql-statement-show-table-regions.md) of the table as another reference to calculate the sampling rate.
 
-<CustomContent platform="tidb">
-
 > **Note:**
 >
-> Normally, `STATS_META` is more credible than `APPROXIMATE_KEYS`. However, after importing data through the methods like [TiDB Lightning](https://docs.pingcap.com/tidb/stable/tidb-lightning-overview), the result of `STATS_META` is `0`. To handle this situation, you can use `APPROXIMATE_KEYS` to calculate the sampling rate when the result of `STATS_META` is much smaller than the result of `APPROXIMATE_KEYS`.
-
-</CustomContent>
-
-<CustomContent platform="tidb-cloud">
-
-> **Note:**
->
-> Normally, `STATS_META` is more credible than `APPROXIMATE_KEYS`. However, after importing data through TiDB Cloud console (see [Import Sample Data](/tidb-cloud/import-sample-data.md)), the result of `STATS_META` is `0`. To handle this situation, you can use `APPROXIMATE_KEYS` to calculate the sampling rate when the result of `STATS_META` is much smaller than the result of `APPROXIMATE_KEYS`.
-
-</CustomContent>
+> Normally, `STATS_META` is more credible than `APPROXIMATE_KEYS`. However, when the result of `STATS_META` is much smaller than the result of `APPROXIMATE_KEYS`, it is recommended that you use `APPROXIMATE_KEYS` to calculate the sampling rate.
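
The calculation described above can be sketched as follows. This is an illustration only, not TiDB's internal algorithm: the function names are hypothetical, and the `ratio` threshold that stands in for "much smaller" is an assumption, because the text does not define a cutoff.

```python
def pick_row_count(stats_meta_rows: int, approximate_keys: int,
                   ratio: float = 0.1) -> int:
    """Choose the row count used to derive the sampling rate.

    Assumption: "much smaller" is modeled as stats_meta_rows falling below
    ratio * approximate_keys. The 0.1 default is illustrative only.
    """
    if stats_meta_rows < approximate_keys * ratio:
        return approximate_keys
    return stats_meta_rows


def sampling_rate(row_count: int, target_rows: int = 100_000) -> float:
    """Sampling rate that corresponds to roughly target_rows sampled rows."""
    if row_count <= 0:
        return 1.0  # nothing known about the table size: sample everything
    return min(1.0, target_rows / row_count)


# STATS_META reports 0 rows, so the sum of APPROXIMATE_KEYS is used instead.
rows = pick_row_count(stats_meta_rows=0, approximate_keys=5_000_000)
print(rows, sampling_rate(rows))
```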
 
 ### The memory quota for collecting statistics
 

diff --git a/ticdc/ticdc-alert-rules.md b/ticdc/ticdc-alert-rules.md
@@ -16,7 +16,7 @@ For critical alerts, you need to pay close attention to abnormal monitoring metr
 
 - Alert rule:
 
-    `(time() - ticdc_owner_checkpoint_ts / 1000) > 600`
+    `ticdc_owner_checkpoint_ts_lag > 600`
 
 - Description:
 
@@ -30,7 +30,7 @@ For critical alerts, you need to pay close attention to abnormal monitoring metr
 
 - Alert rule:
 
-    `(time() - ticdc_owner_resolved_ts / 1000) > 300`
+    `ticdc_owner_resolved_ts_lag > 300`
 
 - Description:
 

diff --git a/ticdc/ticdc-overview.md b/ticdc/ticdc-overview.md
@@ -130,7 +130,7 @@ Update t1 set b = 4 where b = 2;
 TiCDC generates the following two SQL statements based on the data change information, and writes them to the downstream:
 
 ```sql
-INSERT INTO `test.t1` (`A`,`B`) VALUES (1,1),(2,2),(3,3);
+INSERT INTO `test.t1` (`A`,`B`) VALUES (1,2),(2,2),(3,3);
 UPDATE `test`.`t1`
 SET `A` = CASE
         WHEN `A` = 1 THEN 1

diff --git a/ticdc/ticdc-sink-to-kafka.md b/ticdc/ticdc-sink-to-kafka.md
@@ -36,9 +36,17 @@ Info: {"sink-uri":"kafka://127.0.0.1:9092/topic-name?protocol=canal-json&kafka-v
 Sink URI is used to specify the connection information of the TiCDC target system. The format is as follows:
 
 ```shell
-[scheme]://[userinfo@][host]:[port][/path]?[query_parameters]
+[scheme]://[host]:[port][/path]?[query_parameters]
 ```
 
+> **Tip:**
+> 
+> If the downstream Kafka cluster has multiple hosts or ports, you can specify multiple `[host]:[port]` pairs in the sink URI. For example:
+>
+> ```shell
+> [scheme]://[host]:[port],[host]:[port],[host]:[port][/path]?[query_parameters]
+> ```
+
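To illustrate the multi-endpoint format, the following sketch splits a sink URI into its parts using only the Python standard library. It is not how TiCDC parses the URI. The helper name `parse_sink_uri` is hypothetical, and treating the path as the topic name is an assumption based on the `topic-name` example in this document.

```python
from urllib.parse import parse_qs, urlsplit


def parse_sink_uri(uri: str) -> dict:
    """Split a sink URI into scheme, broker endpoints, path, and parameters."""
    parts = urlsplit(uri)
    brokers = []
    # The authority part may hold several comma-separated host:port pairs.
    for endpoint in parts.netloc.split(","):
        host, _, port = endpoint.rpartition(":")
        brokers.append((host, int(port)))
    return {
        "scheme": parts.scheme,
        "brokers": brokers,
        "topic": parts.path.lstrip("/"),  # assumption: the path names the topic
        "params": {k: v[0] for k, v in parse_qs(parts.query).items()},
    }


info = parse_sink_uri(
    "kafka://127.0.0.1:9092,127.0.0.2:9092/topic-name?protocol=canal-json"
)
print(info["brokers"])
```
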
 Sample configuration:
 
 ```shell

diff --git a/tikv-configuration-file.md b/tikv-configuration-file.md
@@ -1039,7 +1039,8 @@ Configuration items related to Raftstore.
 
 + Set the specific times that TiKV initiates periodic full compaction. You can specify multiple time schedules in an array. For example:
     + `periodic-full-compact-start-times = ["03:00", "23:00"]` indicates that TiKV performs full compaction daily at 03:00 AM and 11:00 PM, based on the local time zone of the TiKV node.
-    + `periodic-full-compact-start-times = ["03:00 +0000", "23:00 +0000"]` indicates that TiKV performs full compaction daily at 03:00 AM and 11:00 PM in UTC time.
+    + `periodic-full-compact-start-times = ["03:00 +0000", "23:00 +0000"]` indicates that TiKV performs full compaction daily at 03:00 AM and 11:00 PM in the UTC time zone.
+    + `periodic-full-compact-start-times = ["03:00 +0800", "23:00 +0800"]` indicates that TiKV performs full compaction daily at 03:00 AM and 11:00 PM in the UTC+08:00 time zone.
 + Default value: `[]`, which means periodic full compaction is disabled by default.
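
To make the offset notation concrete, the sketch below converts entries with an explicit offset to the equivalent UTC wall-clock time. It only illustrates the `HH:MM +ZZZZ` notation and is not TiKV's own parser. The helper name `to_utc_hhmm` and the fixed reference date are assumptions. Entries without an offset depend on the node's local time zone and are not handled here.

```python
from datetime import datetime, timezone


def to_utc_hhmm(spec: str) -> str:
    """Convert an 'HH:MM +ZZZZ' entry to the matching UTC time of day."""
    # A fixed reference date (arbitrary, illustrative) keeps the arithmetic
    # well-defined when the conversion crosses midnight.
    local = datetime.strptime("2024-01-01 " + spec, "%Y-%m-%d %H:%M %z")
    return local.astimezone(timezone.utc).strftime("%H:%M")


for entry in ["03:00 +0000", "03:00 +0800", "23:00 +0800"]:
    print(entry, "->", to_utc_hhmm(entry), "UTC")
```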
 
 ### `periodic-full-compact-start-max-cpu` <span class="version-mark">New in v7.6.0</span>