Commit 7207635

committed
Merge commit 'c3ec7513da16bfb6c84af93213c0006040dbd867' into wenxuan/fts-1
* commit 'c3ec7513da16bfb6c84af93213c0006040dbd867':
  ticdc: add [[filter.event-filters]] to the toml example (#20921) (#20925)
  Add tidb index usage limitation for valid stats (#20833) (#20912)
  tidb-cloud: update LOAD DATA INFILE description (#20917) (#20918)
  Update pad attribute where needed (#20823) (#20910)
  Add restriction for IMPORT INTO regarding empty (#20562) (#20908)
  tikv: recorrect the settings of some configs and supplement missing annotations for several configs. (#20871) (#20899)
  planner: add doc for `tidb_ignore_inlist_plan_digest`. (#20870) (#20890)
  JDBC URL: update the letter case for defaultFetchSize (#20884) (#20887)
  fix(pd): add missing configuration items for PD (#20883) (#20894)
  hardware-and-software-requirements: update Kylin Euler to Kylin (#20874) (#20881)
  add FAQs about collation for JDBC connections (#20848) (#20869)
  pd configuration: add dashboard.disable-custom-prom-addr (#20853) (#20857)
  releases: add one br entry to v8.1.2 (#20861) (#20864)
  toc: add TiDB release support policy (#19119) (#20859)
2 parents 400d339 + c3ec751 commit 7207635

20 files changed

+255
-50
lines changed

TOC.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1086,6 +1086,7 @@
10861086
- [All Releases](/releases/release-notes.md)
10871087
- [Release Timeline](/releases/release-timeline.md)
10881088
- [TiDB Versioning](/releases/versioning.md)
1089+
- [Release Support Policy](https://www.pingcap.com/tidb-release-support-policy/)
10891090
- [TiDB Installation Packages](/binary-package.md)
10901091
- v8.5
10911092
- [8.5.1](/releases/release-8.5.1.md)

character-set-and-collation.md

Lines changed: 11 additions & 11 deletions
@@ -76,7 +76,7 @@ SELECT
7676

7777
### Character set and collation naming
7878

79-
A character set can have multiple collations, named in the `<character_set>_<collation_properties>` format. For example, the `utf8mb4` character set has a collation called `utf8mb4_bin`, which is a binary collation for `utf8mb4`. Multiple collation properties can be included in the name, separated by underscores (`_`).
79+
A character set can have multiple collations, named in the `<character_set>_<collation_properties>` format. For example, the `utf8mb4` character set has a collation called `utf8mb4_bin`, which is a binary collation for `utf8mb4`. Multiple collation properties can be included in the name, separated by underscores (`_`).
8080

8181
The following table shows the common collation properties and meanings.
8282

@@ -158,16 +158,16 @@ SHOW COLLATION WHERE Charset = 'utf8mb4';
 ```

 ```sql
-+--------------------+---------+------+---------+----------+---------+
-| Collation          | Charset | Id   | Default | Compiled | Sortlen |
-+--------------------+---------+------+---------+----------+---------+
-| utf8mb4_0900_ai_ci | utf8mb4 |  255 |         | Yes      |       1 |
-| utf8mb4_0900_bin   | utf8mb4 |  309 |         | Yes      |       1 |
-| utf8mb4_bin        | utf8mb4 |   46 | Yes     | Yes      |       1 |
-| utf8mb4_general_ci | utf8mb4 |   45 |         | Yes      |       1 |
-| utf8mb4_unicode_ci | utf8mb4 |  224 |         | Yes      |       1 |
-+--------------------+---------+------+---------+----------+---------+
-5 rows in set (0.00 sec)
++--------------------+---------+-----+---------+----------+---------+---------------+
+| Collation          | Charset | Id  | Default | Compiled | Sortlen | Pad_attribute |
++--------------------+---------+-----+---------+----------+---------+---------------+
+| utf8mb4_0900_ai_ci | utf8mb4 | 255 |         | Yes      |       0 | NO PAD        |
+| utf8mb4_0900_bin   | utf8mb4 | 309 |         | Yes      |       1 | NO PAD        |
+| utf8mb4_bin        | utf8mb4 |  46 | Yes     | Yes      |       1 | PAD SPACE     |
+| utf8mb4_general_ci | utf8mb4 |  45 |         | Yes      |       1 | PAD SPACE     |
+| utf8mb4_unicode_ci | utf8mb4 | 224 |         | Yes      |       8 | PAD SPACE     |
++--------------------+---------+-----+---------+----------+---------+---------------+
+5 rows in set (0.001 sec)
 ```

 For details about the TiDB support of the GBK character set, see [GBK](/character-set-gbk.md).

develop/dev-guide-sample-application-java-jdbc.md

Lines changed: 15 additions & 1 deletion
@@ -13,9 +13,23 @@ In this tutorial, you can learn how to use TiDB and JDBC to accomplish the follo
1313
- Connect to your TiDB cluster using JDBC.
1414
- Build and run your application. Optionally, you can find [sample code snippets](#sample-code-snippets) for basic CRUD operations.
1515

16+
<CustomContent platform="tidb">
17+
1618
> **Note:**
1719
>
18-
> This tutorial works with TiDB Cloud Serverless, TiDB Cloud Dedicated, and TiDB Self-Managed.
20+
> - This tutorial works with TiDB Cloud Serverless, TiDB Cloud Dedicated, and TiDB Self-Managed.
21+
> - Starting from TiDB v7.4, if `connectionCollation` is not configured, and `characterEncoding` is either not configured or set to `UTF-8` in the JDBC URL, the collation used in a JDBC connection depends on the JDBC driver version. For more information, see [Collation used in JDBC connections](/faq/sql-faq.md#collation-used-in-jdbc-connections).
22+
23+
</CustomContent>
24+
25+
<CustomContent platform="tidb-cloud">
26+
27+
> **Note:**
28+
>
29+
> - This tutorial works with TiDB Cloud Serverless, TiDB Cloud Dedicated, and TiDB Self-Managed.
30+
> - Starting from TiDB v7.4, if `connectionCollation` is not configured, and `characterEncoding` is either not configured or set to `UTF-8` in the JDBC URL, the collation used in a JDBC connection depends on the JDBC driver version. For more information, see [Collation used in JDBC connections](https://docs.pingcap.com/tidb/stable/sql-faq#collation-used-in-jdbc-connections).
31+
32+
</CustomContent>
1933

2034
## Prerequisites
2135

faq/sql-faq.md

Lines changed: 67 additions & 0 deletions
@@ -337,6 +337,73 @@ Whether your cluster is a new cluster or an upgraded cluster from an earlier ver
337337
- If the owner does not exist, try manually triggering owner election with: `curl -X POST http://{TiDBIP}:10080/ddl/owner/resign`.
338338
- If the owner exists, export the Goroutine stack and check for the possible stuck location.
339339

340+
## Collation used in JDBC connections
341+
342+
This section lists questions related to collations used in JDBC connections. For information about character sets and collations supported by TiDB, see [Character Set and Collation](/character-set-and-collation.md).
343+
344+
### What collation is used in a JDBC connection when `connectionCollation` is not configured in the JDBC URL?
345+
346+
When `connectionCollation` is not configured in the JDBC URL, there are two scenarios:
347+
348+
**Scenario 1**: Neither `connectionCollation` nor `characterEncoding` is configured in the JDBC URL
349+
350+
- For Connector/J 8.0.25 and earlier versions, the JDBC driver attempts to use the server's default character set. Because the default character set of TiDB is `utf8mb4`, the driver uses `utf8mb4_bin` as the connection collation.
351+
- For Connector/J 8.0.26 and later versions, the JDBC driver uses the `utf8mb4` character set and automatically selects the collation based on the return value of `SELECT VERSION()`.
352+
353+
- When the return value is less than `8.0.1`, the driver uses `utf8mb4_general_ci` as the connection collation. TiDB follows the driver and uses `utf8mb4_general_ci` as the collation.
354+
- When the return value is greater than or equal to `8.0.1`, the driver uses `utf8mb4_0900_ai_ci` as the connection collation. TiDB v7.4.0 and later versions follow the driver and use `utf8mb4_0900_ai_ci` as the collation, while TiDB versions earlier than v7.4.0 fall back to using the default collation `utf8mb4_bin` because the `utf8mb4_0900_ai_ci` collation is not supported in these versions.
355+
356+
**Scenario 2**: `characterEncoding=utf8` is configured in the JDBC URL but `connectionCollation` is not configured. The JDBC driver uses the `utf8mb4` character set according to the mapping rules. The collation is determined according to the rules described in scenario 1.
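
The two scenarios above can be read as a small decision procedure. The following Java sketch is only an illustration of that reading, not driver or TiDB code: the class name, method names, and the idea of passing all three versions as plain dotted numbers (for example `8.0.11` rather than a full `SELECT VERSION()` string) are assumptions made for this example.

```java
import java.util.Arrays;

// Illustrative model of the FAQ rules; not part of Connector/J or TiDB.
final class JdbcCollationSketch {

    // Compares dotted numeric versions such as "8.0.26"; returns <0, 0, or >0.
    static int compareVersions(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? x[i] : 0;
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    private static int[] parse(String version) {
        return Arrays.stream(version.split("\\.")).mapToInt(Integer::parseInt).toArray();
    }

    // Effective connection collation when connectionCollation is not set and
    // characterEncoding is unset or UTF-8.
    // reportedServerVersion is the numeric part of SELECT VERSION() (assumption).
    static String effectiveCollation(String connectorJVersion,
                                     String reportedServerVersion,
                                     String tidbVersion) {
        // Connector/J 8.0.25 and earlier: the server default applies.
        if (compareVersions(connectorJVersion, "8.0.26") < 0) {
            return "utf8mb4_bin";
        }
        // Connector/J 8.0.26 and later: the driver picks by reported version.
        if (compareVersions(reportedServerVersion, "8.0.1") < 0) {
            return "utf8mb4_general_ci";
        }
        // The driver asks for utf8mb4_0900_ai_ci; TiDB earlier than v7.4.0
        // does not support it and falls back to utf8mb4_bin.
        return compareVersions(tidbVersion, "7.4.0") >= 0
                ? "utf8mb4_0900_ai_ci"
                : "utf8mb4_bin";
    }

    public static void main(String[] args) {
        // Sample inputs are hypothetical.
        System.out.println(effectiveCollation("8.0.25", "8.0.11", "7.5.0"));
        System.out.println(effectiveCollation("8.0.33", "5.7.25", "6.5.0"));
        System.out.println(effectiveCollation("8.0.33", "8.0.11", "7.5.0"));
    }
}
```

Setting `connectionCollation` explicitly in the JDBC URL bypasses this version-dependent selection.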
357+
358+
### How to handle collation changes after upgrading TiDB?
359+
360+
In TiDB versions earlier than v7.4, if `connectionCollation` is not configured, and `characterEncoding` is either not configured or set to `UTF-8` in the JDBC URL, the TiDB [`collation_connection`](/system-variables.md#collation_connection) variable defaults to the `utf8mb4_bin` collation.
361+
362+
Starting from TiDB v7.4, if `connectionCollation` is not configured, and `characterEncoding` is either not configured or set to `UTF-8` in the JDBC URL, the value of the [`collation_connection`](/system-variables.md#collation_connection) variable depends on the JDBC driver version. For example, for Connector/J 8.0.26 and later versions, the JDBC driver defaults to the `utf8mb4` character set and uses `utf8mb4_0900_ai_ci` as the connection collation. TiDB follows the driver, and the [`collation_connection`](/system-variables.md#collation_connection) variable uses the `utf8mb4_0900_ai_ci` collation. For more information, see [Collation used in JDBC connections](#what-collation-is-used-in-a-jdbc-connection-when-connectioncollation-is-not-configured-in-the-jdbc-url).
363+
364+
When upgrading from an earlier version to v7.4 or later (for example, from v6.5 to v7.5), if you need to maintain the `collation_connection` as `utf8mb4_bin` for JDBC connections, it is recommended to configure the `connectionCollation` parameter in the JDBC URL.
365+
366+
The following is a common JDBC URL configuration in TiDB v6.5:
367+
368+
```
369+
spring.datasource.url=jdbc:mysql://{TiDBIP}:{TiDBPort}/{DBName}?characterEncoding=UTF-8&useSSL=false&useServerPrepStmts=true&cachePrepStmts=true&prepStmtCacheSqlLimit=10000&prepStmtCacheSize=1000&useConfigs=maxPerformance&rewriteBatchedStatements=true&defaultFetchSize=-2147483648&allowMultiQueries=true
370+
```
371+
372+
After upgrading to TiDB v7.5 or a later version, it is recommended to configure the `connectionCollation` parameter in the JDBC URL:
373+
374+
```
375+
spring.datasource.url=jdbc:mysql://{TiDBIP}:{TiDBPort}/{DBName}?characterEncoding=UTF-8&connectionCollation=utf8mb4_bin&useSSL=false&useServerPrepStmts=true&cachePrepStmts=true&prepStmtCacheSqlLimit=10000&prepStmtCacheSize=1000&useConfigs=maxPerformance&rewriteBatchedStatements=true&defaultFetchSize=-2147483648&allowMultiQueries=true
376+
```
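
If many services build their JDBC URLs programmatically, the same change can be applied in code. The following Java sketch is one possible helper, assuming the URL uses the usual `?key=value&key=value` query form; the class name, method name, and the sample host and database are hypothetical, and the parameter name is matched with exact case.

```java
// Illustrative helper; not part of Connector/J or Spring.
final class JdbcUrlCollationSketch {

    // Returns the URL unchanged if connectionCollation is already present,
    // otherwise appends connectionCollation=<collation> to the query string.
    static String withConnectionCollation(String jdbcUrl, String collation) {
        String param = "connectionCollation=" + collation;
        int queryStart = jdbcUrl.indexOf('?');
        if (queryStart < 0) {
            // No query string yet: start one.
            return jdbcUrl + "?" + param;
        }
        String query = jdbcUrl.substring(queryStart + 1);
        for (String existing : query.split("&")) {
            if (existing.startsWith("connectionCollation=")) {
                // Respect an explicitly configured collation.
                return jdbcUrl;
            }
        }
        // Avoid producing "?&..." when the URL ends with a bare "?".
        return query.isEmpty() ? jdbcUrl + param : jdbcUrl + "&" + param;
    }

    public static void main(String[] args) {
        // Hypothetical host, port, and database name.
        String before = "jdbc:mysql://127.0.0.1:4000/test?characterEncoding=UTF-8&useSSL=false";
        System.out.println(withConnectionCollation(before, "utf8mb4_bin"));
    }
}
```

Appending the parameter only when it is absent keeps any collation that an application has already chosen.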
377+
378+
### What are the differences between the `utf8mb4_bin` and `utf8mb4_0900_ai_ci` collations?
379+
380+
| Collation | Case-sensitive | Ignore trailing spaces | Accent-sensitive | Comparison method |
381+
|----------------------|----------------|------------------|--------------|------------------------|
382+
| `utf8mb4_bin` | Yes | Yes | Yes | Compare binary values |
383+
| `utf8mb4_0900_ai_ci` | No | No | No | Use Unicode sorting algorithm |
384+
385+
For example:
386+
387+
```sql
388+
-- utf8mb4_bin is case-sensitive
389+
SELECT 'apple' = 'Apple' COLLATE utf8mb4_bin; -- Returns 0 (FALSE)
390+
391+
-- utf8mb4_0900_ai_ci is case-insensitive
392+
SELECT 'apple' = 'Apple' COLLATE utf8mb4_0900_ai_ci; -- Returns 1 (TRUE)
393+
394+
-- utf8mb4_bin ignores trailing spaces
395+
SELECT 'Apple ' = 'Apple' COLLATE utf8mb4_bin; -- Returns 1 (TRUE)
396+
397+
-- utf8mb4_0900_ai_ci does not ignore trailing spaces
398+
SELECT 'Apple ' = 'Apple' COLLATE utf8mb4_0900_ai_ci; -- Returns 0 (FALSE)
399+
400+
-- utf8mb4_bin is accent-sensitive
401+
SELECT 'café' = 'cafe' COLLATE utf8mb4_bin; -- Returns 0 (FALSE)
402+
403+
-- utf8mb4_0900_ai_ci is accent-insensitive
404+
SELECT 'café' = 'cafe' COLLATE utf8mb4_0900_ai_ci; -- Returns 1 (TRUE)
405+
```
406+
340407
## SQL optimization
341408
342409
### TiDB execution plan description

faq/upgrade-faq.md

Lines changed: 6 additions & 0 deletions
@@ -35,6 +35,12 @@ It is not recommended to upgrade TiDB using the binary. Instead, it is recommend
3535

3636
This section lists some FAQs and their solutions after you upgrade TiDB.
3737

38+
### The collation in JDBC connections changes after upgrading TiDB
39+
40+
When upgrading from an earlier version to v7.4 or later, if the `connectionCollation` is not configured, and the `characterEncoding` is either not configured or configured as `UTF-8` in the JDBC URL, the default collation in your JDBC connections might change from `utf8mb4_bin` to `utf8mb4_0900_ai_ci` after upgrading. If you need to maintain the collation as `utf8mb4_bin`, configure `connectionCollation=utf8mb4_bin` in the JDBC URL.
41+
42+
For more information, see [Collation used in JDBC connections](/faq/sql-faq.md#collation-used-in-jdbc-connections).
43+
3844
### The character set (charset) errors when executing DDL operations
3945

4046
In v2.1.0 and earlier versions (including all versions of v2.0), the character set of TiDB is UTF-8 by default. Starting from v2.1.1, the default character set is UTF8MB4.

hardware-and-software-requirements.md

Lines changed: 1 addition & 1 deletion
@@ -46,7 +46,7 @@ In v8.5 LTS, TiDB ensures multi-level quality standards for various combinations
4646
<td><ul><li>x86_64</li><li>ARM 64</li></ul></td>
4747
</tr>
4848
<tr>
49-
<td>Kylin Euler V10 SP1/SP2/SP3 (SP3 is supported starting from v7.5.5)</td>
49+
<td>Kylin V10 SP1/SP2/SP3 (SP3 is supported starting from v7.5.5)</td>
5050
<td><ul><li>x86_64</li><li>ARM 64</li></ul></td>
5151
</tr>
5252
<tr>

information-schema/information-schema-collations.md

Lines changed: 15 additions & 15 deletions
@@ -7,8 +7,6 @@ summary: Learn the `COLLATIONS` information_schema table.
77

88
The `COLLATIONS` table provides a list of collations that correspond to character sets in the `CHARACTER_SETS` table. Currently, this table is included only for compatibility with MySQL.
99

10-
{{< copyable "sql" >}}
11-
1210
```sql
1311
USE information_schema;
1412
DESC collations;
@@ -20,29 +18,30 @@ DESC collations;
 +--------------------+-------------+------+------+---------+-------+
 | COLLATION_NAME     | varchar(32) | YES  |      | NULL    |       |
 | CHARACTER_SET_NAME | varchar(32) | YES  |      | NULL    |       |
-| ID                 | bigint(11)  | YES  |      | NULL    |       |
+| ID                 | bigint      | YES  |      | NULL    |       |
 | IS_DEFAULT         | varchar(3)  | YES  |      | NULL    |       |
 | IS_COMPILED        | varchar(3)  | YES  |      | NULL    |       |
-| SORTLEN            | bigint(3)   | YES  |      | NULL    |       |
+| SORTLEN            | bigint      | YES  |      | NULL    |       |
+| PAD_ATTRIBUTE      | varchar(9)  | YES  |      | NULL    |       |
 +--------------------+-------------+------+------+---------+-------+
-6 rows in set (0.00 sec)
+7 rows in set (0.001 sec)
 ```

-{{< copyable "sql" >}}
-
 ```sql
 SELECT * FROM collations WHERE character_set_name='utf8mb4';
 ```

 ```sql
-+--------------------+--------------------+------+------------+-------------+---------+
-| COLLATION_NAME     | CHARACTER_SET_NAME | ID   | IS_DEFAULT | IS_COMPILED | SORTLEN |
-+--------------------+--------------------+------+------------+-------------+---------+
-| utf8mb4_bin        | utf8mb4            |   46 | Yes        | Yes         |       1 |
-| utf8mb4_general_ci | utf8mb4            |   45 |            | Yes         |       1 |
-| utf8mb4_unicode_ci | utf8mb4            |  224 |            | Yes         |       1 |
-+--------------------+--------------------+------+------------+-------------+---------+
-3 rows in set (0.001 sec)
++--------------------+--------------------+------+------------+-------------+---------+---------------+
+| COLLATION_NAME     | CHARACTER_SET_NAME | ID   | IS_DEFAULT | IS_COMPILED | SORTLEN | PAD_ATTRIBUTE |
++--------------------+--------------------+------+------------+-------------+---------+---------------+
+| utf8mb4_0900_ai_ci | utf8mb4            |  255 |            | Yes         |       0 | NO PAD        |
+| utf8mb4_0900_bin   | utf8mb4            |  309 |            | Yes         |       1 | NO PAD        |
+| utf8mb4_bin        | utf8mb4            |   46 | Yes        | Yes         |       1 | PAD SPACE     |
+| utf8mb4_general_ci | utf8mb4            |   45 |            | Yes         |       1 | PAD SPACE     |
+| utf8mb4_unicode_ci | utf8mb4            |  224 |            | Yes         |       8 | PAD SPACE     |
++--------------------+--------------------+------+------------+-------------+---------+---------------+
+5 rows in set (0.001 sec)
 ```

 The description of columns in the `COLLATIONS` table is as follows:
@@ -53,6 +52,7 @@ The description of columns in the `COLLATIONS` table is as follows:
5352
* `IS_DEFAULT`: Whether this collation is the default collation of the character set it belongs to.
5453
* `IS_COMPILED`: Whether the character set is compiled into the server.
5554
* `SORTLEN`: The minimum length of memory allocated when the collation sorts characters.
55+
* `PAD_ATTRIBUTE`: Whether trailing spaces are ignored during string comparison. `PAD SPACE` means that trailing spaces are ignored (for example, `'abc'` equals `'abc '`), while `NO PAD` means that trailing spaces are significant (for example, `'abc'` does not equal `'abc '`).
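
The difference between `PAD SPACE` and `NO PAD` can be modeled in a few lines of Java. This is a deliberately simplified sketch: it only models trailing-space handling on top of exact string equality and ignores the case and accent rules that real collations also apply. The class and method names are made up for this example.

```java
// Simplified model of the PAD_ATTRIBUTE semantics; not TiDB code.
final class PadAttributeSketch {

    // Removes trailing ASCII spaces only, leaving other whitespace intact.
    static String stripTrailingSpaces(String s) {
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == ' ') {
            end--;
        }
        return s.substring(0, end);
    }

    // padSpace = true models PAD SPACE; padSpace = false models NO PAD.
    static boolean equalsWithPadAttribute(String a, String b, boolean padSpace) {
        if (padSpace) {
            // Trailing spaces are ignored during comparison.
            return stripTrailingSpaces(a).equals(stripTrailingSpaces(b));
        }
        // Trailing spaces are significant.
        return a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(equalsWithPadAttribute("abc", "abc ", true));
        System.out.println(equalsWithPadAttribute("abc", "abc ", false));
    }
}
```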
5656

5757
## See also
5858

information-schema/information-schema-tidb-index-usage.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -101,6 +101,7 @@ The output is as follows:
101101

102102
- The data in the `TIDB_INDEX_USAGE` table might be delayed by up to 5 minutes.
103103
- After TiDB restarts, the data in the `TIDB_INDEX_USAGE` table is cleared.
104+
- TiDB records index usage for a table only when the table has valid statistics.
104105

105106
## Read more
106107

pd-configuration-file.md

Lines changed: 33 additions & 1 deletion
@@ -96,6 +96,22 @@ This document only describes parameters that are not included in command-line pa
9696
+ The time interval for automatic compaction of the meta-information database when `auto-compaction-retention` is `periodic`. When the compaction mode is set to `revision`, this parameter indicates the version number for the automatic compaction.
9797
+ Default value: 1h
9898
99+
### `tick-interval`
100+
101+
+ Equivalent to the `heartbeat-interval` configuration item of etcd. It controls the Raft heartbeat interval between embedded etcd instances in different PD nodes. A smaller value accelerates failure detection but increases network load.
102+
+ Default value: `500ms`
103+
104+
### `election-interval`
105+
106+
+ Equivalent to the `election-timeout` configuration item of etcd. It controls the election timeout for embedded etcd instances in PD nodes. If an etcd instance does not receive a valid heartbeat from other etcd instances within this period, it initiates a Raft election.
107+
+ Default value: `3000ms`
108+
+ This value must be at least five times the [`tick-interval`](#tick-interval). For example, if `tick-interval` is `500ms`, `election-interval` must be greater than or equal to `2500ms`.
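
The constraint between the two intervals is easy to check before rolling out a configuration change. The following Java sketch expresses the rule stated above with `java.time.Duration`; the class and method names are hypothetical, and PD itself performs its own validation.

```java
import java.time.Duration;

// Illustrative pre-check for PD etcd timing settings; not PD code.
final class PdEtcdIntervalCheck {

    // election-interval must be at least five times tick-interval.
    static boolean isValid(Duration tickInterval, Duration electionInterval) {
        return electionInterval.compareTo(tickInterval.multipliedBy(5)) >= 0;
    }

    public static void main(String[] args) {
        // Documented defaults: tick-interval 500ms, election-interval 3000ms.
        System.out.println(isValid(Duration.ofMillis(500), Duration.ofMillis(3000)));
        // A too-small election interval for the same tick interval.
        System.out.println(isValid(Duration.ofMillis(500), Duration.ofMillis(2000)));
    }
}
```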
109+
110+
### `enable-prevote`
111+
112+
+ Equivalent to the `pre-vote` configuration item of etcd. It controls whether the embedded etcd in the PD node enables Raft pre-vote. When enabled, etcd performs an additional election phase to check whether enough votes can be obtained to win the election, minimizing service disruption.
113+
+ Default value: `true`
114+
99115
### `force-new-cluster`
100116
101117
+ Determines whether to force PD to start as a new cluster and modify the number of Raft members to `1`
@@ -412,6 +428,16 @@ Configuration items related to scheduling
412428
+ Specifies how many days the hot Region information is retained.
413429
+ Default value: `7`
414430
431+
### `enable-heartbeat-breakdown-metrics` <span class="version-mark">New in v8.0.0</span>
432+
433+
+ Controls whether to enable breakdown metrics for Region heartbeats. These metrics measure the time consumed in each stage of Region heartbeat processing, facilitating analysis through monitoring.
434+
+ Default value: `true`
435+
436+
### `enable-heartbeat-concurrent-runner` <span class="version-mark">New in v8.0.0</span>
437+
438+
+ Controls whether to enable asynchronous concurrent processing for Region heartbeats. When enabled, an independent executor handles Region heartbeat requests asynchronously and concurrently, which can improve heartbeat processing throughput and reduce latency.
439+
+ Default value: `true`
440+
415441
## `replication`
416442
417443
Configuration items related to replicas
@@ -466,6 +492,12 @@ Configuration items related to labels, which only support the `reject-leader` ty
466492
467493
Configuration items related to the [TiDB Dashboard](/dashboard/dashboard-intro.md) built in PD.
468494
495+
### `disable-custom-prom-addr`
496+
497+
+ Whether to disable configuring a custom Prometheus data source address in [TiDB Dashboard](/dashboard/dashboard-intro.md).
498+
+ Default value: `false`
499+
+ When it is set to `true`, if you configure a custom Prometheus data source address in TiDB Dashboard, TiDB Dashboard reports an error.
500+
469501
### `tidb-cacert-path`
470502
471503
+ The path of the root CA certificate file. You can configure this path when you connect to TiDB's SQL services using TLS.
@@ -540,4 +572,4 @@ The following are the configuration items about the [Request Unit (RU)](/tidb-re
540572
541573
+ Basis factor for conversion from CPU to RU
542574
+ Default value: 1/3
543-
+ 1 RU = 3 millisecond CPU time
575+
+ 1 RU = 3 millisecond CPU time
