diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 188dcbce0c819..3d9ccc98b1ad2 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -4,14 +4,7 @@ Welcome to [TiDB](https://github.com/pingcap/tidb) documentation! We are excited ## What you can contribute -🚀 To provide you with better TiDB documentation, we sincerely invite you to participate in the [2024 TiDB Docs Dash](https://www.pingcap.com/event/tidb-docs-dash/). In this event, you'll have a chance to work with other members of the community while making a meaningful impact on [TiDB documentation](https://docs.pingcap.com/tidb/stable/) and [TiDB Cloud documentation](https://docs.pingcap.com/tidbcloud/). - -- **Dates/Time:** January 9 at 08:00 UTC ([your local time](https://www.timeanddate.com/worldclock/fixedtime.html?msg=TiDB+Docs+Dash+2024%3A+Start&iso=20240109T08&p1=1440)) – January 12 at 07:59 UTC ([your local time](https://www.timeanddate.com/worldclock/fixedtime.html?msg=TiDB+Docs+Dash+2024%3A+End&iso=20240112T0759&p1=1440)) -- **Event details**: -- **Issue list**: -- **Participation introduction**: - -In addition to the issues and tasks in the event, you can also start from any one of the following items to help improve [TiDB Docs at the PingCAP website](https://docs.pingcap.com/tidb/stable): +You can start from any one of the following items to help improve [TiDB documentation at the PingCAP website](https://docs.pingcap.com/tidb/stable): - Fix typos or format (punctuation, space, indentation, code block, etc.) 
- Fix or update inappropriate or outdated descriptions @@ -161,6 +154,59 @@ If your change fits one of the following situations, **CHOOSE THE AFFECTED RELEA - Fixes format to resolve a display error - Fixes broken links +## Guideline for contributing to TiDB Cloud documentation + +Currently, the [TiDB Cloud documentation](https://docs.pingcap.com/tidbcloud/) is available only in English, and it is stored in the [release-7.5](https://github.com/pingcap/docs/tree/release-7.5/tidb-cloud) branch of this repository for reusing SQL documents and development documents of TiDB v7.5. Hence, to create a pull request for TiDB Cloud documentation, make sure that your PR is based on the [release-7.5](https://github.com/pingcap/docs/tree/release-7.5) branch. + +> **Tip:** +> +> To learn which TiDB document is reused by TiDB Cloud, check the [TOC file of TiDB Cloud documentation](https://github.com/pingcap/docs/blob/release-7.5/TOC-tidb-cloud.md?plain=1). +> +> - If the path of a document in this file starts with `/tidb-cloud/`, it means that this document is only for TiDB Cloud. +> - If the path of a document in this file does not start with `/tidb-cloud/`, it means that this TiDB document is reused by TiDB Cloud. + +In some TiDB documents that are reused by TiDB Cloud, you might notice `CustomContent` tags. These `CustomContent` tags are used to show the dedicated content of TiDB or TiDB Cloud. + +For example: + +```Markdown +## Restrictions + +<CustomContent platform="tidb"> + +* The TiDB memory limit on the `INSERT INTO SELECT` statement can be adjusted using the system variable [`tidb_mem_quota_query`](/system-variables.md#tidb_mem_quota_query). Starting from v6.5.0, it is not recommended to use [`txn-total-size-limit`](/tidb-configuration-file.md#txn-total-size-limit) to control transaction memory size. + + For more information, see [TiDB memory control](/configure-memory-usage.md).
+ +</CustomContent> + +<CustomContent platform="tidb-cloud"> + +* The TiDB memory limit on the `INSERT INTO SELECT` statement can be adjusted using the system variable [`tidb_mem_quota_query`](/system-variables.md#tidb_mem_quota_query). Starting from v6.5.0, it is not recommended to use [`txn-total-size-limit`](https://docs.pingcap.com/tidb/stable/tidb-configuration-file#txn-total-size-limit) to control transaction memory size. + + For more information, see [TiDB memory control](https://docs.pingcap.com/tidb/stable/configure-memory-usage). + +</CustomContent> + +* TiDB has no hard limit on the concurrency of the `INSERT INTO SELECT` statement, but it is recommended to consider the following practices: + + * When a "write transaction" is large, such as close to 1 GiB, it is recommended to control concurrency to no more than 10. + * When a "write transaction" is small, such as less than 100 MiB, it is recommended to control concurrency to no more than 30. + * Determine the concurrency based on testing results and specific circumstances. +``` + +In the example: + +- The content within the `<CustomContent platform="tidb">` tag is only applicable to TiDB and will not be displayed on the [TiDB Cloud documentation](https://docs.pingcap.com/tidbcloud/) website. +- The content within the `<CustomContent platform="tidb-cloud">` tag is only applicable to TiDB Cloud and will not be displayed on the [TiDB documentation](https://docs.pingcap.com/tidb/stable) website. +- The content that is not wrapped by any `CustomContent` tag is applicable to both TiDB and TiDB Cloud and will be displayed on both documentation websites. + +## Guideline for previewing EBNF diagrams + +[TiDB documentation](https://docs.pingcap.com/tidb/stable) provides many SQL synopsis diagrams to help users understand the SQL syntax. For example, you can find the synopsis diagrams for the `ALTER INDEX` statement [here](https://docs.pingcap.com/tidb/stable/sql-statement-alter-index#synopsis). + +The source of these synopsis diagrams is written using [extended Backus–Naur form (EBNF)](https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form).
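As a rough illustration, the EBNF source for a statement looks like the following sketch. This snippet is hypothetical and simplified, not copied from the actual TiDB grammar files:

```ebnf
AlterIndexStmt ::= 'ALTER' 'TABLE' TableName 'ALTER' 'INDEX' IndexName ( 'VISIBLE' | 'INVISIBLE' )

TableName ::= Identifier ( '.' Identifier )?

IndexName ::= Identifier
```

Each production rule on the left becomes one railroad diagram, with quoted keywords rendered as terminal boxes and rule references rendered as links to other diagrams.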
When preparing the EBNF code for a SQL statement, you can easily preview the EBNF diagram by copying the code to and clicking **Render**. + ## Contact Join [Discord](https://discord.gg/DQZ2dy3cuc?utm_source=doc) for discussion. diff --git a/OWNERS b/OWNERS index ec95e7e30cb68..928908987c355 100644 --- a/OWNERS +++ b/OWNERS @@ -10,6 +10,7 @@ approvers: - dragonly - en-jin19 - hfxsd + - Icemap - jackysp - kissmydb - lance6716 @@ -40,7 +41,6 @@ reviewers: - ericsyh - glkappe - GMHDBJD - - Icemap - Joyinqin - junlan-zhang - KanShiori diff --git a/TOC-tidb-cloud.md b/TOC-tidb-cloud.md index 1dbb4039cdf8f..b50aae1a4cdce 100644 --- a/TOC-tidb-cloud.md +++ b/TOC-tidb-cloud.md @@ -585,6 +585,7 @@ - [`TABLE_STORAGE_STATS`](/information-schema/information-schema-table-storage-stats.md) - [`TIDB_HOT_REGIONS_HISTORY`](/information-schema/information-schema-tidb-hot-regions-history.md) - [`TIDB_INDEXES`](/information-schema/information-schema-tidb-indexes.md) + - [`TIDB_INDEX_USAGE`](/information-schema/information-schema-tidb-index-usage.md) - [`TIDB_SERVERS_INFO`](/information-schema/information-schema-tidb-servers-info.md) - [`TIDB_TRX`](/information-schema/information-schema-tidb-trx.md) - [`TIFLASH_REPLICA`](/information-schema/information-schema-tiflash-replica.md) @@ -602,7 +603,7 @@ - [`SESSION_CONNECT_ATTRS`](/performance-schema/performance-schema-session-connect-attrs.md) - [Metadata Lock](/metadata-lock.md) - [Use UUIDs](/best-practices/uuid.md) - - [TiDB DDL V2](/ddl-v2.md) + - [TiDB Accelerated Table Creation](/accelerated-table-creation.md) - [System Variables](/system-variables.md) - [Server Status Variables](/status-variables.md) - Storage Engines diff --git a/TOC.md b/TOC.md index 94ce25a8736ee..ed8ec3c253e6d 100644 --- a/TOC.md +++ b/TOC.md @@ -328,6 +328,7 @@ - [Use Load Base Split](/configure-load-base-split.md) - [Use Store Limit](/configure-store-limit.md) - [DDL Execution Principles and Best Practices](/ddl-introduction.md) + - [Use PD 
Microservices](/pd-microservices.md) - TiDB Tools - [Overview](/ecosystem-tool-user-guide.md) - [Use Cases](/ecosystem-tool-user-case.md) @@ -446,6 +447,7 @@ - [Binlog Event Filter](/dm/dm-binlog-event-filter.md) - [Filter DMLs Using SQL Expressions](/dm/feature-expression-filter.md) - [Online DDL Tool Support](/dm/dm-online-ddl-tool-support.md) + - [Customize a Secret Key for Encryption and Decryption](/dm/dm-customized-secret-key.md) - Manage a Data Migration Task - [Precheck a Task](/dm/dm-precheck.md) - [Create a Task](/dm/dm-create-task.md) @@ -575,8 +577,10 @@ - Output Protocols - [TiCDC Avro Protocol](/ticdc/ticdc-avro-protocol.md) - [TiCDC Canal-JSON Protocol](/ticdc/ticdc-canal-json.md) - - [TiCDC Open Protocol](/ticdc/ticdc-open-protocol.md) - [TiCDC CSV Protocol](/ticdc/ticdc-csv.md) + - [TiCDC Debezium Protocol](/ticdc/ticdc-debezium.md) + - [TiCDC Open Protocol](/ticdc/ticdc-open-protocol.md) + - [TiCDC Simple Protocol](/ticdc/ticdc-simple-protocol.md) - [TiCDC Open API v2](/ticdc/ticdc-open-api-v2.md) - [TiCDC Open API v1](/ticdc/ticdc-open-api.md) - TiCDC Data Consumption @@ -960,6 +964,7 @@ - [`TIDB_HOT_REGIONS`](/information-schema/information-schema-tidb-hot-regions.md) - [`TIDB_HOT_REGIONS_HISTORY`](/information-schema/information-schema-tidb-hot-regions-history.md) - [`TIDB_INDEXES`](/information-schema/information-schema-tidb-indexes.md) + - [`TIDB_INDEX_USAGE`](/information-schema/information-schema-tidb-index-usage.md) - [`TIDB_SERVERS_INFO`](/information-schema/information-schema-tidb-servers-info.md) - [`TIDB_TRX`](/information-schema/information-schema-tidb-trx.md) - [`TIFLASH_REPLICA`](/information-schema/information-schema-tiflash-replica.md) @@ -976,8 +981,9 @@ - PERFORMANCE_SCHEMA - [Overview](/performance-schema/performance-schema.md) - [`SESSION_CONNECT_ATTRS`](/performance-schema/performance-schema-session-connect-attrs.md) + - [`SYS`](/sys-schema.md) - [Metadata Lock](/metadata-lock.md) - - [TiDB DDL V2](/ddl-v2.md) + - [TiDB 
Accelerated Table Creation](/accelerated-table-creation.md) - UI - TiDB Dashboard - [Overview](/dashboard/dashboard-intro.md) @@ -1035,6 +1041,7 @@ - v7.6 - [7.6.0](/releases/release-7.6.0.md) - v7.5 + - [7.5.1](/releases/release-7.5.1.md) - [7.5.0](/releases/release-7.5.0.md) - v7.4 - [7.4.0-DMR](/releases/release-7.4.0.md) @@ -1043,6 +1050,7 @@ - v7.2 - [7.2.0-DMR](/releases/release-7.2.0.md) - v7.1 + - [7.1.4](/releases/release-7.1.4.md) - [7.1.3](/releases/release-7.1.3.md) - [7.1.2](/releases/release-7.1.2.md) - [7.1.1](/releases/release-7.1.1.md) diff --git a/accelerated-table-creation.md b/accelerated-table-creation.md new file mode 100644 index 0000000000000..f97eec9d5cfd0 --- /dev/null +++ b/accelerated-table-creation.md @@ -0,0 +1,57 @@ +--- +title: TiDB Accelerated Table Creation +summary: Learn the concept, principles, and implementation details of performance optimization for creating tables in TiDB. +aliases: ['/tidb/dev/ddl-v2/'] +--- + +# TiDB Accelerated Table Creation + +TiDB v7.6.0 introduces the system variable [`tidb_ddl_version`](https://docs.pingcap.com/tidb/v7.6/system-variables#tidb_enable_fast_create_table-new-in-v800) to support accelerating table creation, which improves the efficiency of bulk table creation. Starting from v8.0.0, this system variable is renamed to [`tidb_enable_fast_create_table`](/system-variables.md#tidb_enable_fast_create_table-new-in-v800). + +TiDB uses the online asynchronous schema change algorithm to change the metadata. All DDL jobs are submitted to the `mysql.tidb_ddl_job` table, and the owner node pulls the DDL job to execute. After executing each phase of the online DDL algorithm, the DDL job is marked as completed and moved to the `mysql.tidb_ddl_history` table. Therefore, DDL statements can only be executed on the owner node and cannot be linearly extended. + +However, for some DDL statements, it is not necessary to strictly follow the online DDL algorithm. 
For example, the `CREATE TABLE` statement has only two states for the job: `none` and `public`. Therefore, TiDB can simplify the DDL execution process and execute the `CREATE TABLE` statement on non-owner nodes to accelerate table creation. + +> **Warning:** +> +> This feature is currently experimental and is not recommended for use in production environments. This feature might change or be removed without prior notice. If you find a bug, please give feedback by raising an [issue](https://github.com/pingcap/tidb/issues) on GitHub. + +## Compatibility with TiDB tools + +- [TiCDC](https://docs.pingcap.com/tidb/stable/ticdc-overview) does not support replicating the tables that are created by `tidb_enable_fast_create_table`. + +## Limitation + +Currently, you can use performance optimization for table creation only in the [`CREATE TABLE`](/sql-statements/sql-statement-create-table.md) statement, and the statement must not include any foreign key constraints. + +## Use `tidb_enable_fast_create_table` to accelerate table creation + +You can enable or disable performance optimization for creating tables by specifying the value of the system variable [`tidb_enable_fast_create_table`](/system-variables.md#tidb_enable_fast_create_table-new-in-v800). + +To enable performance optimization for creating tables, set the value of this variable to `ON`: + +```sql +SET GLOBAL tidb_enable_fast_create_table = ON; +``` + +To disable performance optimization for creating tables, set the value of this variable to `OFF`: + +```sql +SET GLOBAL tidb_enable_fast_create_table = OFF; +``` + +## Implementation principle + +The detailed implementation principle of performance optimization for table creation is as follows: + +1. Create a `CREATE TABLE` job. + + The corresponding DDL job is generated by parsing the `CREATE TABLE` statement. + +2. Execute the `CREATE TABLE` job.
+ + The TiDB node that receives the `CREATE TABLE` statement executes it directly, and then persists the table structure to TiKV. At the same time, the `CREATE TABLE` job is marked as completed and inserted into the `mysql.tidb_ddl_history` table. + +3. Synchronize the table information. + + TiDB notifies other nodes to synchronize the newly created table structure. diff --git a/basic-features.md b/basic-features.md index 6a68246b8b849..0821d194c9a63 100644 --- a/basic-features.md +++ b/basic-features.md @@ -132,7 +132,7 @@ You can try out TiDB features on [TiDB Playground](https://play.tidbcloud.com/?u | [Metadata lock](/metadata-lock.md) | Y | Y | Y | Y | N | N | N | N | N | N | N | | [`FLASHBACK CLUSTER`](/sql-statements/sql-statement-flashback-cluster.md) | Y | Y | Y | Y | N | N | N | N | N | N | N | | [Pause](/sql-statements/sql-statement-admin-pause-ddl.md)/[Resume](/sql-statements/sql-statement-admin-resume-ddl.md) DDL | Y | Y | N | N | N | N | N | N | N | N | N | -| [TiDB DDL V2](/ddl-v2.md) | E | N | N | N | N | N | N | N | N | N | N | +| [TiDB Accelerated Table Creation](/accelerated-table-creation.md) | N | N | N | N | N | N | N | N | N | N | N | ## Transactions @@ -180,7 +180,7 @@ You can try out TiDB features on [TiDB Playground](https://play.tidbcloud.com/?u | [Collect statistics for `PREDICATE COLUMNS`](/statistics.md#collect-statistics-on-some-columns) | E | E | E | E | E | E | E | N | N | N | N | | [Control the memory quota for collecting statistics](/statistics.md#the-memory-quota-for-collecting-statistics) | E | E | E | E | E | N | N | N | N | N | N | | [Randomly sample about 10000 rows of data to quickly build statistics](/system-variables.md#tidb_enable_fast_analyze) | Deprecated | Deprecated | E | E | E | E | E | E | E | E | E | -| [Lock statistics](/statistics.md#lock-statistics) | E | E | E | E | N | N | N | N | N | N | N | +| [Lock statistics](/statistics.md#lock-statistics) | Y | Y | E | E | N | N | N | N | N | N | N | | [Lightweight 
statistics initialization](/statistics.md#load-statistics) | Y | Y | E | N | N | N | N | N | N | N | N | | [Show the progress of collecting statistics](/sql-statements/sql-statement-show-analyze-status.md) | Y | Y | N | N | N | N | N | N | N | N | N | @@ -194,7 +194,7 @@ You can try out TiDB features on [TiDB Playground](https://play.tidbcloud.com/?u | [Certificate-based authentication](/certificate-authentication.md) | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | Y | | [`caching_sha2_password` authentication](/system-variables.md#default_authentication_plugin) | Y | Y | Y | Y | Y | Y | Y | Y | N | N | N | | [`tidb_sm3_password` authentication](/system-variables.md#default_authentication_plugin) | Y | Y | Y | Y | N | N | N | N | N | N | N | -| [`tidb_auth_token` authentication](/system-variables.md#default_authentication_plugin) | Y | Y | Y | Y | N | N | N | N | N | N | N | +| [`tidb_auth_token` authentication](/security-compatibility-with-mysql.md#tidb_auth_token) | Y | Y | Y | Y | N | N | N | N | N | N | N | | [`authentication_ldap_sasl` authentication](/system-variables.md#default_authentication_plugin) | Y | Y | N | N | N | N | N | N | N | N | | [`authentication_ldap_simple` authentication](/system-variables.md#default_authentication_plugin) | Y | Y | Y | N | N | N | N | N | N | N | N | | [Password management](/password-management.md) | Y | Y | Y | Y | N | N | N | N | N | N | N | diff --git a/best-practices/haproxy-best-practices.md b/best-practices/haproxy-best-practices.md index 1b476f29d05e8..061a2bdac305c 100644 --- a/best-practices/haproxy-best-practices.md +++ b/best-practices/haproxy-best-practices.md @@ -35,12 +35,12 @@ Before you deploy HAProxy, make sure that you meet the hardware and software req ### Hardware requirements -For your server, it is recommended to meet the following hardware requirements. You can also improve server specifications according to the load balancing environment. 
+According to the [HAProxy documentation](https://www.haproxy.com/documentation/haproxy-enterprise/getting-started/installation/linux/), the minimum hardware configuration for HAProxy is shown in the following table. Under the Sysbench `oltp_read_write` workload, the maximum QPS for this configuration is about 50K. You can increase the server configuration according to your load balancing environment. | Hardware resource | Minimum specification | | :--------------------- | :-------------------- | | CPU | 2 cores, 3.5 GHz | -| Memory | 16 GB | +| Memory | 4 GB | | Storage | 50 GB (SATA) | | Network Interface Card | 10G Network Card | diff --git a/br/backup-and-restore-storages.md b/br/backup-and-restore-storages.md index eb25c40104a9e..09cd3352ec01f 100644 --- a/br/backup-and-restore-storages.md +++ b/br/backup-and-restore-storages.md @@ -260,4 +260,10 @@ BR supports specifying the Azure server-side encryption scope or providing the e ## Other features supported by the storage service -BR v6.3.0 supports AWS [S3 Object Lock](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lock.html). You can enable this feature to prevent backup data from being tampered with or deleted. +Amazon [S3 Object Lock](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-lock.html) can help prevent backup data from accidental or intentional deletion during a specified retention period, enhancing the security and integrity of data. Starting from v6.3.0, BR supports Amazon S3 Object Lock for snapshot backups, adding an additional layer of security for full backups. Starting from v8.0.0, PITR also supports Amazon S3 Object Lock. Whether for full backups or log data backups, the Object Lock feature ensures more reliable data protection, further strengthening the security of data backup and recovery and meeting regulatory requirements. + +BR and PITR automatically detect whether the Amazon S3 Object Lock feature is enabled or disabled. 
You do not need to perform any additional operations. + +> **Warning:** +> +> If the Object Lock feature is enabled during the snapshot backup or PITR log backup process, the snapshot backup or log backup might fail. You need to restart the snapshot backup or PITR log backup task to continue the backup. diff --git a/certificate-authentication.md b/certificate-authentication.md index 7f7dd5b160a13..56e6d22296b91 100644 --- a/certificate-authentication.md +++ b/certificate-authentication.md @@ -241,7 +241,7 @@ ssl-ca="path/to/ca-cert.pem" Start TiDB and check logs. If the following information is displayed in the log, the configuration is successful: ``` -[INFO] [server.go:264] ["secure connection is enabled"] ["client verification enabled"=true] +[INFO] [server.go:286] ["mysql protocol server secure connection is enabled"] ["client verification enabled"=true] ``` ### Configure the client to use client certificate @@ -266,9 +266,9 @@ First, connect TiDB using the client to configure the login verification. Then, ### Get user certificate information -The user certificate information can be specified by `require subject`, `require issuer`, `require san`, and `require cipher`, which are used to check the X509 certificate attributes. +The user certificate information can be specified by `REQUIRE SUBJECT`, `REQUIRE ISSUER`, `REQUIRE SAN`, and `REQUIRE CIPHER`, which are used to check the X.509 certificate attributes. -+ `require subject`: Specifies the `subject` information of the client certificate when you log in. With this option specified, you do not need to configure `require ssl` or x509. The information to be specified is consistent with the entered `subject` information in [Generate client keys and certificates](#generate-client-key-and-certificate). ++ `REQUIRE SUBJECT`: Specifies the subject information of the client certificate when you log in. With this option specified, you do not need to configure `require ssl` or x509. 
The information to be specified is consistent with the entered subject information in [Generate client keys and certificates](#generate-client-key-and-certificate). To get this option, execute the following command: @@ -290,7 +290,7 @@ The user certificate information can be specified by `require subject`, `require + `require san`: Specifies the `Subject Alternative Name` information of the CA certificate that issues the user certificate. The information to be specified is consistent with the [`alt_names` of the `openssl.cnf` configuration file](https://docs.pingcap.com/tidb/stable/generate-self-signed-certificates) used to generate the client certificate. - + Execute the following command to get the information of the `require san` item in the generated certificate: + + Execute the following command to get the information of the `REQUIRE SAN` item in the generated certificate: {{< copyable "shell-regular" >}} @@ -298,25 +298,23 @@ The user certificate information can be specified by `require subject`, `require openssl x509 -noout -extensions subjectAltName -in client.crt ``` - + `require san` currently supports the following `Subject Alternative Name` check items: + + `REQUIRE SAN` currently supports the following `Subject Alternative Name` check items: - URI - IP - DNS - + Multiple check items can be configured after they are connected by commas. For example, configure `require san` as follows for the `u1` user: + + Multiple check items can be configured after they are connected by commas. 
For example, configure `REQUIRE SAN` as follows for the `u1` user: {{< copyable "sql" >}} ```sql - create user 'u1'@'%' require san 'DNS:d1,URI:spiffe://example.org/myservice1,URI:spiffe://example.org/myservice2'; + CREATE USER 'u1'@'%' REQUIRE SAN 'DNS:d1,URI:spiffe://example.org/myservice1,URI:spiffe://example.org/myservice2'; ``` The above configuration only allows the `u1` user to log in to TiDB using the certificate with the URI item `spiffe://example.org/myservice1` or `spiffe://example.org/myservice2` and the DNS item `d1`. -+ `require cipher`: Checks the cipher method supported by the client. Use the following statement to check the list of supported cipher methods: - - {{< copyable "sql" >}} ++ `REQUIRE CIPHER`: Checks the cipher method supported by the client. Use the following statement to check the list of supported cipher methods: ```sql SHOW SESSION STATUS LIKE 'Ssl_cipher_list'; @@ -324,24 +322,16 @@ The user certificate information can be specified by `require subject`, `require ### Configure user certificate information -After getting the user certificate information (`require subject`, `require issuer`, `require san`, `require cipher`), configure these information to be verified when creating a user, granting privileges, or altering a user. Replace `` with the corresponding information in the following statements. +After getting the user certificate information (`REQUIRE SUBJECT`, `REQUIRE ISSUER`, `REQUIRE SAN`, `REQUIRE CIPHER`), configure this information to be verified when creating a user, granting privileges, or altering a user. Replace `` with the corresponding information in the following statements. You can configure one option or multiple options using the space or `and` as the separator.
-+ Configure user certificate when creating a user (`create user`): - - {{< copyable "sql" >}} - - ```sql - create user 'u1'@'%' require issuer '' subject '' san '' cipher ''; - ``` - -+ Configure user certificate when granting privileges: ++ Configure user certificate when creating a user (`CREATE USER`): {{< copyable "sql" >}} ```sql - grant all on *.* to 'u1'@'%' require issuer '' subject '' san '' cipher ''; + CREATE USER 'u1'@'%' REQUIRE ISSUER '' SUBJECT '' SAN '' CIPHER ''; ``` + Configure user certificate when altering a user: @@ -349,22 +339,20 @@ You can configure one option or multiple options using the space or `and` as the {{< copyable "sql" >}} ```sql - alter user 'u1'@'%' require issuer '' subject '' san '' cipher ''; + ALTER USER 'u1'@'%' REQUIRE ISSUER '' SUBJECT '' SAN '' CIPHER ''; ``` After the above configuration, the following items will be verified when you log in: + SSL is used; the CA that issues the client certificate is consistent with the CA configured in the server. -+ The `issuer` information of the client certificate matches the information specified in `require issuer`. -+ The `subject` information of the client certificate matches the information specified in `require cipher`. -+ The `Subject Alternative Name` information of the client certificate matches the information specified in `require san`. ++ The `issuer` information of the client certificate matches the information specified in `REQUIRE ISSUER`. ++ The `subject` information of the client certificate matches the information specified in `REQUIRE SUBJECT`. ++ The `Subject Alternative Name` information of the client certificate matches the information specified in `REQUIRE SAN`. You can log into TiDB only after all the above items are verified. Otherwise, the `ERROR 1045 (28000): Access denied` error is returned. You can use the following command to check the TLS version, the cipher algorithm, and whether the current connection uses the certificate for the login.
Connect the MySQL client and execute the following statement: -{{< copyable "sql" >}} - ```sql \s ``` @@ -373,20 +361,18 @@ The output: ``` -------------- -mysql Ver 15.1 Distrib 10.4.10-MariaDB, for Linux (x86_64) using readline 5.1 +mysql Ver 8.3.0 for Linux on x86_64 (MySQL Community Server - GPL) Connection id: 1 Current database: test Current user: root@127.0.0.1 -SSL: Cipher in use is TLS_AES_256_GCM_SHA384 +SSL: Cipher in use is TLS_AES_128_GCM_SHA256 ``` Then execute the following statement: -{{< copyable "sql" >}} - ```sql -show variables like '%ssl%'; +SHOW VARIABLES LIKE '%ssl%'; ``` The output: @@ -395,13 +381,14 @@ The output: +---------------+----------------------------------+ | Variable_name | Value | +---------------+----------------------------------+ -| ssl_cert | /path/to/server-cert.pem | -| ssl_ca | /path/to/ca-cert.pem | -| have_ssl | YES | | have_openssl | YES | +| have_ssl | YES | +| ssl_ca | /path/to/ca-cert.pem | +| ssl_cert | /path/to/server-cert.pem | +| ssl_cipher | | | ssl_key | /path/to/server-key.pem | +---------------+----------------------------------+ -6 rows in set (0.067 sec) +6 rows in set (0.06 sec) ``` ## Update and replace certificate diff --git a/choose-index.md b/choose-index.md index c4d239a2b60e1..6ef39b09e17b0 100644 --- a/choose-index.md +++ b/choose-index.md @@ -404,3 +404,75 @@ mysql> EXPLAIN SELECT /*+ use_index_merge(t3, idx) */ * FROM t3 WHERE ((1 member +-------------------------+----------+-----------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 3 rows in set, 2 warnings (0.00 sec) ``` + +### Multi-valued indexes and plan cache + +A query plan that uses `member of` to choose multi-valued indexes can be cached. A query plan that uses the `JSON_CONTAINS()` or `JSON_OVERLAPS()` function to choose multi-valued indexes cannot be cached. 
+ +The following are some examples in which query plans can be cached: + +```sql +mysql> CREATE TABLE t5 (j1 JSON, j2 JSON, INDEX idx1((CAST(j1 AS SIGNED ARRAY)))); +Query OK, 0 rows affected (0.04 sec) + +mysql> PREPARE st FROM 'SELECT /*+ use_index(t5, idx1) */ * FROM t5 WHERE (? member of (j1))'; +Query OK, 0 rows affected (0.00 sec) + +mysql> SET @a=1; +Query OK, 0 rows affected (0.00 sec) + +mysql> EXECUTE st USING @a; +Empty set (0.01 sec) + +mysql> EXECUTE st USING @a; +Empty set (0.00 sec) + +mysql> SELECT @@last_plan_from_cache; ++------------------------+ +| @@last_plan_from_cache | ++------------------------+ +| 1 | ++------------------------+ +1 row in set (0.00 sec) + +mysql> PREPARE st FROM 'SELECT /*+ use_index(t5, idx1) */ * FROM t5 WHERE (? member of (j1)) AND JSON_CONTAINS(j2, ?)'; +Query OK, 0 rows affected (0.00 sec) + +mysql> SET @a=1, @b='[1,2]'; +Query OK, 0 rows affected (0.00 sec) + +mysql> EXECUTE st USING @a, @b; +Empty set (0.00 sec) + +mysql> EXECUTE st USING @a, @b; +Empty set (0.00 sec) + +mysql> SELECT @@LAST_PLAN_FROM_CACHE; -- can hit plan cache if the JSON_CONTAINS doesn't impact index selection ++------------------------+ +| @@LAST_PLAN_FROM_CACHE | ++------------------------+ +| 1 | ++------------------------+ +1 row in set (0.00 sec) +``` + +The following are some examples in which query plans cannot be cached: + +```sql +mysql> PREPARE st2 FROM 'SELECT /*+ use_index(t5, idx1) */ * FROM t5 WHERE JSON_CONTAINS(j1, ?)'; +Query OK, 0 rows affected (0.00 sec) + +mysql> SET @a='[1,2]'; +Query OK, 0 rows affected (0.01 sec) + +mysql> EXECUTE st2 USING @a; +Empty set, 1 warning (0.00 sec) + +mysql> SHOW WARNINGS; -- cannot hit plan cache since the JSON_CONTAINS predicate might affect index selection ++---------+------+-------------------------------------------------------------------------------------------------------+ +| Level | Code | Message |
++---------+------+-------------------------------------------------------------------------------------------------------+ +| Warning | 1105 | skip prepared plan-cache: json_contains function with immutable parameters can affect index selection | ++---------+------+-------------------------------------------------------------------------------------------------------+ +1 row in set (0.01 sec) +``` \ No newline at end of file diff --git a/command-line-flags-for-tidb-configuration.md b/command-line-flags-for-tidb-configuration.md index e4b5b936f2e7e..1981b1c94f778 100644 --- a/command-line-flags-for-tidb-configuration.md +++ b/command-line-flags-for-tidb-configuration.md @@ -79,6 +79,12 @@ When you start the TiDB cluster, you can use command-line options or environment - Default: `""` - If this option is not set, logs are output to "stderr". If this option is set, logs are output to the corresponding file. +## `--log-general` + ++ The filename of the [General Log](/system-variables.md#tidb_general_log) ++ Default: `""` ++ If this option is not set, the general log is written to the file specified by [`--log-file`](#--log-file) by default. + ## `--log-slow-query` - The directory for the slow query log diff --git a/configure-memory-usage.md b/configure-memory-usage.md index ebc11d7382155..de90b3dbde9d9 100644 --- a/configure-memory-usage.md +++ b/configure-memory-usage.md @@ -130,6 +130,10 @@ The following example constructs a memory-intensive SQL statement that triggers 5. By checking the directory of status files (In the preceding example, the directory is `/tiup/deploy/tidb-4000/log/oom_record`), you can see a record directory with the corresponding timestamp (for example, `record2022-10-09T17:18:38+08:00`). The record directory includes three files: `goroutinue`, `heap`, and `running_sql`. These three files are suffixed with the time when status files are logged. 
They respectively record goroutine stack information, the usage status of heap memory, and the running SQL information when the alarm is triggered. For the content in `running_sql`, refer to [`expensive-queries`](/identify-expensive-queries.md). +## Reduce the memory usage for write transactions in tidb-server + +The transaction model used by TiDB requires that all write operations of transactions are first cached in memory before being committed. When TiDB writes large transactions, memory usage might increase and become a bottleneck. To reduce or avoid high memory usage by large transactions under various constraints, you can adjust the [`tidb_dml_type`](/system-variables.md#tidb_dml_type-new-in-v800) system variable to `"bulk"` or use [Non-transactional DML statements](/non-transactional-dml.md). + ## Other memory control behaviors of tidb-server ### Flow control @@ -145,8 +149,8 @@ TiDB supports disk spill for execution operators. When the memory usage of a SQL - The disk spill behavior is jointly controlled by the following parameters: [`tidb_mem_quota_query`](/system-variables.md#tidb_mem_quota_query), [`tidb_enable_tmp_storage_on_oom`](/system-variables.md#tidb_enable_tmp_storage_on_oom), [`tmp-storage-path`](/tidb-configuration-file.md#tmp-storage-path), and [`tmp-storage-quota`](/tidb-configuration-file.md#tmp-storage-quota). - When the disk spill is triggered, TiDB outputs a log containing the keywords `memory exceeds quota, spill to disk now` or `memory exceeds quota, set aggregate mode to spill-mode`. -- Disk spill for the Sort, MergeJoin, and HashJoin operator is introduced in v4.0.0; disk spill for the HashAgg operator is introduced in v5.2.0. -- When the SQL executions containing Sort, MergeJoin, or HashJoin cause OOM, TiDB triggers disk spill by default. When SQL executions containing HashAgg cause OOM, TiDB does not trigger disk spill by default. 
You can configure the system variable `tidb_executor_concurrency = 1` to trigger disk spill for HashAgg. +- Disk spill for the Sort, MergeJoin, and HashJoin operators is introduced in v4.0.0; disk spill for the non-concurrent algorithm of the HashAgg operator is introduced in v5.2.0; disk spill for the concurrent algorithm of the HashAgg operator is introduced in v8.0.0. +- When the SQL executions containing Sort, MergeJoin, HashJoin, or HashAgg cause OOM, TiDB triggers disk spill by default. > **Note:** > @@ -178,15 +182,7 @@ The following example uses a memory-consuming SQL statement to demonstrate the d ERROR 1105 (HY000): Out Of Memory Quota![conn_id=3] ``` -4. Configure the system variable `tidb_executor_concurrency` to 1. With this configuration, when out of memory, HashAgg automatically tries to trigger disk spill. - - {{< copyable "sql" >}} - - ```sql - SET tidb_executor_concurrency = 1; - ``` - -5. Execute the same SQL statement. You can find that this time, the statement is successfully executed and no error message is returned. From the following detailed execution plan, you can see that HashAgg has used 600 MB of hard disk space. +4. Execute the same SQL statement. You can find that this time, the statement is successfully executed and no error message is returned. From the following detailed execution plan, you can see that HashAgg has used 600 MB of hard disk space. {{< copyable "sql" >}} diff --git a/data-type-default-values.md b/data-type-default-values.md index be1e547958f85..7ac7fa12a2188 100644 --- a/data-type-default-values.md +++ b/data-type-default-values.md @@ -8,7 +8,7 @@ aliases: ['/docs/dev/data-type-default-values/','/docs/dev/reference/sql/data-ty The `DEFAULT` value clause in a data type specification indicates a default value for a column. The default value must be a constant and cannot be a function or an expression. 
But for the time type, you can specify the `NOW`, `CURRENT_TIMESTAMP`, `LOCALTIME`, and `LOCALTIMESTAMP` functions as the default for `TIMESTAMP` and `DATETIME` columns. -The `BLOB`, `TEXT`, and `JSON` columns __cannot__ be assigned a default value. +Starting from v8.0.0, TiDB supports [specifying expressions as default values](#specify-expressions-as-default-values) for [`BLOB`](/data-type-string.md#blob-type), [`TEXT`](/data-type-string.md#text-type), and [`JSON`](/data-type-json.md#json-type) data types. If a column definition includes no explicit `DEFAULT` value, TiDB determines the default value as follows: @@ -25,3 +25,42 @@ Implicit defaults are defined as follows: - For numeric types, the default is 0. If declared with the `AUTO_INCREMENT` attribute, the default is the next value in the sequence. - For date and time types other than `TIMESTAMP`, the default is the appropriate "zero" value for the type. For `TIMESTAMP`, the default value is the current date and time. - For string types other than `ENUM`, the default value is the empty string. For `ENUM`, the default is the first enumeration value. + +## Specify expressions as default values + +> **Warning:** +> +> Currently, this feature is experimental. It is not recommended that you use it in production environments. This feature might be changed or removed without prior notice. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub. + +Starting from 8.0.13, MySQL supports specifying expressions as default values in the `DEFAULT` clause. For more information, see [Explicit default handling as of MySQL 8.0.13](https://dev.mysql.com/doc/refman/8.0/en/data-type-defaults.html#data-type-defaults-explicit). + +TiDB has implemented this feature and supports specifying some expressions as default values in the `DEFAULT` clause. Starting from v8.0.0, TiDB supports assigning default values to `BLOB`, `TEXT`, and `JSON` data types. 
However, these default values can only be set as expressions. The following is an example of `BLOB`: + +```sql +CREATE TABLE t2 (b BLOB DEFAULT (RAND())); +``` + +TiDB currently supports the following expressions: + +* [`RAND()`](/functions-and-operators/numeric-functions-and-operators.md) +* [`UUID()`](/functions-and-operators/miscellaneous-functions.md) +* [`UUID_TO_BIN()`](/functions-and-operators/miscellaneous-functions.md) + +Starting from TiDB v8.0.0, the `DEFAULT` clause supports using the following expressions to set default values. + +* `UPPER(SUBSTRING_INDEX(USER(), '@', 1))` + +* `REPLACE(UPPER(UUID()), '-', '')` + +* The `DATE_FORMAT` supports the following formats: + + * `DATE_FORMAT(NOW(), '%Y-%m')` + * `DATE_FORMAT(NOW(), '%Y-%m-%d')` + * `DATE_FORMAT(NOW(), '%Y-%m-%d %H.%i.%s')` + * `DATE_FORMAT(NOW(), '%Y-%m-%d %H:%i:%s')` + +* `STR_TO_DATE('1980-01-01', '%Y-%m-%d')` + +> **Note:** +> +> Currently, the `ADD COLUMN` statement does not support using expressions as default values. diff --git a/ddl-v2.md b/ddl-v2.md deleted file mode 100644 index 59db024d00c94..0000000000000 --- a/ddl-v2.md +++ /dev/null @@ -1,56 +0,0 @@ ---- -title: Use TiDB DDL V2 to Accelerate Table Creation -summary: Learn the concept, principles, and implementation details of TiDB DDL V2 for acceleration table creation. ---- - -# Use TiDB DDL V2 to Accelerate Table Creation - -Starting from v7.6.0, the new version V2 of TiDB DDL supports creating tables quickly, which improves the efficiency of bulk table creation. - -TiDB uses the online asynchronous schema change algorithm to change the metadata. All DDL jobs are submitted to the `mysql.tidb_ddl_job` table, and the owner node pulls the DDL job to execute. After executing each phase of the online DDL algorithm, the DDL job is marked as completed and moved to the `mysql.tidb_ddl_history` table. Therefore, DDL statements can only be executed on the owner node and cannot be linearly extended. 
- -However, for some DDL statements, it is not necessary to strictly follow the online DDL algorithm. For example, the `CREATE TABLE` statement only has two states for the job: `none` and `public`. Therefore, TiDB can simplify the execution process of DDL, and executes the `CREATE TABLE` statement on a non-owner node to accelerate table creation. - -> **Warning:** -> -> This feature is currently an experimental feature and it is not recommended to use in a production environment. This feature might change or be removed without prior notice. If you find a bug, please give feedback by raising an [issue](https://github.com/pingcap/tidb/issues) on GitHub. - -## Compatibility with TiDB tools - -- [TiCDC](https://docs.pingcap.com/tidb/stable/ticdc-overview) does not support replicating the tables that are created by TiDB DDL V2. - -## Limitation - -You can now use TiDB DDL V2 only in the [`CREATE TABLE`](/sql-statements/sql-statement-create-table.md) statement, and this statement must not include any foreign key constraints. - -## Use TiDB DDL V2 - -You can enable or disable TiDB DDL V2 by specifying the value of the system variable [`tidb_ddl_version`](/system-variables.md#tidb_ddl_version-new-in-v760) . - -To enable TiDB DDL V2, set the value of this variable to `2`: - -```sql -SET GLOBAL tidb_ddl_version = 2; -``` - -To disable TiDB DDL V2, set the value of this variable to `1`: - -```sql -SET GLOBAL tidb_ddl_version = 1; -``` - -## Implementation principle - -The detailed implementation principle of TiDB DDL V2 for accelerating table creation is as follows: - -1. Create a `CREATE TABLE` Job. - - This step is the same as that of the V1 implementation. The corresponding DDL Job is generated by parsing the `CREATE TABLE` statement. - -2. Execute the `CREATE TABLE` job. - - Different from V1, in TiDB DDL V2, the TiDB node that receives the `CREATE TABLE` statement executes it directly, and then persists the table structure to TiKV. 
At the same time, the `CREATE TABLE` job is marked as completed and inserted into the `mysql.tidb_ddl_history` table. - -3. Synchronize the table information. - - TiDB notifies other nodes to synchronize the newly created table structure. diff --git a/develop/dev-guide-gui-mysql-workbench.md b/develop/dev-guide-gui-mysql-workbench.md index 29e7daaadc374..fb9b3ac832766 100644 --- a/develop/dev-guide-gui-mysql-workbench.md +++ b/develop/dev-guide-gui-mysql-workbench.md @@ -149,6 +149,19 @@ Connect to your TiDB cluster depending on the TiDB deployment option you have se +## FAQs + +### How to handle the connection timeout error "Error Code: 2013. Lost connection to MySQL server during query"? + +This error indicates that the query execution time exceeds the timeout limit. To resolve this issue, you can adjust the timeout settings by the following steps: + +1. Launch MySQL Workbench and navigate to the **Workbench Preferences** page. +2. In the **SQL Editor** > **MySQL Session** section, configure the **DBMS connection read timeout interval (in seconds)** option. This sets the maximum amount of time (in seconds) that a query can take before MySQL Workbench disconnects from the server. + + ![MySQL Workbench: adjust timeout option in SQL Editor settings](/media/develop/mysql-workbench-adjust-sqleditor-read-timeout.jpg) + +For more information, see [MySQL Workbench frequently asked questions](https://dev.mysql.com/doc/workbench/en/workbench-faq.html). + ## Next steps - Learn more usage of MySQL Workbench from [the documentation of MySQL Workbench](https://dev.mysql.com/doc/workbench/en/). diff --git a/develop/dev-guide-sample-application-nodejs-prisma.md b/develop/dev-guide-sample-application-nodejs-prisma.md index d594cc21296fc..e3fe11a0416f7 100644 --- a/develop/dev-guide-sample-application-nodejs-prisma.md +++ b/develop/dev-guide-sample-application-nodejs-prisma.md @@ -101,7 +101,7 @@ Connect to your TiDB cluster depending on the TiDB deployment option you've sele 6. 
Edit the `.env` file, set up the environment variable `DATABASE_URL` as follows, and replace the corresponding placeholders `{}` with the connection string in the connection dialog: ```dotenv - DATABASE_URL={connection_string} + DATABASE_URL='{connection_string}' ``` > **Note** @@ -138,7 +138,7 @@ Connect to your TiDB cluster depending on the TiDB deployment option you've sele 5. Edit the `.env` file, set up the environment variable `DATABASE_URL` as follows, replace the corresponding placeholders `{}` with connection parameters on the connection dialog: ```dotenv - DATABASE_URL=mysql://{user}:{password}@{host}:4000/test?sslaccept=strict&sslcert={downloaded_ssl_ca_path} + DATABASE_URL='mysql://{user}:{password}@{host}:4000/test?sslaccept=strict&sslcert={downloaded_ssl_ca_path}' ``` > **Note** @@ -167,7 +167,7 @@ Connect to your TiDB cluster depending on the TiDB deployment option you've sele 2. Edit the `.env` file, set up the environment variable `DATABASE_URL` as follows, replace the corresponding placeholders `{}` with connection parameters of your TiDB cluster: ```dotenv - DATABASE_URL=mysql://{user}:{password}@{host}:4000/test + DATABASE_URL='mysql://{user}:{password}@{host}:4000/test' ``` If you are running TiDB locally, the default host address is `127.0.0.1`, and the password is empty. diff --git a/develop/dev-guide-sample-application-ruby-rails.md b/develop/dev-guide-sample-application-ruby-rails.md index 83732dbaa02e2..4d8056af2772e 100644 --- a/develop/dev-guide-sample-application-ruby-rails.md +++ b/develop/dev-guide-sample-application-ruby-rails.md @@ -97,7 +97,7 @@ Connect to your TiDB cluster depending on the TiDB deployment option you've sele 6. Edit the `.env` file, set up the `DATABASE_URL` environment variable as follows, and copy the connection string from the connection dialog as the variable value. 
```dotenv - DATABASE_URL=mysql2://{user}:{password}@{host}:{port}/{database_name}?ssl_mode=verify_identity + DATABASE_URL='mysql2://{user}:{password}@{host}:{port}/{database_name}?ssl_mode=verify_identity' ``` > **Note** @@ -126,7 +126,7 @@ Connect to your TiDB cluster depending on the TiDB deployment option you've sele 5. Edit the `.env` file, set up the `DATABASE_URL` environment variable as follows, copy the connection string from the connection dialog as the variable value, and set the `sslca` query parameter to the file path of the CA certificate downloaded from the connection dialog: ```dotenv - DATABASE_URL=mysql2://{user}:{password}@{host}:{port}/{database}?ssl_mode=verify_identity&sslca=/path/to/ca.pem + DATABASE_URL='mysql2://{user}:{password}@{host}:{port}/{database}?ssl_mode=verify_identity&sslca=/path/to/ca.pem' ``` > **Note** @@ -149,7 +149,7 @@ Connect to your TiDB cluster depending on the TiDB deployment option you've sele 2. Edit the `.env` file, set up the `DATABASE_URL` environment variable as follows, and replace the `{user}`, `{password}`, `{host}`, `{port}`, and `{database}` with your own TiDB connection information: ```dotenv - DATABASE_URL=mysql2://{user}:{password}@{host}:{port}/{database} + DATABASE_URL='mysql2://{user}:{password}@{host}:{port}/{database}' ``` If you are running TiDB locally, the default host address is `127.0.0.1`, and the password is empty. diff --git a/dm/deploy-a-dm-cluster-using-tiup-offline.md b/dm/deploy-a-dm-cluster-using-tiup-offline.md index 69c70e862bb88..5930e27ca9eea 100644 --- a/dm/deploy-a-dm-cluster-using-tiup-offline.md +++ b/dm/deploy-a-dm-cluster-using-tiup-offline.md @@ -127,7 +127,7 @@ alertmanager_servers: > > - Use `.` to indicate the subcategory of the configuration, such as `log.slow-threshold`. For more formats, see [TiUP configuration template](https://github.com/pingcap/tiup/blob/master/embed/examples/dm/topology.example.yaml). 
> -> - For more parameter description, see [master `config.toml.example`](https://github.com/pingcap/dm/blob/master/dm/master/dm-master.toml) and [worker `config.toml.example`](https://github.com/pingcap/dm/blob/master/dm/worker/dm-worker.toml). +> - For more parameter description, see [master `config.toml.example`](https://github.com/pingcap/tiflow/blob/master/dm/master/dm-master.toml) and [worker `config.toml.example`](https://github.com/pingcap/tiflow/blob/master/dm/worker/dm-worker.toml). > > - Make sure that the ports among the following components are interconnected: > - The `peer_port` (`8291` by default) among the DM-master nodes are interconnected. diff --git a/dm/deploy-a-dm-cluster-using-tiup.md b/dm/deploy-a-dm-cluster-using-tiup.md index 2d9e061141c6b..26b6cdb054b24 100644 --- a/dm/deploy-a-dm-cluster-using-tiup.md +++ b/dm/deploy-a-dm-cluster-using-tiup.md @@ -144,7 +144,7 @@ alertmanager_servers: > - The TiUP nodes can connect to the `port` of all DM-master nodes (`8261` by default). > - The TiUP nodes can connect to the `port` of all DM-worker nodes (`8262` by default). -For more `master_servers.host.config` parameter description, refer to [master parameter](https://github.com/pingcap/dm/blob/master/dm/master/dm-master.toml). For more `worker_servers.host.config` parameter description, refer to [worker parameter](https://github.com/pingcap/dm/blob/master/dm/worker/dm-worker.toml). +For more `master_servers.host.config` parameter description, refer to [master parameter](https://github.com/pingcap/tiflow/blob/master/dm/master/dm-master.toml). For more `worker_servers.host.config` parameter description, refer to [worker parameter](https://github.com/pingcap/tiflow/blob/master/dm/worker/dm-worker.toml). 
## Step 3: Execute the deployment command

diff --git a/dm/dm-command-line-flags.md b/dm/dm-command-line-flags.md
index e9712359c3c35..33759c9305483 100644
--- a/dm/dm-command-line-flags.md
+++ b/dm/dm-command-line-flags.md
@@ -75,6 +75,12 @@ This document introduces DM's command-line flags.

- The default value is `"http://127.0.0.1:8291"`
- Required flag

+### `--secret-key-path`
+
+- The path of the customized secret key for encryption and decryption
+- The default value is `""`
+- Optional flag
+
## DM-worker

### `--advertise-addr`

@@ -132,15 +138,3 @@ This document introduces DM's command-line flags.

- The `{advertise-addr}` of any DM-master node in the cluster to be connected by dmctl
- The default value is `""`
- It is a required flag when dmctl interacts with DM-master
-
-### `--encrypt`
-
-- Encrypts the plaintext database password into ciphertext
-- The default value is `""`
-- When this flag is specified, it is only used to encrypt the plaintext without interacting with the DM-master
-
-### `--decrypt`
-
-- Decrypts ciphertext encrypted with dmctl into plaintext
-- The default value is `""`
-- When this flag is specified, it is only used to decrypt the ciphertext without interacting with the DM-master
diff --git a/dm/dm-customized-secret-key.md b/dm/dm-customized-secret-key.md
new file mode 100644
index 0000000000000..af71bb0bdf4bc
--- /dev/null
+++ b/dm/dm-customized-secret-key.md
@@ -0,0 +1,36 @@
+---
+title: Customize a Secret Key for DM Encryption and Decryption
+summary: Learn how to customize a secret key to encrypt and decrypt passwords used in the DM (Data Migration) data source and migration task configurations.
+---
+
+# Customize a Secret Key for DM Encryption and Decryption
+
+Before v8.0.0, [DM](/dm/dm-overview.md) uses a [fixed AES-256 secret key](https://github.com/pingcap/tiflow/blob/1252979421fc83ffa2a1548d981e505f7fc0b909/dm/pkg/encrypt/encrypt.go#L27) to encrypt and decrypt passwords in the data source and migration task configurations.
However, using a fixed secret key might pose security risks, especially in environments where security is crucial. To enhance security, starting from v8.0.0, DM removes the fixed secret key and enables you to customize a secret key. + +## Usage + +1. Create a custom key file, which must contain a 64-character hexadecimal AES-256 secret key. +2. In the DM-master [command-line flags](/dm/dm-command-line-flags.md) or [configuration file](/dm/dm-master-configuration-file.md), specify `secret-key-path` as the path of your custom key file. + +## Upgrade from a version earlier than v8.0.0 + +Because DM no longer uses the fixed secret key starting from v8.0.0, pay attention to the following when upgrading DM from versions earlier than v8.0.0: + +- If plaintext passwords are used in both [data source configurations](/dm/dm-source-configuration-file.md) and [migration task configurations](/dm/task-configuration-file-full.md), no additional steps are required for the upgrade. +- If encrypted passwords are used in [data source configurations](/dm/dm-source-configuration-file.md) and [migration task configurations](/dm/task-configuration-file-full.md) or if you want to use encrypted passwords in the future, you need to do the following: + 1. Add the `secret-key-path` parameter to the [DM-master configuration file](/dm/dm-master-configuration-file.md) and specify it as the path of your custom key file. The file must contain a 64-character hexadecimal AES-256 key. If the [fixed AES-256 secret key](https://github.com/pingcap/tiflow/blob/1252979421fc83ffa2a1548d981e505f7fc0b909/dm/pkg/encrypt/encrypt.go#L27) was used for encryption before upgrading, you can copy this secret key to your key file. Make sure all DM-master nodes use the same secret key configuration. + 2. Perform a rolling upgrade of DM-master first, followed by a rolling upgrade of DM-worker. For more information, see [Rolling upgrade](/dm/maintain-dm-using-tiup.md#rolling-upgrade). 
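The upgrade steps above require a key file containing a 64-character hexadecimal AES-256 secret key. The following is a quick sketch of generating and validating such a file with OpenSSL; the file path is only an example, so point `secret-key-path` at wherever you actually store the key:

```shell
# Generate 32 random bytes and hex-encode them, which yields the
# 64-character hexadecimal key that secret-key-path expects.
# The path below is an example; use any path readable by DM-master.
KEY_FILE="${TMPDIR:-/tmp}/dm-secret.key"
openssl rand -hex 32 > "$KEY_FILE"

# Verify that the file contains exactly 64 hexadecimal characters.
grep -qE '^[0-9a-f]{64}$' "$KEY_FILE" && echo "key OK"
```

Make sure every DM-master node points at a key file with the same content; otherwise, passwords encrypted on one node cannot be decrypted on another.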
+ +## Update the secret key for encryption and decryption + +To update the secret key used for encryption and decryption, take the following steps: + +1. Update `secret-key-path` in the [DM-master configuration file](/dm/dm-master-configuration-file.md). + + > **Note:** + > + > - Make sure all DM-master nodes are updated to the same secret key configuration. + > - During the secret key update, do not create new [data source configuration files](/dm/dm-source-configuration-file.md) or [migration task configuration files](/dm/task-configuration-file-full.md). + +2. Perform a rolling restart of DM-master. +3. Use the passwords encrypted with `tiup dmctl encrypt` (dmctl version >= v8.0.0) when you create new [data source configuration files](/dm/dm-source-configuration-file.md) and [migration task configuration files](/dm/task-configuration-file-full.md). \ No newline at end of file diff --git a/dm/dm-error-handling.md b/dm/dm-error-handling.md index 6069d8a255233..2ee6a5045c587 100644 --- a/dm/dm-error-handling.md +++ b/dm/dm-error-handling.md @@ -70,7 +70,7 @@ In the error system, usually, the information of a specific error is as follows: Whether DM outputs the error stack information depends on the error severity and the necessity. The error stack records the complete stack call information when the error occurs. If you cannot figure out the error cause based on the basic information and the error message, you can trace the execution path of the code when the error occurs using the error stack. -For the complete list of error codes, refer to the [error code lists](https://github.com/pingcap/dm/blob/master/_utils/terror_gen/errors_release.txt). +For the complete list of error codes, refer to the [error code lists](https://github.com/pingcap/tiflow/blob/master/dm/_utils/terror_gen/errors_release.txt). 
## Troubleshooting

diff --git a/dm/dm-export-import-config.md b/dm/dm-export-import-config.md
index a67d4095a8345..50d6f609f8b80 100644
--- a/dm/dm-export-import-config.md
+++ b/dm/dm-export-import-config.md
@@ -9,7 +9,7 @@ summary: Learn how to export and import data sources and task configuration of c

> **Note:**
>
-> For clusters earlier than v2.0.5, you can use dmctl v2.0.5 or later to export and import the data source and task configuration files.
+> For clusters earlier than v2.0.5, you can use dmctl (>= v2.0.5 and < v8.0.0) to export and import the data source and task configuration files.

{{< copyable "" >}}

diff --git a/dm/dm-faq.md b/dm/dm-faq.md
index 12e5821c848c0..c6075787a505d 100644
--- a/dm/dm-faq.md
+++ b/dm/dm-faq.md
@@ -25,7 +25,7 @@ Currently, DM does not support it and only supports the regular expressions of t

## If a statement executed upstream contains multiple DDL operations, does DM support such migration?

-DM will attempt to split a single statement containing multiple DDL change operations into multiple statements containing only one DDL operation, but might not cover all cases. It is recommended to include only one DDL operation in a statement executed upstream, or verify it in the test environment. If it is not supported, you can file an [issue](https://github.com/pingcap/dm/issues) to the DM repository.
+DM will attempt to split a single statement containing multiple DDL change operations into multiple statements containing only one DDL operation, but might not cover all cases. It is recommended to include only one DDL operation in a statement executed upstream, or verify it in the test environment. If it is not supported, you can file an [issue](https://github.com/pingcap/tiflow/issues) in the `pingcap/tiflow` repository.

## How to handle incompatible DDL statements?
@@ -55,7 +55,7 @@ When an exception occurs during data migration and the data migration task canno ## How to handle the error returned by the DDL operation related to the gh-ost table, after `online-ddl: true` is set? ``` -[unit=Sync] ["error information"="{\"msg\":\"[code=36046:class=sync-unit:scope=internal:level=high] online ddls on ghost table `xxx`.`_xxxx_gho`\\ngithub.com/pingcap/dm/pkg/terror.(*Error).Generate ...... +[unit=Sync] ["error information"="{\"msg\":\"[code=36046:class=sync-unit:scope=internal:level=high] online ddls on ghost table `xxx`.`_xxxx_gho`\\ngithub.com/pingcap/tiflow/pkg/terror.(*Error).Generate ...... ``` The above error can be caused by the following reason: diff --git a/dm/dm-manage-source.md b/dm/dm-manage-source.md index 511df3caa1986..c84223c314c61 100644 --- a/dm/dm-manage-source.md +++ b/dm/dm-manage-source.md @@ -11,10 +11,14 @@ This document introduces how to manage data source configurations, including enc In DM configuration files, it is recommended to use the password encrypted with dmctl. For one original password, the encrypted password is different after each encryption. +> **Note:** +> +> Starting from v8.0.0, you must configure [`secret-key-path`](/dm/dm-master-configuration-file.md) for DM-master before using the `dmctl encrypt` command. + {{< copyable "shell-regular" >}} ```bash -./dmctl -encrypt 'abc!@#123' +./dmctl encrypt 'abc!@#123' ``` ``` diff --git a/dm/dm-master-configuration-file.md b/dm/dm-master-configuration-file.md index d32617c5b591f..ec58677c0ab18 100644 --- a/dm/dm-master-configuration-file.md +++ b/dm/dm-master-configuration-file.md @@ -34,7 +34,9 @@ join = "" ssl-ca = "/path/to/ca.pem" ssl-cert = "/path/to/cert.pem" ssl-key = "/path/to/key.pem" -cert-allowed-cn = ["dm"] +cert-allowed-cn = ["dm"] + +secret-key-path = "/path/to/secret/key" ``` ## Configuration parameters @@ -58,3 +60,4 @@ This section introduces the configuration parameters of DM-master. 
| `ssl-cert` | The path of the file that contains X509 certificate in PEM format for DM-master to connect with other components. | | `ssl-key` | The path of the file that contains X509 key in PEM format for DM-master to connect with other components. | | `cert-allowed-cn` | Common Name list. | +| `secret-key-path` | The file path of the secret key, which is used to encrypt and decrypt upstream and downstream passwords. The file must contain a 64-character hexadecimal AES-256 secret key. | \ No newline at end of file diff --git a/dm/dmctl-introduction.md b/dm/dmctl-introduction.md index a14f4598b1d76..b3d479de9957d 100644 --- a/dm/dmctl-introduction.md +++ b/dm/dmctl-introduction.md @@ -45,7 +45,6 @@ Available Commands: binlog-schema manage or show table schema in schema tracker check-task Checks the configuration file of the task config manage config operations - decrypt Decrypts cipher text to plain text encrypt Encrypts plain text to cipher text help Gets help about any command list-member Lists member information @@ -98,7 +97,6 @@ Available Commands: binlog-schema manage or show table schema in schema tracker check-task Checks the configuration file of the task config manage config operations - decrypt Decrypts cipher text to plain text encrypt Encrypts plain text to cipher text help Gets help about any command list-member Lists member information diff --git a/dm/maintain-dm-using-tiup.md b/dm/maintain-dm-using-tiup.md index 49d50316ae334..5707edc917d46 100644 --- a/dm/maintain-dm-using-tiup.md +++ b/dm/maintain-dm-using-tiup.md @@ -183,7 +183,7 @@ For example, to scale out a DM-worker node in the `prod-cluster` cluster, take t > > Before upgrading, you can use `config export` to export the configuration files of clusters. After upgrading, if you need to downgrade to an earlier version, you can first redeploy the earlier cluster and then use `config import` to import the previous configuration files. 
> -> For clusters earlier than v2.0.5, you can use dmctl v2.0.5 or later to export and import the data source and task configuration files. +> For clusters earlier than v2.0.5, you can use dmctl (>= v2.0.5 and < v8.0.0) to export and import the data source and task configuration files. > > For clusters later than v2.0.2, currently, it is not supported to automatically import the configuration related to relay worker. You can use `start-relay` command to manually [start relay log](/dm/relay-log.md#enable-and-disable-relay-log). @@ -193,6 +193,10 @@ The rolling upgrade process is made as transparent as possible to the applicatio You can run the `tiup dm upgrade` command to upgrade a DM cluster. For example, the following command upgrades the cluster to `${version}`. Modify `${version}` to your needed version before running this command: +> **Note:** +> +> Starting from v8.0.0, DM removes the fixed secret key for encryption and decryption and enables you to customize a secret key for encryption and decryption. If encrypted passwords are used in [data source configurations](/dm/dm-source-configuration-file.md) and [migration task configurations](/dm/task-configuration-file-full.md) before the upgrade, you need to refer to the upgrade steps in [Customize a Secret Key for DM Encryption and Decryption](/dm/dm-customized-secret-key.md) for additional operations. + {{< copyable "shell-regular" >}} ```bash diff --git a/dm/quick-start-create-source.md b/dm/quick-start-create-source.md index 23886f4eaf8b9..a3af12981f8a6 100644 --- a/dm/quick-start-create-source.md +++ b/dm/quick-start-create-source.md @@ -19,6 +19,8 @@ A data source contains the information for accessing the upstream migration task In DM configuration files, it is recommended to use the password encrypted with dmctl. You can follow the example below to obtain the encrypted password of the data source, which can be used to write the configuration file later. 
+ Starting from v8.0.0, you must configure [`secret-key-path`](/dm/dm-master-configuration-file.md) for DM-master before using the `tiup dmctl encrypt` command. + {{< copyable "shell-regular" >}} ```bash diff --git a/dm/quick-start-create-task.md b/dm/quick-start-create-task.md index 42828fbf32a9b..e4888d3dab432 100644 --- a/dm/quick-start-create-task.md +++ b/dm/quick-start-create-task.md @@ -97,6 +97,10 @@ Before starting a data migration task, you need to configure the MySQL data sour For safety reasons, it is recommended to configure and use encrypted passwords. You can use dmctl to encrypt the MySQL/TiDB password. Suppose the password is "123456": +> **Note:** +> +> Starting from v8.0.0, you must configure [`secret-key-path`](/dm/dm-master-configuration-file.md) for DM-master before using the `dmctl encrypt` command. + {{< copyable "shell-regular" >}} ```bash diff --git a/dm/quick-start-with-dm.md b/dm/quick-start-with-dm.md index 75231181e152d..3386f01ffa96f 100644 --- a/dm/quick-start-with-dm.md +++ b/dm/quick-start-with-dm.md @@ -6,7 +6,7 @@ aliases: ['/docs/tidb-data-migration/dev/get-started/'] # Quick Start Guide for TiDB Data Migration -This document describes how to migrate data from MySQL to TiDB using [TiDB Data Migration](https://github.com/pingcap/dm) (DM). This guide is a quick demo of DM features and is not recommended for any production environment. +This document describes how to migrate data from MySQL to TiDB using [TiDB Data Migration (DM)](/dm/dm-overview.md). This guide is a quick demo of DM features and is not recommended for any production environment. ## Step 1: Deploy a DM cluster @@ -48,7 +48,7 @@ You can use one or multiple MySQL instances as an upstream data source. 
from: host: "127.0.0.1" user: "root" - password: "fCxfQ9XKCezSzuCD0Wf5dUD+LsKegSg=" # encrypt with `tiup dmctl --encrypt "123456"` + password: "fCxfQ9XKCezSzuCD0Wf5dUD+LsKegSg=" port: 3306 ``` diff --git a/dm/shard-merge-best-practices.md b/dm/shard-merge-best-practices.md index 507f276605019..637f97716cc06 100644 --- a/dm/shard-merge-best-practices.md +++ b/dm/shard-merge-best-practices.md @@ -6,7 +6,7 @@ aliases: ['/docs/tidb-data-migration/dev/shard-merge-best-practices/'] # Best Practices of Data Migration in the Shard Merge Scenario -This document describes the features and limitations of [TiDB Data Migration](https://github.com/pingcap/dm) (DM) in the shard merge scenario and provides a data migration best practice guide for your application (the default "pessimistic" mode is used). +This document describes the features and limitations of [TiDB Data Migration (DM)](/dm/dm-overview.md) in the shard merge scenario and provides a data migration best practice guide for your application (the default "pessimistic" mode is used). ## Use a separate data migration task diff --git a/dumpling-overview.md b/dumpling-overview.md index 61944d0c25edb..df2d02a47f7d8 100644 --- a/dumpling-overview.md +++ b/dumpling-overview.md @@ -404,6 +404,7 @@ SET GLOBAL tidb_gc_life_time = '10m'; | `--csv-separator` | Separator of each value in CSV files. It is not recommended to use the default ','. It is recommended to use '\|+\|' or other uncommon character combinations| ',' | ',' | | `--csv-null-value` | Representation of null values in CSV files | "\\N" | | `--csv-line-terminator` | The terminator at the end of a line for CSV files. When exporting data to a CSV file, you can specify the desired terminator with this option. This option supports "\\r\\n" and "\\n". The default value is "\\r\\n", which is consistent with the earlier versions. 
Because quotes in bash have different escaping rules, if you want to specify LF (linefeed) as a terminator, you can use a syntax similar to `--csv-line-terminator $'\n'`. | "\\r\\n" | +| `--csv-output-dialect` | Specifies the dialect used to export the source data to a CSV file for a specific downstream database. The option value can be `""`, `"snowflake"`, `"redshift"`, or `"bigquery"`. The default value is `""`, which means that the source data is encoded and exported as UTF-8. If you set the option to `"snowflake"` or `"redshift"`, binary data in the source is converted to hexadecimal with the `0x` prefix removed. For example, `0x61` is represented as `61`. If you set the option to `"bigquery"`, binary data is encoded using Base64. In some cases, the binary strings might contain garbled characters. | `""` | | `--escape-backslash` | Use backslash (`\`) to escape special characters in the export file | true | | `--output-filename-template` | The filename templates represented in the format of [golang template](https://golang.org/pkg/text/template/#hdr-Arguments)
Support the `{{.DB}}`, `{{.Table}}`, and `{{.Index}}` arguments
The three arguments represent the database name, table name, and chunk ID of the data file | `{{.DB}}.{{.Table}}.{{.Index}}` | | `--status-addr` | Dumpling's service address, including the address for Prometheus to pull metrics and pprof debugging | ":8281" | diff --git a/dynamic-config.md b/dynamic-config.md index d006a759ae8e6..cff4e08482ead 100644 --- a/dynamic-config.md +++ b/dynamic-config.md @@ -213,6 +213,9 @@ The following TiKV configuration items can be modified dynamically: | `{db-name}.{cf-name}.soft-pending-compaction-bytes-limit` | The soft limit on the pending compaction bytes | | `{db-name}.{cf-name}.hard-pending-compaction-bytes-limit` | The hard limit on the pending compaction bytes | | `{db-name}.{cf-name}.titan.blob-run-mode` | The mode of processing blob files | +| `{db-name}.{cf-name}.titan.min-blob-size` | The threshold at which data is stored in Titan. Data is stored in a Titan blob file when its value reaches this threshold. | +| `{db-name}.{cf-name}.titan.blob-file-compression` | The compression algorithm used by Titan blob files | +| `{db-name}.{cf-name}.titan.discardable-ratio` | The threshold of garbage data ratio in Titan data files for GC. When the ratio of useless data in a blob file exceeds the threshold, Titan GC is triggered. | | `server.grpc-memory-pool-quota` | Limits the memory size that can be used by gRPC | | `server.max-grpc-send-msg-len` | Sets the maximum length of a gRPC message that can be sent | | `server.snap-io-max-bytes-per-sec` | Sets the maximum allowable disk bandwidth when processing snapshots | diff --git a/enable-tls-between-clients-and-servers.md b/enable-tls-between-clients-and-servers.md index 34724f5cfbb93..3e47532a9bb88 100644 --- a/enable-tls-between-clients-and-servers.md +++ b/enable-tls-between-clients-and-servers.md @@ -1,16 +1,16 @@ --- title: Enable TLS Between TiDB Clients and Servers -summary: Use the encrypted connection to ensure data security. 
+summary: Use secure connections to ensure data security. aliases: ['/docs/dev/enable-tls-between-clients-and-servers/','/docs/dev/how-to/secure/enable-tls-clients/','/docs/dev/encrypted-connections-with-tls-protocols/'] --- # Enable TLS between TiDB Clients and Servers -Non-encrypted connection between TiDB's server and clients is allowed by default, which enables third parties that monitor channel traffic to know the data sent and received between the server and the client, including query content and query results. If a channel is untrustworthy (such as if the client is connected to the TiDB server via a public network), then a non-encrypted connection is prone to information leakage. In this case, for security reasons, it is recommended to require an encrypted connection. +By default, TiDB allows insecure connections between the server and clients. This enables third parties that monitor channel traffic to know and possibly modify the data sent and received between the server and the client, including query content and query results. If a channel is untrustworthy (such as if the client is connected to the TiDB server via a public network), an insecure connection is prone to information leakage. In this case, for security reasons, it is recommended to require a connection that is secured with TLS. -The TiDB server supports the encrypted connection based on the TLS (Transport Layer Security). The protocol is consistent with MySQL encrypted connections and is directly supported by existing MySQL clients such as MySQL Client, MySQL Shell and MySQL drivers. TLS is sometimes referred to as SSL (Secure Sockets Layer). Because the SSL protocol has [known security vulnerabilities](https://en.wikipedia.org/wiki/Transport_Layer_Security), TiDB does not support SSL. TiDB supports the following protocols: TLSv1.0, TLSv1.1, TLSv1.2 and TLSv1.3. +The TiDB server supports secure connections based on the TLS (Transport Layer Security) protocol. 
The protocol is consistent with MySQL secure connections and is directly supported by existing MySQL clients such as MySQL Client, MySQL Shell, and MySQL drivers. TLS is sometimes referred to as SSL (Secure Sockets Layer). Because the SSL protocol has [known security vulnerabilities](https://en.wikipedia.org/wiki/Transport_Layer_Security), TiDB does not support SSL. TiDB supports the following protocols: TLSv1.2 and TLSv1.3. -When an encrypted connection is used, the connection has the following security properties: +When a TLS-secured connection is used, the connection has the following security properties: - Confidentiality: the traffic plaintext is encrypted to avoid eavesdropping - Integrity: the traffic plaintext cannot be tampered with @@ -20,8 +20,8 @@ To use connections secured with TLS, you first need to configure the TiDB server Similar to MySQL, TiDB allows TLS and non-TLS connections on the same TCP port. For a TiDB server with TLS enabled, you can choose to securely connect to the TiDB server through an encrypted connection, or to use an unencrypted connection. You can use the following ways to require the use of secure connections: -+ Configure the system variable `require_secure_transport` to require secure connections to the TiDB server for all users. -+ Specify `REQUIRE SSL` when you create a user (`create user`), or modify an existing user (`alter user`), which is to specify that specified users must use the encrypted connection to access TiDB. The following is an example of creating a user: ++ Configure the system variable [`require_secure_transport`](/system-variables.md#require_secure_transport-new-in-v610) to require secure connections to the TiDB server for all users. ++ Specify `REQUIRE SSL` when you create a user (`create user`), or modify an existing user (`alter user`), to require that the specified users use TLS connections to access TiDB.
The following is an example of creating a user: {{< copyable "sql" >}} @@ -47,27 +47,27 @@ See the following descriptions about the related parameters to enable secure con To enable secure connections with your own certificates in the TiDB server, you must specify both of the `ssl-cert` and `ssl-key` parameters in the configuration file when you start the TiDB server. You can also specify the `ssl-ca` parameter for client authentication (see [Enable authentication](#enable-authentication)). -All the files specified by the parameters are in PEM (Privacy Enhanced Mail) format. Currently, TiDB does not support the import of a password-protected private key, so it is required to provide a private key file without a password. If the certificate or private key is invalid, the TiDB server starts as usual, but the client cannot connect to the TiDB server through an encrypted connection. +All the files specified by the parameters are in PEM (Privacy Enhanced Mail) format. Currently, TiDB does not support the import of a password-protected private key, so it is required to provide a private key file without a password. If the certificate or private key is invalid, the TiDB server starts as usual, but the client cannot connect to the TiDB server through a TLS connection. -If the certificate parameters are correct, TiDB outputs `secure connection is enabled` when started; otherwise, it outputs `secure connection is NOT ENABLED`. +If the certificate parameters are correct, TiDB outputs `mysql protocol server secure connection is enabled` to the logs on `"INFO"` level when started. -For TiDB versions earlier than v5.2.0, you can use `mysql_ssl_rsa_setup --datadir=./certs` to generate certficates. The `mysql_ssl_rsa_setup` tool is a part of MySQL Server. +## Configure the MySQL client to use TLS connections -## Configure the MySQL client to use encrypted connections - -The client of MySQL 5.7 or later versions attempts to establish an encrypted connection by default. 
If the server does not support encrypted connections, it automatically returns to unencrypted connections. The client of MySQL earlier than version 5.7 uses the unencrypted connection by default. +The client of MySQL 5.7 or later versions attempts to establish a TLS connection by default. If the server does not support TLS connections, it automatically falls back to an unencrypted connection. Clients of MySQL earlier than version 5.7 use non-TLS connections by default. You can change the connection behavior of the client using the following `--ssl-mode` parameters: -- `--ssl-mode=REQUIRED`: The client requires an encrypted connection. The connection cannot be established if the server side does not support encrypted connections. -- In the absence of the `--ssl-mode` parameter: The client attempts to use an encrypted connection, but the encrypted connection cannot be established if the server side does not support encrypted connections. Then the client uses an unencrypted connection. +- `--ssl-mode=REQUIRED`: The client requires a TLS connection. The connection cannot be established if the server side does not support TLS connections. +- In the absence of the `--ssl-mode` parameter: The client attempts to use a TLS connection. If the server side does not support TLS connections, the client falls back to an unencrypted connection. - `--ssl-mode=DISABLED`: The client uses an unencrypted connection. -MySQL 8.0 clients have two SSL modes in addition to this parameter: +MySQL 8.x clients have two SSL modes in addition to this parameter: - `--ssl-mode=VERIFY_CA`: Validates the certificate from the server against the CA that requires `--ssl-ca`. - `--ssl-mode=VERIFY_IDENTITY`: The same as `VERIFY_CA`, but also validating whether the hostname you are connecting to matches the certificate.
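For a quick local experiment with `--ssl-mode=VERIFY_CA`, you need a CA certificate file to pass via `--ssl-ca`. The following sketch generates a throwaway self-signed certificate and key with OpenSSL; the file paths and subject name are examples only and are not suitable for production:

```shell
# Generate a self-signed certificate (acting as its own CA) and an unencrypted
# private key. TiDB requires the key file to be password-free.
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
  -keyout /tmp/tidb-test-key.pem \
  -out /tmp/tidb-test-cert.pem \
  -subj "/CN=tidb-server"
```

You can then point the TiDB server's `ssl-cert` and `ssl-key` parameters at these files and pass the certificate to the client with `--ssl-ca`.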
+For MySQL 5.7 and earlier clients, and for MariaDB clients, you can use `--ssl-verify-server-cert` to enable validation of the server certificate. + For more information, see [Client-Side Configuration for Encrypted Connections](https://dev.mysql.com/doc/refman/8.0/en/using-encrypted-connections.html#using-encrypted-connections-client-side-configuration) in MySQL. ## Enable authentication @@ -87,17 +87,15 @@ If the `ssl-ca` parameter is not specified in the TiDB server or MySQL client, t - To perform mutual authentication, meet both of the above requirements. -By default, the server-to-client authentication is optional. Even if the client does not present its certificate of identification during the TLS handshake, the TLS connection can be still established. You can also require the client to be authenticated by specifying `require x509` when creating a user (`create user`), granting permissions (`grant`), or modifying an existing user (`alter user`). The following is an example of creating a user: - -{{< copyable "sql" >}} +By default, the server-to-client authentication is optional. Even if the client does not present its certificate of identification during the TLS handshake, the TLS connection can still be established. You can also require the client to be authenticated by specifying `REQUIRE X509` when creating a user (`CREATE USER`), or modifying an existing user (`ALTER USER`). The following is an example of creating a user: ```sql -create user 'u1'@'%' require x509; +CREATE USER 'u1'@'%' REQUIRE X509; ``` > **Note:** > -> If the login user has configured using the [TiDB Certificate-Based Authentication for Login](/certificate-authentication.md#configure-the-user-certificate-information-for-login-verification), the user is implicitly required to enable the encrypted connection to TiDB.
+> If the login user is configured using the [TiDB Certificate-Based Authentication for Login](/certificate-authentication.md#configure-the-user-certificate-information-for-login-verification), the user is implicitly required to use a TLS connection to TiDB. ## Check whether the current connection uses encryption @@ -105,13 +103,22 @@ Use the `SHOW STATUS LIKE "%Ssl%";` statement to get the details of the current See the following example of the result in an encrypted connection. The results change according to different TLS versions or encryption protocols supported by the client. +```sql +SHOW STATUS LIKE "Ssl%"; +``` + ``` -mysql> SHOW STATUS LIKE "%Ssl%"; -...... -| Ssl_verify_mode | 5 | -| Ssl_version | TLSv1.2 | -| Ssl_cipher | ECDHE-RSA-AES128-GCM-SHA256 | -...... ++-----------------------+-------------------------------------------------------> +| Variable_name | Value > ++-----------------------+-------------------------------------------------------> +| Ssl_cipher | TLS_AES_128_GCM_SHA256 > +| Ssl_cipher_list | RC4-SHA:DES-CBC3-SHA:AES128-SHA:AES256-SHA:AES128-SHA2> +| Ssl_server_not_after | Apr 23 07:59:47 2024 UTC > +| Ssl_server_not_before | Jan 24 07:59:47 2024 UTC > +| Ssl_verify_mode | 5 > +| Ssl_version | TLSv1.3 > ++-----------------------+-------------------------------------------------------> +6 rows in set (0.0062 sec) ``` For the official MySQL client, you can also use the `STATUS` or `\s` statement to view the connection status: @@ -119,24 +126,22 @@ For the official MySQL client, you can also use the `STATUS` or `\s` statement t ``` mysql> \s ... -SSL: Cipher in use is ECDHE-RSA-AES128-GCM-SHA256 +SSL: Cipher in use is TLS_AES_128_GCM_SHA256 ... ``` ## Supported TLS versions, key exchange protocols, and encryption algorithms -The TLS versions, key exchange protocols and encryption algorithms supported by TiDB are determined by the official Golang libraries.
+The TLS versions, key exchange protocols and encryption algorithms supported by TiDB are determined by the official Go libraries. The crypto policy for your operating system and the client library you are using might also impact the list of supported protocols and cipher suites. ### Supported TLS versions -- TLSv1.0 (disabled by default) -- TLSv1.1 (disabled by default) - TLSv1.2 - TLSv1.3 -The `tls-version` configuration option can be used to limit the TLS versions that can be used. +You can use the [`tls-version`](/tidb-configuration-file.md#tls-version) configuration option to limit the TLS versions that can be used. The actual TLS versions that can be used depend on the OS crypto policy, MySQL client version and the SSL/TLS library that is used by the client. diff --git a/encryption-at-rest.md b/encryption-at-rest.md index 14ec3d239083e..8671963d3118f 100644 --- a/encryption-at-rest.md +++ b/encryption-at-rest.md @@ -22,7 +22,7 @@ When a TiDB cluster is deployed, the majority of user data is stored on TiKV and TiKV supports encryption at rest. This feature allows TiKV to transparently encrypt data files using [AES](https://en.wikipedia.org/wiki/Advanced_Encryption_Standard) or [SM4](https://en.wikipedia.org/wiki/SM4_(cipher)) in [CTR](https://en.wikipedia.org/wiki/Block_cipher_mode_of_operation) mode. To enable encryption at rest, an encryption key must be provided by the user and this key is called master key. TiKV automatically rotates data keys that it used to encrypt actual data files. Manually rotating the master key can be done occasionally. Note that encryption at rest only encrypts data at rest (namely, on disk) and not while data is transferred over network. It is advised to use TLS together with encryption at rest. -Optionally, you can use AWS KMS for both cloud and self-hosted deployments. You can also supply the plaintext master key in a file. 
+You can use Key Management Service (KMS) for both cloud and self-hosted deployments or supply the plaintext master key in a file. TiKV currently does not exclude encryption keys and user data from core dumps. It is advised to disable core dumps for the TiKV process when using encryption at rest. This is not currently handled by TiKV itself. @@ -59,7 +59,7 @@ TiKV currently supports encrypting data using AES128, AES192, AES256, or SM4 (on * Master key. The master key is provided by user and is used to encrypt the data keys TiKV generates. Management of master key is external to TiKV. * Data key. The data key is generated by TiKV and is the key actually used to encrypt data. -The same master key can be shared by multiple instances of TiKV. The recommended way to provide a master key in production is via AWS KMS. Create a customer master key (CMK) through AWS KMS, and then provide the CMK key ID to TiKV in the configuration file. The TiKV process needs access to the KMS CMK while it is running, which can be done by using an [IAM role](https://aws.amazon.com/iam/). If TiKV fails to get access to the KMS CMK, it will fail to start or restart. Refer to AWS documentation for [KMS](https://docs.aws.amazon.com/kms/index.html) and [IAM](https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction.html) usage. +The same master key can be shared by multiple instances of TiKV. The recommended way to provide a master key in production is via KMS. Currently, TiKV supports KMS encryption on [AWS](https://docs.aws.amazon.com/kms/index.html), [Google Cloud](https://cloud.google.com/security/products/security-key-management?hl=en), and [Azure](https://learn.microsoft.com/en-us/azure/key-vault/). To enable KMS encryption, you need to create a customer master key (CMK) through KMS, and then provide the CMK key ID to TiKV using the configuration file. If TiKV fails to get access to the KMS CMK, it will fail to start or restart. 
Alternatively, if using a custom key is desired, supplying the master key via a file is also supported. The file must contain a 256-bit (32-byte) key encoded as a hex string, end with a newline (namely, `\n`), and contain nothing else. Persisting the key on disk, however, leaks the key, so the key file is only suitable to be stored on `tmpfs` in RAM. @@ -67,9 +67,44 @@ Data keys are passed to the underlying storage engine (namely, RocksDB). All fil Regardless of data encryption method, data keys are encrypted using AES256 in GCM mode for additional authentication. This requires the master key to be 256 bits (32 bytes) when it is passed from a file instead of KMS. -### Key creation +### Configure encryption + +To enable encryption, you can add the encryption section in the configuration files of TiKV and PD: + +``` +[security.encryption] +data-encryption-method = "aes128-ctr" +data-key-rotation-period = "168h" # 7 days +``` + +- `data-encryption-method` specifies the encryption algorithm. The possible values are `"aes128-ctr"`, `"aes192-ctr"`, `"aes256-ctr"`, `"sm4-ctr"` (only for v6.3.0 and later versions), and `"plaintext"`. The default value is `"plaintext"`, which means that encryption is disabled by default. + + - For a new TiKV cluster or an existing TiKV cluster, only data written after encryption has been enabled is guaranteed to be encrypted. + - To disable encryption after it is enabled, remove `data-encryption-method` from the configuration file or set its value to `"plaintext"`, and then restart TiKV. + - To change the encryption algorithm, replace the value of `data-encryption-method` with a supported encryption algorithm, and then restart TiKV. After the replacement, as new data is written in, the encryption files generated by the previous encryption algorithm are gradually rewritten to files generated by the new encryption algorithm. + +- `data-key-rotation-period` specifies how often TiKV rotates the data key.
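As noted earlier in this section, a file-based master key must be a 256-bit key encoded as a hex string that ends with a newline. A minimal sketch of generating such a key file with OpenSSL (the output path is an example; in practice, store the file on a RAM-backed filesystem):

```shell
# Write a random 256-bit (32-byte) key as 64 hexadecimal characters followed
# by a newline, matching the key file format that TiKV expects.
openssl rand -hex 32 > /tmp/tikv-master-key
```

The resulting file can then be referenced from the master key configuration by setting the master key type to file and pointing its path at this file.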
+ +If encryption is enabled (that is, the value of `data-encryption-method` is not `"plaintext"`), you must specify a master key in either of the following ways: + +- [Specify a master key via KMS](#specify-a-master-key-via-kms) +- [Specify a master key via a file](#specify-a-master-key-via-a-file) -To create a key on AWS, follow these steps: +#### Specify a master key via KMS + +TiKV supports KMS encryption for three platforms: AWS, Google Cloud, and Azure. Depending on the platform where your service is deployed, you can choose one of them to configure KMS encryption. + +> **Warning:** +> +> Currently, specifying a master key using Google Cloud KMS is experimental. It is not recommended that you use it in production environments. This feature might be changed or removed without prior notice. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub. + + + +
+ +**Step 1. Create a master key** + +To create a key on AWS, take the following steps: 1. Go to the [AWS KMS](https://console.aws.amazon.com/kms) on the AWS console. 2. Make sure that you have selected the correct region on the top right corner of your console. @@ -85,19 +120,9 @@ aws --region us-west-2 kms create-alias --alias-name "alias/tidb-tde" --target-k The `--target-key-id` to enter in the second command is in the output of the first command. -### Configure encryption +**Step 2. Configure the master key** -To enable encryption, you can add the encryption section in the configuration files of TiKV and PD: - -``` -[security.encryption] -data-encryption-method = "aes128-ctr" -data-key-rotation-period = "168h" # 7 days -``` - -Possible values for `data-encryption-method` are "aes128-ctr", "aes192-ctr", "aes256-ctr", "sm4-ctr" (only in v6.3.0 and later versions) and "plaintext". The default value is "plaintext", which means encryption is not turned on. `data-key-rotation-period` defines how often TiKV rotates the data key. Encryption can be turned on for a fresh TiKV cluster, or an existing TiKV cluster, though only data written after encryption is enabled is guaranteed to be encrypted. To disable encryption, remove `data-encryption-method` in the configuration file, or reset it to "plaintext", and restart TiKV. To change encryption method, update `data-encryption-method` in the configuration file and restart TiKV. To change the encryption algorithm, replace `data-encryption-method` with a supported encryption algorithm and then restart TiKV. After the replacement, as new data is written in, the encryption file generated by the previous encryption algorithm is gradually rewritten to a file generated by the new encryption algorithm. - -The master key has to be specified if encryption is enabled (that is,`data-encryption-method` is not "plaintext"). 
To specify a AWS KMS CMK as master key, add the `encryption.master-key` section after the `encryption` section: +To specify the master key using AWS KMS, add the `[security.encryption.master-key]` configuration after the `[security.encryption]` section in the TiKV configuration file: ``` [security.encryption.master-key] @@ -111,6 +136,87 @@ The `key-id` specifies the key ID for the KMS CMK. The `region` is the AWS regio You can also use [multi-Region keys](https://docs.aws.amazon.com/kms/latest/developerguide/multi-region-keys-overview.html) in AWS. For this, you need to set up a primary key in a specific region and add replica keys in the regions you require. +
+
+ +**Step 1. Create a master key** + +To create a key on Google Cloud, take the following steps: + +1. Go to the [Key Management](https://console.cloud.google.com/security/kms/keyrings) page in the Google Cloud console. +2. Click **Create key ring**. Enter a name for the key ring, select a location of the key ring, and then click **Create**. Note that the location of the key ring needs to cover the region where the TiDB cluster is deployed. +3. Select the key ring you created in the previous step, and then click **Create Key** on the key ring details page. +4. Enter a name for the key, set the key information as follows, and then click **Create**. + + - **Protection level**: **Software** or **HSM** + - **Key Material**: **Generated key** + - **Purpose**: **Symmetric encrypt/decrypt** + +You can also perform this operation using the gcloud CLI: + +```shell +gcloud kms keyrings create "key-ring-name" --location "global" +gcloud kms keys create "key-name" --keyring "key-ring-name" --location "global" --purpose "encryption" --rotation-period "30d" +``` + +Make sure to replace the values of `"key-ring-name"`, `"key-name"`, `"global"`, and `"30d"` in the preceding command with the names and configurations corresponding to your actual key. + +**Step 2. Configure the master key** + +To specify the master key using Google Cloud KMS, add the `[security.encryption.master-key]` configuration after the `[security.encryption]` section: + +``` +[security.encryption.master-key] +type = "kms" +key-id = "projects/project-name/locations/global/keyRings/key-ring-name/cryptoKeys/key-name" +vendor = "gcp" + +[security.encryption.master-key.gcp] +credential-file-path = "/path/to/credential.json" +``` + +- `key-id` specifies the key ID of the KMS CMK. +- `credential-file-path` specifies the path of the authentication credentials file, which currently supports two types of credentials: Service Account and Authentication User. 
If the TiKV environment is already configured with [application default credentials](https://cloud.google.com/docs/authentication/application-default-credentials), there is no need to configure `credential-file-path`. + +
+
+ +**Step 1. Create a master key** + +To create a key on Azure, refer to the instructions in [Set and retrieve a key from Azure Key Vault using the Azure portal](https://learn.microsoft.com/en-us/azure/key-vault/keys/quick-create-portal). + +**Step 2. Configure the master key** + +To specify the master key using Azure KMS, add the `[security.encryption.master-key]` configuration after the `[security.encryption]` section in the TiKV configuration file: + +``` +[security.encryption.master-key] +type = 'kms' +key-id = 'your-kms-key-id' +region = 'region-name' +endpoint = 'endpoint' +vendor = 'azure' + +[security.encryption.master-key.azure] +tenant-id = 'tenant_id' +client-id = 'client_id' +keyvault-url = 'keyvault_url' +hsm-name = 'hsm_name' +hsm-url = 'hsm_url' +# The following four fields are optional, used to set client authentication credentials. You can configure them according to the requirements of your scenario. +client_certificate = "" +client_certificate_path = "" +client_certificate_password = "" +client_secret = "" +``` + +Except for `vendor`, replace the values of the other fields in the preceding configuration with the values that correspond to your actual key. +
+
+ +#### Specify a master key via a file + To specify a master key that's stored in a file, the master key configuration would look like the following: ``` diff --git a/error-codes.md b/error-codes.md index 08252447127c5..1bba32e0a314b 100644 --- a/error-codes.md +++ b/error-codes.md @@ -532,7 +532,7 @@ TiDB is compatible with the error codes in MySQL, and in most cases returns the * Error Number: 9001 - The complete error message: `ERROR 9001 (HY000): PD Server Timeout` + The complete error message: `ERROR 9001 (HY000): PD server timeout` The PD request timed out. @@ -540,7 +540,7 @@ TiDB is compatible with the error codes in MySQL, and in most cases returns the * Error Number: 9002 - The complete error message: `ERROR 9002 (HY000): TiKV Server Timeout` + The complete error message: `ERROR 9002 (HY000): TiKV server timeout` The TiKV request timed out. diff --git a/expression-syntax.md b/expression-syntax.md index 09489d4f95975..fe5fae943a5ba 100644 --- a/expression-syntax.md +++ b/expression-syntax.md @@ -18,7 +18,7 @@ The expressions can be divided into the following types: - ParamMarker (`?`), system variables, user variables and CASE expressions. -The following rules are the expression syntax, which is based on the [parser.y](https://github.com/pingcap/parser/blob/master/parser.y) rules of TiDB parser. For the navigable version of the following syntax diagram, refer to [TiDB SQL Syntax Diagram](https://pingcap.github.io/sqlgram/#Expression). +The following rules are the expression syntax, which is based on the [`parser.y`](https://github.com/pingcap/tidb/blob/master/pkg/parser/parser.y) rules of TiDB parser. For the navigable version of the following syntax diagram, refer to [TiDB SQL Syntax Diagram](https://pingcap.github.io/sqlgram/#Expression). 
```ebnf+diagram Expression ::= diff --git a/faq/backup-and-restore-faq.md index f76327604fc6f..de4efdfc1cda6 100644 --- a/faq/backup-and-restore-faq.md +++ b/faq/backup-and-restore-faq.md @@ -321,3 +321,9 @@ No, it is not necessary. Starting from v7.1.0, BR supports resuming data from a ## After the recovery is complete, can I delete a specific table and then recover it again? Yes, after deleting a specific table, you can recover it again. But note that, you can only recover tables that are deleted using the `DROP TABLE` or `TRUNCATE TABLE` statement, not the `DELETE FROM` statement. This is because `DELETE FROM` only updates the MVCC version to mark the data to be deleted, and the actual data deletion occurs after GC. + +### Why does BR take a lot of memory when restoring statistics information? + +Before v7.6.0, the statistics data backed up by BR is stored together with the table information and loaded into memory during recovery. Therefore, when the backup statistics data is very large, BR consumes a large amount of memory. + +Starting from v7.6.0, the backup statistics are stored in a separate file. BR does not load the statistics data of a table until it starts to restore that table, which saves memory. diff --git a/faq/sql-faq.md index 34b83fed00e70..967cea5adaea1 100644 --- a/faq/sql-faq.md +++ b/faq/sql-faq.md @@ -229,9 +229,11 @@ You can combine the above two parameters with the DML of TiDB to use them. For e ## What's the trigger strategy for `auto analyze` in TiDB? -When the number of rows in a new table reaches 1000, and the ratio (the number of modified rows / the current total number of rows) is larger than `tidb_auto_analyze_ratio`, the [`ANALYZE`](/sql-statements/sql-statement-analyze-table.md) statement is automatically triggered. The default value of `tidb_auto_analyze_ratio` is `0.5`, indicating that this feature is enabled by default.
To ensure safety, its minimum value is `0.3` when the feature is enabled, and it must be smaller than `pseudo-estimate-ratio` whose default value is `0.8`; otherwise pseudo statistics will be used for a period of time. It is recommended to set `tidb_auto_analyze_ratio` to `0.5`. +When the number of rows in a table or a single partition of a partitioned table reaches 1000, and the ratio (the number of modified rows / the current total number of rows) of the table or partition is larger than [`tidb_auto_analyze_ratio`](/system-variables.md#tidb_auto_analyze_ratio), the [`ANALYZE`](/sql-statements/sql-statement-analyze-table.md) statement is automatically triggered. -To disable `auto analyze`, use the system variable `tidb_enable_auto_analyze`. +The default value of the `tidb_auto_analyze_ratio` system variable is `0.5`, indicating that this feature is enabled by default. It is not recommended to set the value of `tidb_auto_analyze_ratio` to be larger than or equal to [`pseudo-estimate-ratio`](/tidb-configuration-file.md#pseudo-estimate-ratio) (the default value is `0.8`), otherwise the optimizer might use pseudo statistics. TiDB v5.3.0 introduces the [`tidb_enable_pseudo_for_outdated_stats`](/system-variables.md#tidb_enable_pseudo_for_outdated_stats-new-in-v530) variable, and when you set it to `OFF`, pseudo statistics are not used even if the statistics are outdated. + +To disable `auto analyze`, use the system variable [`tidb_enable_auto_analyze`](/system-variables.md#tidb_enable_auto_analyze-new-in-v610). ## Can I use optimizer hints to override the optimizer behavior? 
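As a reference for the question above, optimizer hints in TiDB are comment-style directives placed right after the `SELECT` keyword. A minimal sketch (the table `t` and index `idx_name` are hypothetical):

```sql
-- Ask the optimizer to read table `t` through the hypothetical index `idx_name`:
SELECT /*+ USE_INDEX(t, idx_name) */ id, name FROM t WHERE name = 'TiDB';
```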
diff --git a/functions-and-operators/expressions-pushed-down.md b/functions-and-operators/expressions-pushed-down.md index e114bfc300856..0d6f0a388cf3f 100644 --- a/functions-and-operators/expressions-pushed-down.md +++ b/functions-and-operators/expressions-pushed-down.md @@ -21,7 +21,7 @@ TiFlash also supports pushdown for the functions and operators [listed on this p | [Control flow functions](/functions-and-operators/control-flow-functions.md) | [CASE](https://dev.mysql.com/doc/refman/8.0/en/flow-control-functions.html#operator_case), [IF()](https://dev.mysql.com/doc/refman/8.0/en/flow-control-functions.html#function_if), [IFNULL()](https://dev.mysql.com/doc/refman/8.0/en/flow-control-functions.html#function_ifnull) | | [JSON functions](/functions-and-operators/json-functions.md) | [JSON_ARRAY([val[, val] ...])](https://dev.mysql.com/doc/refman/8.0/en/json-creation-functions.html#function_json-array),
[JSON_CONTAINS(target, candidate[, path])](https://dev.mysql.com/doc/refman/8.0/en/json-search-functions.html#function_json-contains),
[JSON_EXTRACT(json_doc, path[, path] ...)](https://dev.mysql.com/doc/refman/8.0/en/json-search-functions.html#function_json-extract),
[JSON_INSERT(json_doc, path, val[, path, val] ...)](https://dev.mysql.com/doc/refman/8.0/en/json-modification-functions.html#function_json-insert),
[JSON_LENGTH(json_doc[, path])](https://dev.mysql.com/doc/refman/8.0/en/json-attribute-functions.html#function_json-length),
[JSON_MERGE(json_doc, json_doc[, json_doc] ...)](https://dev.mysql.com/doc/refman/8.0/en/json-modification-functions.html#function_json-merge),
[JSON_OBJECT([key, val[, key, val] ...])](https://dev.mysql.com/doc/refman/8.0/en/json-creation-functions.html#function_json-object),
[JSON_REMOVE(json_doc, path[, path] ...)](https://dev.mysql.com/doc/refman/8.0/en/json-modification-functions.html#function_json-remove),
[JSON_REPLACE(json_doc, path, val[, path, val] ...)](https://dev.mysql.com/doc/refman/8.0/en/json-modification-functions.html#function_json-replace),
[JSON_SET(json_doc, path, val[, path, val] ...)](https://dev.mysql.com/doc/refman/8.0/en/json-modification-functions.html#function_json-set),
[JSON_TYPE(json_val)](https://dev.mysql.com/doc/refman/8.0/en/json-attribute-functions.html#function_json-type),
[JSON_UNQUOTE(json_val)](https://dev.mysql.com/doc/refman/8.0/en/json-modification-functions.html#function_json-unquote),
[JSON_VALID(val)](https://dev.mysql.com/doc/refman/8.0/en/json-attribute-functions.html#function_json-valid),
[value MEMBER OF(json_array)](https://dev.mysql.com/doc/refman/8.0/en/json-search-functions.html#operator_member-of) | | [Date and time functions](/functions-and-operators/date-and-time-functions.md) | [DATE()](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_date), [DATE_FORMAT()](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_date-format), [DATEDIFF()](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_datediff), [DAYOFMONTH()](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_dayofmonth), [DAYOFWEEK()](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_dayofweek), [DAYOFYEAR()](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_dayofyear), [FROM_DAYS()](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_from-days), [HOUR()](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_hour), [MAKEDATE()](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_makedate), [MAKETIME()](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_maketime), [MICROSECOND()](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_microsecond), [MINUTE()](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_minute), [MONTH()](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_month), [MONTHNAME()](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_monthname), [PERIOD_ADD()](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_period-add), [PERIOD_DIFF()](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_period-diff), [SEC_TO_TIME()](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_sec-to-time), 
[SECOND()](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_second), [SYSDATE()](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_sysdate), [TIME_TO_SEC()](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_time-to-sec), [TIMEDIFF()](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_timediff), [WEEK()](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_week), [WEEKOFYEAR()](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_weekofyear), [YEAR()](https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_year) | -| [String functions](/functions-and-operators/string-functions.md) | [ASCII()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_ascii), [BIT_LENGTH()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_bit-length), [CHAR()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_char), [CHAR_LENGTH()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_char-length), [CONCAT()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_concat), [CONCAT_WS()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_concat-ws), [ELT()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_elt), [FIELD()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_field), [HEX()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_hex), [LENGTH()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_length), [LIKE](https://dev.mysql.com/doc/refman/8.0/en/string-comparison-functions.html#operator_like), [LOWER()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_lower), [LTRIM()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_ltrim), 
[MID()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_mid), [NOT LIKE](https://dev.mysql.com/doc/refman/8.0/en/string-comparison-functions.html#operator_not-like), [NOT REGEXP](https://dev.mysql.com/doc/refman/8.0/en/regexp.html#operator_not-regexp), [REGEXP](https://dev.mysql.com/doc/refman/8.0/en/regexp.html#operator_regexp), [REPLACE()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_replace), [REVERSE()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_reverse), [RIGHT()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_right), [RTRIM()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_rtrim), [SPACE()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_space), [STRCMP()](https://dev.mysql.com/doc/refman/8.0/en/string-comparison-functions.html#function_strcmp), [SUBSTR()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_substr), [SUBSTRING()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_substring), [UPPER()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_upper) | +| [String functions](/functions-and-operators/string-functions.md) | [ASCII()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_ascii), [BIT_LENGTH()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_bit-length), [CHAR()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_char), [CHAR_LENGTH()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_char-length), [CONCAT()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_concat), [CONCAT_WS()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_concat-ws), [ELT()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_elt), [FIELD()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_field), 
[HEX()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_hex), [LENGTH()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_length), [LIKE](https://dev.mysql.com/doc/refman/8.0/en/string-comparison-functions.html#operator_like), [LOWER()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_lower), [LTRIM()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_ltrim), [MID()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_mid), [NOT LIKE](https://dev.mysql.com/doc/refman/8.0/en/string-comparison-functions.html#operator_not-like), [NOT REGEXP](https://dev.mysql.com/doc/refman/8.0/en/regexp.html#operator_not-regexp), [REGEXP](https://dev.mysql.com/doc/refman/8.0/en/regexp.html#operator_regexp), [REGEXP_INSTR()](https://dev.mysql.com/doc/refman/8.0/en/regexp.html#function_regexp-instr), [REGEXP_LIKE()](https://dev.mysql.com/doc/refman/8.0/en/regexp.html#function_regexp-like), [REGEXP_REPLACE()](https://dev.mysql.com/doc/refman/8.0/en/regexp.html#function_regexp-replace), [REGEXP_SUBSTR()](https://dev.mysql.com/doc/refman/8.0/en/regexp.html#function_regexp-substr), [REPLACE()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_replace), [REVERSE()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_reverse), [RIGHT()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_right), [RLIKE](https://dev.mysql.com/doc/refman/8.0/en/regexp.html#operator_regexp), [RTRIM()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_rtrim), [SPACE()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_space), [STRCMP()](https://dev.mysql.com/doc/refman/8.0/en/string-comparison-functions.html#function_strcmp), [SUBSTR()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_substr), 
[SUBSTRING()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_substring), [UPPER()](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_upper) | | [Aggregation functions](/functions-and-operators/aggregate-group-by-functions.md#aggregate-group-by-functions) | [COUNT()](https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html#function_count), [COUNT(DISTINCT)](https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html#function_count-distinct), [SUM()](https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html#function_sum), [AVG()](https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html#function_avg), [MAX()](https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html#function_max), [MIN()](https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html#function_min), [VARIANCE()](https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html#function_variance), [VAR_POP()](https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html#function_var-pop), [STD()](https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html#function_std), [STDDEV()](https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html#function_stddev), [STDDEV_POP](https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html#function_stddev-pop), [VAR_SAMP()](https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html#function_var-samp), [STDDEV_SAMP()](https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html#function_stddev-samp), [JSON_ARRAYAGG(key)](https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html#function_json-arrayagg), [JSON_OBJECTAGG(key, value)](https://dev.mysql.com/doc/refman/8.0/en/aggregate-functions.html#function_json-objectagg) | | [Encryption and compression functions](/functions-and-operators/encryption-and-compression-functions.md#encryption-and-compression-functions) | [MD5()](https://dev.mysql.com/doc/refman/8.0/en/encryption-functions.html#function_md5), 
[SHA1(), SHA()](https://dev.mysql.com/doc/refman/8.0/en/encryption-functions.html#function_sha1), [UNCOMPRESSED_LENGTH()](https://dev.mysql.com/doc/refman/8.0/en/encryption-functions.html#function_uncompressed-length) | | [Cast functions and operators](/functions-and-operators/cast-functions-and-operators.md#cast-functions-and-operators) | [CAST()](https://dev.mysql.com/doc/refman/8.0/en/cast-functions.html#function_cast), [CONVERT()](https://dev.mysql.com/doc/refman/8.0/en/cast-functions.html#function_convert) | diff --git a/functions-and-operators/string-functions.md b/functions-and-operators/string-functions.md index 7c3762a95d949..d6c9f6bf1ad7b 100644 --- a/functions-and-operators/string-functions.md +++ b/functions-and-operators/string-functions.md @@ -28,36 +28,20 @@ The `ASCII(str)` function is used to get the ASCII value of the leftmost charact > > `ASCII(str)` only works for characters represented using 8 bits of binary digits (one byte). -Examples: +Example: ```sql -SELECT ASCII('A'); - -+------------+ -| ASCII('A') | -+------------+ -| 65 | -+------------+ +SELECT ASCII('A'), ASCII('TiDB'), ASCII(23); ``` -```sql -SELECT ASCII('TiDB'); - -+---------------+ -| ASCII('TiDB') | -+---------------+ -| 84 | -+---------------+ -``` +Output: ```sql -SELECT ASCII(23); - -+-----------+ -| ASCII(23) | -+-----------+ -| 50 | -+-----------+ ++------------+---------------+-----------+ +| ASCII('A') | ASCII('TiDB') | ASCII(23) | ++------------+---------------+-----------+ +| 65 | 84 | 50 | ++------------+---------------+-----------+ ``` ### [`BIN()`](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_bin) @@ -68,24 +52,34 @@ The `BIN()` function is used to convert the given argument into a string represe - If the argument is a negative number, the function converts the absolute value of the argument to its binary representation, inverts each bit of the binary value (changing `0` to `1` and `1` to `0`), and then adds `1` to the inverted value. 
- If the argument is a string containing only digits, the function returns the result according to those digits. For example, the results for `"123"` and `123` are the same. - If the argument is a string and its first character is not a digit (such as `"q123"`), the function returns `0`. -- If the argument is a string that consists of digits and non-digits, the function returns the result according to the consecutive digits at the beginning of the argument. For example, the results for `"123q123"` and `123` are the same. +- If the argument is a string that consists of digits and non-digits, the function returns the result according to the consecutive digits at the beginning of the argument. For example, the results for `"123q123"` and `123` are the same, but `BIN('123q123')` generates a warning like `Truncated incorrect INTEGER value: '123q123'`. - If the argument is `NULL`, the function returns `NULL`. -Examples: +Example 1: ```sql -SELECT BIN(123); +SELECT BIN(123), BIN('123q123'); +``` + +Output 1: -+----------+ -| BIN(123) | -+----------+ -| 1111011 | -+----------+ +```sql ++----------+----------------+ +| BIN(123) | BIN('123q123') | ++----------+----------------+ +| 1111011 | 1111011 | ++----------+----------------+ ``` +Example 2: + ```sql SELECT BIN(-7); +``` + +Output 2: +```sql +------------------------------------------------------------------+ | BIN(-7) | +------------------------------------------------------------------+ @@ -93,18 +87,6 @@ SELECT BIN(-7); +------------------------------------------------------------------+ ``` -```sql -SELECT BIN("123q123"); - -+----------------+ -| BIN("123q123") | -+----------------+ -| 1111011 | -+----------------+ -``` - -Return a string containing binary representation of a number. - ### [`BIT_LENGTH()`](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_bit-length) The `BIT_LENGTH()` function is used to return the length of a given argument in bits. 
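A quick illustration of the bit count (each byte contributes 8 bits; with a multi-byte character set such as `utf8mb4`, the count is based on bytes, not characters):

```sql
SELECT BIT_LENGTH('TiDB'), BIT_LENGTH('');
```

Output:

```sql
+--------------------+----------------+
| BIT_LENGTH('TiDB') | BIT_LENGTH('') |
+--------------------+----------------+
|                 32 |              0 |
+--------------------+----------------+
```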
@@ -1307,7 +1289,7 @@ Example 1: SELECT SUBSTRING_INDEX('www.tidbcloud.com', '.', 2); ``` -Result 1: +Output 1: ```sql +-----------------------------------------+ @@ -1323,7 +1305,7 @@ Example 2: SELECT SUBSTRING_INDEX('www.tidbcloud.com', '.', -1); ``` -Result 2: +Output 2: ```sql +------------------------------------------+ @@ -1352,7 +1334,7 @@ Example 1: SELECT TO_BASE64('abc'); ``` -Result 1: +Output 1: ```sql +------------------+ @@ -1368,7 +1350,7 @@ Example 2: SELECT TO_BASE64(6); ``` -Result 2: +Output 2: ```sql +--------------+ @@ -1388,19 +1370,108 @@ Remove leading and trailing spaces. ### [`UCASE()`](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_ucase) -Synonym for `UPPER()`. +The `UCASE()` function is used to convert a string to uppercase letters. This function is equivalent to the `UPPER()` function. + +> **Note:** +> +> When the string is null, the `UCASE()` function returns `NULL`. + +Example: + +```sql +SELECT UCASE('bigdata') AS result_upper, UCASE(null) AS result_null; +``` + +Output: + +```sql ++--------------+-------------+ +| result_upper | result_null | ++--------------+-------------+ +| BIGDATA | NULL | ++--------------+-------------+ +``` ### [`UNHEX()`](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_unhex) -Return a string containing hex representation of a number. +The `UNHEX()` function performs the reverse operation of the `HEX()` function. It treats each pair of characters in the argument as a hexadecimal number and converts it to the character represented by that number, returning the result as a binary string. + +> **Note:** +> +> The argument must be a valid hexadecimal value that contains `0`–`9`, `A`–`F`, or `a`–`f`. If the argument is `NULL` or falls outside this range, the function returns `NULL`. 
+ +Example: + +```sql +SELECT UNHEX('54694442'); +``` + +Output: + +```sql ++--------------------------------------+ +| UNHEX('54694442') | ++--------------------------------------+ +| 0x54694442 | ++--------------------------------------+ +``` ### [`UPPER()`](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_upper) -Convert to uppercase. +The `UPPER()` function is used to convert a string to uppercase letters. This function is equivalent to the `UCASE()` function. + +> **Note:** +> +> When the string is null, the `UPPER()` function returns `NULL`. + +Example: + +```sql +SELECT UPPER('bigdata') AS result_upper, UPPER(null) AS result_null; +``` + +Output: + +```sql ++--------------+-------------+ +| result_upper | result_null | ++--------------+-------------+ +| BIGDATA | NULL | ++--------------+-------------+ +``` ### [`WEIGHT_STRING()`](https://dev.mysql.com/doc/refman/8.0/en/string-functions.html#function_weight-string) -Return the weight string for the input string. +The `WEIGHT_STRING()` function returns the weight string (binary characters) for the input string, primarily used for sorting and comparison operations in multi-character set scenarios. If the argument is `NULL`, it returns `NULL`. The syntax is as follows: + +```sql +WEIGHT_STRING(str [AS {CHAR|BINARY}(N)]) +``` + +- `str`: the input string expression. If it is a non-binary string, such as a `CHAR`, `VARCHAR`, or `TEXT` value, the return value contains the collation weights for the string. If it is a binary string, such as a `BINARY`, `VARBINARY`, or `BLOB` value, the return value is the same as the input. + +- `AS {CHAR|BINARY}(N)`: optional parameters used to specify the type and length of the output. `CHAR` represents the character data type, and `BINARY` represents the binary data type. `N` specifies the output length, which is an integer greater than or equal to 1. + +> **Note:** +> +> If `N` is less than the string length, the string is truncated. 
If `N` exceeds the string length, `AS CHAR(N)` pads the string with spaces to the specified length, and `AS BINARY(N)` pads the string with `0x00` to the specified length. + +Example: + +```sql +SET NAMES 'utf8mb4'; +SELECT HEX(WEIGHT_STRING('ab' AS CHAR(3))) AS char_result, HEX(WEIGHT_STRING('ab' AS BINARY(3))) AS binary_result; +``` + +Output: + +```sql ++-------------+---------------+ +| char_result | binary_result | ++-------------+---------------+ +| 6162 | 616200 | ++-------------+---------------+ +``` ## Unsupported functions diff --git a/hardware-and-software-requirements.md b/hardware-and-software-requirements.md index 8896807aa8663..b345b4fbadf3d 100644 --- a/hardware-and-software-requirements.md +++ b/hardware-and-software-requirements.md @@ -267,3 +267,32 @@ As an open-source distributed SQL database, TiDB requires the following network ## Web browser requirements TiDB relies on [Grafana](https://grafana.com/) to provide visualization of database metrics. A recent version of Microsoft Edge, Safari, Chrome or Firefox with Javascript enabled is sufficient. + +## Hardware and software requirements for TiFlash disaggregated storage and compute architecture + +The preceding TiFlash software and hardware requirements are for the coupled storage and compute architecture. Starting from v7.0.0, TiFlash supports the [disaggregated storage and compute architecture](/tiflash/tiflash-disaggregated-and-s3.md). In this architecture, TiFlash is divided into two types of nodes: the Write Node and the Compute Node. The requirements for these nodes are as follows: + +- Software: remain the same as the coupled storage and compute architecture, see [OS and platform requirements](#os-and-platform-requirements). +- Network port: remain the same as the coupled storage and compute architecture, see [Network](#network-requirements). 
+- Disk space: + - TiFlash Write Node: it is recommended to configure at least 200 GB of disk space, which is used as a local buffer when adding TiFlash replicas and migrating Region replicas before uploading data to Amazon S3. In addition, an object storage compatible with Amazon S3 is required. + - TiFlash Compute Node: it is recommended to configure at least 100 GB of disk space, which is mainly used to cache the data read from the Write Node to improve performance. The cache of the Compute Node might be fully used, which is normal. +- CPU and memory requirements are described in the following sections. + +### Development and test environments + +| Component | CPU | Memory | Local Storage | Network | Number of Instances (Minimum Requirement) | +| --- | --- | --- | --- | --- | --- | +| TiFlash Write Node | 16 cores+ | 32 GB+ | SSD, 200 GB+ | Gigabit Ethernet | 1 | +| TiFlash Compute Node | 16 cores+ | 32 GB+ | SSD, 100 GB+ | Gigabit Ethernet | 0 (see the following note) | + +### Production environment + +| Component | CPU | Memory | Disk Type | Network | Number of Instances (Minimum Requirement) | +| --- | --- | --- | --- | --- | --- | +| TiFlash Write Node | 32 cores+ | 64 GB+ | 1 or more SSDs | 10 Gigabit Ethernet (2 recommended) | 1 | +| TiFlash Compute Node | 32 cores+ | 64 GB+ | 1 or more SSDs | 10 Gigabit Ethernet (2 recommended) | 0 (see the following note) | + +> **Note:** +> +> You can use deployment tools such as TiUP to quickly scale in or out the TiFlash Compute Node, within the range of `[0, +inf]`. diff --git a/information-schema/information-schema-tidb-index-usage.md b/information-schema/information-schema-tidb-index-usage.md new file mode 100644 index 0000000000000..00e57aa9cd906 --- /dev/null +++ b/information-schema/information-schema-tidb-index-usage.md @@ -0,0 +1,103 @@ +--- +title: TIDB_INDEX_USAGE +summary: Learn the `TIDB_INDEX_USAGE` INFORMATION_SCHEMA table. 
+--- + +# TIDB_INDEX_USAGE + + + +Starting from v8.0.0, TiDB provides the `TIDB_INDEX_USAGE` table. You can use `TIDB_INDEX_USAGE` to get the usage statistics of all indexes on the current TiDB node. By default, TiDB collects these index usage statistics during SQL statement execution. You can disable this feature by turning off the [`instance.tidb_enable_collect_execution_info`](/tidb-configuration-file.md#tidb_enable_collect_execution_info) configuration item or the [`tidb_enable_collect_execution_info`](/system-variables.md#tidb_enable_collect_execution_info) system variable. + + + + + +Starting from v8.0.0, TiDB provides the `TIDB_INDEX_USAGE` table. You can use `TIDB_INDEX_USAGE` to get the usage statistics of all indexes on the current TiDB node. By default, TiDB collects these index usage statistics during SQL statement execution. You can disable this feature by turning off the [`instance.tidb_enable_collect_execution_info`](https://docs.pingcap.com/tidb/v8.0/tidb-configuration-file#tidb_enable_collect_execution_info) configuration item or the [`tidb_enable_collect_execution_info`](/system-variables.md#tidb_enable_collect_execution_info) system variable. 
+ + + +```sql +USE INFORMATION_SCHEMA; +DESC TIDB_INDEX_USAGE; +``` + +```sql ++--------------------------+-------------+------+------+---------+-------+ +| Field | Type | Null | Key | Default | Extra | ++--------------------------+-------------+------+------+---------+-------+ +| TABLE_SCHEMA | varchar(64) | YES | | NULL | | +| TABLE_NAME | varchar(64) | YES | | NULL | | +| INDEX_NAME | varchar(64) | YES | | NULL | | +| QUERY_TOTAL | bigint(21) | YES | | NULL | | +| KV_REQ_TOTAL | bigint(21) | YES | | NULL | | +| ROWS_ACCESS_TOTAL | bigint(21) | YES | | NULL | | +| PERCENTAGE_ACCESS_0 | bigint(21) | YES | | NULL | | +| PERCENTAGE_ACCESS_0_1 | bigint(21) | YES | | NULL | | +| PERCENTAGE_ACCESS_1_10 | bigint(21) | YES | | NULL | | +| PERCENTAGE_ACCESS_10_20 | bigint(21) | YES | | NULL | | +| PERCENTAGE_ACCESS_20_50 | bigint(21) | YES | | NULL | | +| PERCENTAGE_ACCESS_50_100 | bigint(21) | YES | | NULL | | +| PERCENTAGE_ACCESS_100 | bigint(21) | YES | | NULL | | +| LAST_ACCESS_TIME | datetime | YES | | NULL | | ++--------------------------+-------------+------+------+---------+-------+ +14 rows in set (0.00 sec) +``` + +The columns in the `TIDB_INDEX_USAGE` table are as follows: + +* `TABLE_SCHEMA`: The name of the database to which the table containing the index belongs. +* `TABLE_NAME`: The name of the table containing the index. +* `INDEX_NAME`: The name of the index. +* `QUERY_TOTAL`: The total number of statements accessing the index. +* `KV_REQ_TOTAL`: The total number of KV requests generated when accessing the index. +* `ROWS_ACCESS_TOTAL`: The total number of rows scanned when accessing the index. +* `PERCENTAGE_ACCESS_0`: The number of times the row access ratio (the percentage of accessed rows out of the total number of rows in the table) is 0. +* `PERCENTAGE_ACCESS_0_1`: The number of times the row access ratio is between 0% and 1%. +* `PERCENTAGE_ACCESS_1_10`: The number of times the row access ratio is between 1% and 10%. 
+* `PERCENTAGE_ACCESS_10_20`: The number of times the row access ratio is between 10% and 20%.
+* `PERCENTAGE_ACCESS_20_50`: The number of times the row access ratio is between 20% and 50%.
+* `PERCENTAGE_ACCESS_50_100`: The number of times the row access ratio is between 50% and 100%.
+* `PERCENTAGE_ACCESS_100`: The number of times the row access ratio is 100%.
+* `LAST_ACCESS_TIME`: The time of the most recent access to the index.
+
+## CLUSTER_TIDB_INDEX_USAGE
+
+The `TIDB_INDEX_USAGE` table only provides usage statistics of all indexes on a single TiDB node. To obtain the index usage statistics on all TiDB nodes in the cluster, you need to query the `CLUSTER_TIDB_INDEX_USAGE` table.
+
+Compared with the `TIDB_INDEX_USAGE` table, the query result of the `CLUSTER_TIDB_INDEX_USAGE` table includes an additional `INSTANCE` field. This field displays the IP address and port of each node in the cluster, which helps you distinguish the statistics across different nodes.
+
+```sql
+USE INFORMATION_SCHEMA;
+DESC CLUSTER_TIDB_INDEX_USAGE;
+```
+
+The output is as follows:
+
+```sql
++--------------------------+-------------+------+------+---------+-------+
+| Field | Type | Null | Key | Default | Extra |
++--------------------------+-------------+------+------+---------+-------+
+| INSTANCE | varchar(64) | YES | | NULL | |
+| TABLE_SCHEMA | varchar(64) | YES | | NULL | |
+| TABLE_NAME | varchar(64) | YES | | NULL | |
+| INDEX_NAME | varchar(64) | YES | | NULL | |
+| QUERY_TOTAL | bigint(21) | YES | | NULL | |
+| KV_REQ_TOTAL | bigint(21) | YES | | NULL | |
+| ROWS_ACCESS_TOTAL | bigint(21) | YES | | NULL | |
+| PERCENTAGE_ACCESS_0 | bigint(21) | YES | | NULL | |
+| PERCENTAGE_ACCESS_0_1 | bigint(21) | YES | | NULL | |
+| PERCENTAGE_ACCESS_1_10 | bigint(21) | YES | | NULL | |
+| PERCENTAGE_ACCESS_10_20 | bigint(21) | YES | | NULL | |
+| PERCENTAGE_ACCESS_20_50 | bigint(21) | YES | | NULL | |
+| PERCENTAGE_ACCESS_50_100 | bigint(21) | YES | | NULL | |
+| PERCENTAGE_ACCESS_100 | bigint(21) | YES | | NULL | |
+| LAST_ACCESS_TIME | datetime | YES | | NULL | |
++--------------------------+-------------+------+------+---------+-------+
+15 rows in set (0.00 sec)
+```
+
+## Limitations
+
+- The data in the `TIDB_INDEX_USAGE` table might be delayed by up to 5 minutes.
+- After TiDB restarts, the data in the `TIDB_INDEX_USAGE` table is cleared.
diff --git a/information-schema/information-schema.md b/information-schema/information-schema.md
index 9afd5684c5624..ac9b22bca1a52 100644
--- a/information-schema/information-schema.md
+++ b/information-schema/information-schema.md
@@ -123,6 +123,7 @@ Many `INFORMATION_SCHEMA` tables have a corresponding `SHOW` command. The benefi
| `CLUSTER_SLOW_QUERY` | Provides a cluster-level view of the `SLOW_QUERY` table. This table is not available on [TiDB Serverless](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-serverless) clusters. |
| `CLUSTER_STATEMENTS_SUMMARY` | Provides a cluster-level view of the `STATEMENTS_SUMMARY` table. This table is not available on [TiDB Serverless](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-serverless) clusters. |
| `CLUSTER_STATEMENTS_SUMMARY_HISTORY` | Provides a cluster-level view of the `STATEMENTS_SUMMARY_HISTORY` table. This table is not available on [TiDB Serverless](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-serverless) clusters. |
+| `CLUSTER_TIDB_INDEX_USAGE` | Provides a cluster-level view of the `TIDB_INDEX_USAGE` table. |
| `CLUSTER_TIDB_TRX` | Provides a cluster-level view of the `TIDB_TRX` table. |
| [`CLUSTER_SYSTEMINFO`](/information-schema/information-schema-cluster-systeminfo.md) | Provides details about kernel parameter configuration for servers in the cluster. This table is not applicable to TiDB Cloud.
| | [`DATA_LOCK_WAITS`](/information-schema/information-schema-data-lock-waits.md) | Provides the lock-waiting information on the TiKV server. | @@ -145,6 +146,7 @@ Many `INFORMATION_SCHEMA` tables have a corresponding `SHOW` command. The benefi | [`TIDB_HOT_REGIONS`](/information-schema/information-schema-tidb-hot-regions.md) | Provides statistics about which regions are hot. | | [`TIDB_HOT_REGIONS_HISTORY`](/information-schema/information-schema-tidb-hot-regions-history.md) | Provides history statistics about which Regions are hot. | | [`TIDB_INDEXES`](/information-schema/information-schema-tidb-indexes.md) | Provides index information about TiDB tables. | +| [`TIDB_INDEX_USAGE`](/information-schema/information-schema-tidb-index-usage.md) | Provides the information of the index usage statistics on the TiDB node. | | [`TIDB_SERVERS_INFO`](/information-schema/information-schema-tidb-servers-info.md) | Provides a list of TiDB servers (namely, tidb-server component) | | [`TIDB_TRX`](/information-schema/information-schema-tidb-trx.md) | Provides the information of the transactions that are being executed on the TiDB node. | | [`TIFLASH_REPLICA`](/information-schema/information-schema-tiflash-replica.md) | Provides details about TiFlash replicas. | @@ -196,6 +198,7 @@ Many `INFORMATION_SCHEMA` tables have a corresponding `SHOW` command. The benefi | [`TIDB_HOT_REGIONS`](https://docs.pingcap.com/tidb/stable/information-schema-tidb-hot-regions) | Provides statistics about which regions are hot. This table is not applicable to TiDB Cloud. | | [`TIDB_HOT_REGIONS_HISTORY`](/information-schema/information-schema-tidb-hot-regions-history.md) | Provides history statistics about which Regions are hot. | | [`TIDB_INDEXES`](/information-schema/information-schema-tidb-indexes.md) | Provides index information about TiDB tables. 
| +| [`TIDB_INDEX_USAGE`](/information-schema/information-schema-tidb-index-usage.md) | Provides the information of the index usage statistics on the TiDB node. | | [`TIDB_SERVERS_INFO`](/information-schema/information-schema-tidb-servers-info.md) | Provides a list of TiDB servers (namely, tidb-server component) | | [`TIDB_TRX`](/information-schema/information-schema-tidb-trx.md) | Provides the information of the transactions that are being executed on the TiDB node. | | [`TIFLASH_REPLICA`](/information-schema/information-schema-tiflash-replica.md) | Provides details about TiFlash replicas. | diff --git a/media/develop/mysql-workbench-adjust-sqleditor-read-timeout.jpg b/media/develop/mysql-workbench-adjust-sqleditor-read-timeout.jpg new file mode 100644 index 0000000000000..0891f2741f8e3 Binary files /dev/null and b/media/develop/mysql-workbench-adjust-sqleditor-read-timeout.jpg differ diff --git a/media/ticdc/ticdc-simple-consumer-1.png b/media/ticdc/ticdc-simple-consumer-1.png new file mode 100644 index 0000000000000..bc2bd9efbc933 Binary files /dev/null and b/media/ticdc/ticdc-simple-consumer-1.png differ diff --git a/media/ticdc/ticdc-simple-consumer-2.png b/media/ticdc/ticdc-simple-consumer-2.png new file mode 100644 index 0000000000000..a7e660bf9d4b3 Binary files /dev/null and b/media/ticdc/ticdc-simple-consumer-2.png differ diff --git a/mysql-compatibility.md b/mysql-compatibility.md index 0716f33b423a7..0851074746c9a 100644 --- a/mysql-compatibility.md +++ b/mysql-compatibility.md @@ -172,7 +172,7 @@ In TiDB, all supported DDL changes can be performed online. However, there are s * The `ALGORITHM={INSTANT,INPLACE,COPY}` syntax functions only as an assertion in TiDB, and does not modify the `ALTER` algorithm. See [`ALTER TABLE`](/sql-statements/sql-statement-alter-table.md) for further details. * Adding/Dropping the primary key of the `CLUSTERED` type is unsupported. 
For more details about the primary key of the `CLUSTERED` type, refer to [clustered index](/clustered-indexes.md). * Different types of indexes (`HASH|BTREE|RTREE|FULLTEXT`) are not supported, and will be parsed and ignored when specified. -* TiDB supports `HASH`, `RANGE`, `LIST`, and `KEY` partitioning types. Currently, the `KEY` partition type does not support partition statements with an empty partition column list. For an unsupported partition type, TiDB returns `Warning: Unsupported partition type %s, treat as normal table`, where `%s` is the specific unsupported partition type. +* TiDB supports `HASH`, `RANGE`, `LIST`, and `KEY` partitioning types. For an unsupported partition type, TiDB returns `Warning: Unsupported partition type %s, treat as normal table`, where `%s` is the specific unsupported partition type. * Range, Range COLUMNS, List, and List COLUMNS partitioned tables support `ADD`, `DROP`, `TRUNCATE`, and `REORGANIZE` operations. Other partition operations are ignored. * Hash and Key partitioned tables support `ADD`, `COALESCE`, and `TRUNCATE` operations. Other partition operations are ignored. * The following syntaxes are not supported for partitioned tables: diff --git a/optimistic-transaction.md b/optimistic-transaction.md index 147ad9fa41966..f7e4b1fe558c1 100644 --- a/optimistic-transaction.md +++ b/optimistic-transaction.md @@ -65,6 +65,10 @@ However, TiDB transactions also have the following disadvantages: ## Transaction retries +> **Note:** +> +> Starting from v8.0.0, the [`tidb_disable_txn_auto_retry`](/system-variables.md#tidb_disable_txn_auto_retry) system variable is deprecated, and TiDB no longer supports automatic retries of optimistic transactions. It is recommended to use the [Pessimistic transaction mode](/pessimistic-transaction.md). If you encounter optimistic transaction conflicts, you can capture the error and retry transactions in your application. 
+
 In the optimistic transaction model, transactions might fail to be committed because of write–write conflict in heavy contention scenarios. TiDB uses optimistic concurrency control by default, whereas MySQL applies pessimistic concurrency control. This means that MySQL adds locks during the execution of write-type SQL statements, and its Repeatable Read isolation level allows for current reads, so commits generally do not encounter exceptions. To lower the difficulty of adapting applications, TiDB provides an internal retry mechanism.
 
 ### Automatic retry
diff --git a/optimizer-fix-controls.md b/optimizer-fix-controls.md
index de8b98807d664..e79f2613bb480 100644
--- a/optimizer-fix-controls.md
+++ b/optimizer-fix-controls.md
@@ -26,6 +26,12 @@ SET SESSION tidb_opt_fix_control = '44262:ON,44389:ON';
 
 ## Optimizer Fix Controls reference
 
+### [`33031`](https://github.com/pingcap/tidb/issues/33031) New in v8.0.0
+
+- Default value: `OFF`
+- Possible values: `ON`, `OFF`
+- This variable controls whether to disable plan cache for partitioned tables. If it is set to `ON`, neither [Prepared statement plan cache](/sql-prepared-plan-cache.md) nor [Non-prepared statement plan cache](/sql-non-prepared-plan-cache.md) will be enabled for [partitioned tables](/partitioned-table.md).
+
 ### [`44262`](https://github.com/pingcap/tidb/issues/44262) New in v6.5.3 and v7.2.0
 
 - Default value: `OFF`
@@ -63,4 +69,4 @@ SET SESSION tidb_opt_fix_control = '44262:ON,44389:ON';
 - Default value: `1000`
 - Possible values: `[0, 2147483647]`
 - This variable sets the threshold for the optimizer's heuristic strategy to select access paths. If the estimated rows for an access path (such as `Index_A`) is much smaller than that of other access paths (default `1000` times), the optimizer skips the cost comparison and directly selects `Index_A`.
-- `0` means to disable this heuristic strategy.
\ No newline at end of file
+- `0` means to disable this heuristic strategy.
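The optimistic-transaction note added in this patch recommends capturing conflict errors and retrying transactions in the application. A minimal retry wrapper might look like the following sketch; the `9007` write-conflict error code, the `ConflictError` class, and all helper names are illustrative assumptions, not part of this patch:

```python
import time

# Assumed server error code for an optimistic write-write conflict.
WRITE_CONFLICT = 9007

class ConflictError(Exception):
    """Stand-in for a driver error that carries a server error code."""
    def __init__(self, code):
        super().__init__(f"server error {code}")
        self.code = code

def run_with_retry(txn_fn, max_retries=3, base_delay=0.01):
    """Run txn_fn(); on a write-conflict error, retry with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return txn_fn()
        except ConflictError as exc:
            if exc.code != WRITE_CONFLICT or attempt == max_retries:
                raise  # non-retryable error, or retry budget exhausted
            time.sleep(base_delay * (2 ** attempt))

# Example: a transaction that conflicts twice before committing.
attempts = {"n": 0}
def flaky_txn():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConflictError(WRITE_CONFLICT)
    return "committed"

print(run_with_retry(flaky_txn))  # prints "committed"
```

In a real application, `txn_fn` would begin a transaction on the database connection, execute the statements, and commit; only errors identified as write conflicts should be retried.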
diff --git a/optimizer-hints.md b/optimizer-hints.md index 38a7813cf4ced..369020b628aae 100644 --- a/optimizer-hints.md +++ b/optimizer-hints.md @@ -12,7 +12,7 @@ If you encounter a situation where hints do not take effect, see [Troubleshoot c ## Syntax -Optimizer hints are case insensitive and specified within `/*+ ... */` comments following the `SELECT`, `UPDATE` or `DELETE` keyword in a SQL statement. Optimizer hints are not currently supported for `INSERT` statements. +Optimizer hints are case insensitive and specified within `/*+ ... */` comments following the `SELECT`, `INSERT`, `UPDATE` or `DELETE` keyword in a SQL statement. Multiple hints can be specified by separating with commas. For example, the following query uses three different hints: diff --git a/partitioned-table.md b/partitioned-table.md index 1b80da4c7beb1..1f614f01babaa 100644 --- a/partitioned-table.md +++ b/partitioned-table.md @@ -665,11 +665,11 @@ PARTITION BY KEY(fname, store_id) PARTITIONS 4; ``` -Currently, TiDB does not support creating Key partitioned tables if the partition column list specified in `PARTITION BY KEY` is empty. For example, after you execute the following statement, TiDB will create a non-partitioned table and return an `Unsupported partition type KEY, treat as normal table` warning. +Similar to MySQL, TiDB supports creating Key partitioned tables with an empty partition column list specified in `PARTITION BY KEY`. 
For example, the following statement creates a partitioned table using the primary key `id` as the partitioning key: ```sql CREATE TABLE employees ( - id INT NOT NULL, + id INT NOT NULL PRIMARY KEY, fname VARCHAR(30), lname VARCHAR(30), hired DATE NOT NULL DEFAULT '1970-01-01', @@ -682,6 +682,20 @@ PARTITION BY KEY() PARTITIONS 4; ``` +If the table lacks a primary key but contains a unique key, the unique key is used as the partitioning key: + +```sql +CREATE TABLE k1 ( + id INT NOT NULL, + name VARCHAR(20), + UNIQUE KEY (id) +) +PARTITION BY KEY() +PARTITIONS 2; +``` + +However, the previous statement will fail if the unique key column is not defined as `NOT NULL`. + #### How TiDB handles Linear Hash partitions Before v6.4.0, if you execute DDL statements of [MySQL Linear Hash](https://dev.mysql.com/doc/refman/8.0/en/partitioning-linear-hash.html) partitions in TiDB, TiDB can only create non-partitioned tables. In this case, if you still want to use partitioned tables in TiDB, you need to modify the DDL statements. @@ -1682,8 +1696,6 @@ YEARWEEK() Currently, TiDB supports Range partitioning, Range COLUMNS partitioning, List partitioning, List COLUMNS partitioning, Hash partitioning, and Key partitioning. Other partitioning types that are available in MySQL are not supported yet in TiDB. -Currently, TiDB does not support using an empty partition column list for Key partitioning. - With regard to partition management, any operation that requires moving data in the bottom implementation is not supported currently, including but not limited to: adjust the number of partitions in a Hash partitioned table, modify the Range of a Range partitioned table, and merge partitions. For the unsupported partitioning types, when you create a table in TiDB, the partitioning information is ignored and the table is created in the regular form with a warning reported. 
@@ -1987,7 +1999,7 @@ mysql> explain select /*+ TIDB_INLJ(t1, t2) */ t1.* from t1, t2 where t2.code =
 
 From example 2, you can see that in `dynamic` mode, the execution plan with IndexJoin is selected when you execute the query.
 
-Currently, neither `static` nor `dynamic` pruning mode supports prepared statements plan cache.
+Currently, `static` pruning mode does not support plan cache for either prepared or non-prepared statements.
 
 #### Update statistics of partitioned tables in dynamic pruning mode
 
diff --git a/pd-microservices.md b/pd-microservices.md
new file mode 100644
index 0000000000000..4b7800223a14a
--- /dev/null
+++ b/pd-microservices.md
@@ -0,0 +1,81 @@
+---
+title: PD Microservices
+summary: Learn how to enable the microservice mode of PD to improve service quality.
+---
+
+# PD Microservices
+
+Starting from v8.0.0, PD supports the microservice mode, which splits the timestamp allocation and cluster scheduling functions of PD into the following two independently deployed microservices. In this way, these two functions are decoupled from the routing function of PD, which allows PD to focus on the routing service for metadata.
+
+- `tso` microservice: provides monotonically increasing timestamp allocation for the entire cluster.
+- `scheduling` microservice: provides scheduling functions for the entire cluster, including but not limited to load balancing, hot spot handling, replica repair, and replica placement.
+
+Each microservice is deployed as an independent process. If you configure more than one replica for a microservice, the microservice automatically implements a primary-secondary fault-tolerant mode to ensure high availability and reliability of the service.
+
+> **Warning:**
+>
+> Currently, the PD microservices feature is experimental. It is not recommended that you use it in production environments. This feature might be changed or removed without prior notice. 
If you find a bug, you can report an [issue](https://github.com/tikv/pd/issues) on GitHub. + +## Usage scenarios + +PD microservices are typically used to address performance bottlenecks in PD and improve PD service quality. With this feature, you can avoid the following issues: + +- Long-tail latency or jitter in TSO allocations due to excessive pressure in PD clusters +- Service unavailability of the entire cluster due to failures in the scheduling module +- Bottleneck issues solely caused by PD + +In addition, when the scheduling module is changed, you can update the `scheduling` microservice independently without restarting PD, thus avoiding any impact on the overall service of the cluster. + +> **Note:** +> +> If the performance bottleneck of a cluster is not caused by PD, there is no need to enable microservices, because using microservices increases the number of components and raises operational costs. + +## Restrictions + +- Currently, the `tso` microservice does not support dynamic start and stop. After enabling or disabling the `tso` microservice, you need to restart the PD cluster for the changes to take effect. +- Only the TiDB component supports a direct connection to the `tso` microservice through service discovery, while other components need to forward requests to the `tso` microservice through PD to obtain timestamps. +- Microservices are not compatible with the [Data Replication Auto Synchronous (DR Auto-Sync)](/two-data-centers-in-one-city-deployment.md) feature. +- Microservices are not compatible with the TiDB system variable [`tidb_enable_tso_follower_proxy`](/system-variables.md#tidb_enable_tso_follower_proxy-new-in-v530). 
+- Because [hibernate Regions](/tikv-configuration-file.md#hibernate-regions) might exist in a cluster, during a primary and secondary switchover of the `scheduling` microservice, the scheduling function of the cluster might be unavailable for a certain period (up to [`peer-stale-state-check-interval`](/tikv-configuration-file.md#peer-stale-state-check-interval), which is five minutes by default) to avoid redundant scheduling.
+
+## Usage
+
+Currently, PD microservices can be deployed using TiDB Operator or TiUP Playground.
+
+<SimpleTab>
+<div label="TiDB Operator">
+
+For detailed information on using TiDB Operator, see the following documents:
+
+- [Deploy PD microservices](https://docs.pingcap.com/tidb-in-kubernetes/dev/configure-a-tidb-cluster#enable-pd-microservices)
+- [Configure PD microservices](https://docs.pingcap.com/tidb-in-kubernetes/dev/configure-a-tidb-cluster#configure-pd-microservices)
+- [Modify PD microservices](https://docs.pingcap.com/tidb-in-kubernetes/dev/modify-tidb-configuration#modify-pd-microservice-configuration)
+- [Scale PD microservice components](https://docs.pingcap.com/tidb-in-kubernetes/dev/scale-a-tidb-cluster#scale-pd-microservice-components)
+
+</div>
+<div label="TiUP Playground">
+
+For detailed information on using TiUP Playground, see the following document:
+
+- [Deploy PD microservices](/tiup/tiup-playground.md#deploy-pd-microservices)
+
+</div>
+</SimpleTab>
+
+When deploying and using PD microservices, pay attention to the following:
+
+- After you enable microservices and restart PD for a cluster, PD stops allocating TSO for the cluster. Therefore, you need to deploy the `tso` microservice in the cluster when you enable microservices.
+- If the `scheduling` microservice is deployed in a cluster, the scheduling function of the cluster is provided by the `scheduling` microservice. If the `scheduling` microservice is not deployed, the scheduling function of the cluster is still provided by PD.
+- The `scheduling` microservice supports dynamic switching, which is enabled by default (`enable-scheduling-fallback` defaults to `true`). If the process of the `scheduling` microservice is terminated, PD continues to provide scheduling services for the cluster by default.
+
+    If the binary versions of the `scheduling` microservice and PD are different, to prevent changes in the scheduling logic, you can disable the dynamic switching function of the `scheduling` microservice by executing `pd-ctl config set enable-scheduling-fallback false`. After this function is disabled, PD will not take over the scheduling service when the process of the `scheduling` microservice is terminated. This means that the scheduling service of the cluster will be unavailable until the `scheduling` microservice is restarted.
+
+## Tool compatibility
+
+Microservices do not affect the normal use of data import, export, and other replication tools.
+
+## FAQs
+
+- How can I determine if PD becomes a performance bottleneck?
+
+    When your cluster is in a normal state, you can check monitoring metrics in the Grafana PD panel. If the `TiDB - PD server TSO handle time` metric shows a notable increase in latency or the `Heartbeat - TiKV side heartbeat statistics` metric shows a significant number of pending items, it indicates that PD has become a performance bottleneck.
\ No newline at end of file diff --git a/placement-rules-in-sql.md b/placement-rules-in-sql.md index 2e35f8828ffc2..75d234c5c9c01 100644 --- a/placement-rules-in-sql.md +++ b/placement-rules-in-sql.md @@ -232,7 +232,7 @@ You can configure `CONSTRAINTS`, `FOLLOWER_CONSTRAINTS`, and `LEARNER_CONSTRAINT | CONSTRAINTS format | Description | |----------------------------|-----------------------------------------------------------------------------------------------------------| | List format | If a constraint to be specified applies to all replicas, you can use a key-value list format. Each key starts with `+` or `-`. For example:
  • `[+region=us-east-1]` means placing data on nodes that have a `region` label as `us-east-1`.
  • `[+region=us-east-1,-type=fault]` means placing data on nodes that have a `region` label as `us-east-1` but do not have a `type` label as `fault`.

| -| Dictionary format | If you need to specify different numbers of replicas for different constraints, you can use the dictionary format. For example:
  • `FOLLOWER_CONSTRAINTS="{+region=us-east-1: 1,+region=us-east-2: 1,+region=us-west-1: 1}";` means placing one Follower in `us-east-1`, one Follower in `us-east-2`, and one Follower in `us-west-1`.
  • `FOLLOWER_CONSTRAINTS='{"+region=us-east-1,+type=scale-node": 1,"+region=us-west-1": 1}';` means placing one Follower on a node that is located in the `us-east-1` region and has the `type` label as `scale-node`, and one Follower in `us-west-1`.
The dictionary format supports each key starting with `+` or `-` and allows you to configure the special `#reject-leader` attribute. For example, `FOLLOWER_CONSTRAINTS='{"+region=us-east-1":1, "+region=us-east-2": 2, "+region=us-west-1,#reject-leader": 1}'` means that the Leaders elected in `us-west-1` will be evicted as much as possible during disaster recovery.| +| Dictionary format | If you need to specify different numbers of replicas for different constraints, you can use the dictionary format. For example:
  • `FOLLOWER_CONSTRAINTS="{+region=us-east-1: 1,+region=us-east-2: 1,+region=us-west-1: 1}";` means placing one Follower in `us-east-1`, one Follower in `us-east-2`, and one Follower in `us-west-1`.
  • `FOLLOWER_CONSTRAINTS='{"+region=us-east-1,+type=scale-node": 1,"+region=us-west-1": 1}';` means placing one Follower on a node that is located in the `us-east-1` region and has the `type` label as `scale-node`, and one Follower in `us-west-1`.
The dictionary format supports each key starting with `+` or `-` and allows you to configure the special `#evict-leader` attribute. For example, `FOLLOWER_CONSTRAINTS='{"+region=us-east-1":1, "+region=us-east-2": 2, "+region=us-west-1,#evict-leader": 1}'` means that the Leaders elected in `us-west-1` will be evicted as much as possible during disaster recovery.| > **Note:** > @@ -413,10 +413,10 @@ CREATE PLACEMENT POLICY deploy221_primary_east1 LEADER_CONSTRAINTS="[+region=us- After this placement policy is created and attached to the desired data, the Raft Leader replicas of the data will be placed in the `us-east-1` region specified by the `LEADER_CONSTRAINTS` option, while other replicas of the data will be placed in regions specified by the `FOLLOWER_CONSTRAINTS` option. Note that if the cluster fails, such as a node outage in the `us-east-1` region, a new Leader will still be elected from other regions, even if these regions are specified in `FOLLOWER_CONSTRAINTS`. In other words, ensuring service availability takes the highest priority. 
-In the event of a failure in the `us-east-1` region, if you do not want to place new Leaders in `us-west-1`, you can configure a special `reject-leader` attribute to evict the newly elected Leaders in that region: +In the event of a failure in the `us-east-1` region, if you do not want to place new Leaders in `us-west-1`, you can configure a special `evict-leader` attribute to evict the newly elected Leaders in that region: ```sql -CREATE PLACEMENT POLICY deploy221_primary_east1 LEADER_CONSTRAINTS="[+region=us-east-1]" FOLLOWER_CONSTRAINTS='{"+region=us-east-1": 1, "+region=us-east-2": 2, "+region=us-west-1,#reject-leader": 1}'; +CREATE PLACEMENT POLICY deploy221_primary_east1 LEADER_CONSTRAINTS="[+region=us-east-1]" FOLLOWER_CONSTRAINTS='{"+region=us-east-1": 1, "+region=us-east-2": 2, "+region=us-west-1,#evict-leader": 1}'; ``` #### Use `PRIMARY_REGION` diff --git a/releases/release-5.2.4.md b/releases/release-5.2.4.md index 039dd7944d8cb..4064c23c9a7fe 100644 --- a/releases/release-5.2.4.md +++ b/releases/release-5.2.4.md @@ -88,7 +88,7 @@ TiDB version: 5.2.4 - Fix the issue that the system variable `max_allowed_packet` does not take effect [#31422](https://github.com/pingcap/tidb/issues/31422) - Fix the issue that the `REPLACE` statement incorrectly changes other rows when the auto ID is out of range [#29483](https://github.com/pingcap/tidb/issues/29483) - Fix the issue that the slow query log cannot output log normally and might consume too much memory [#32656](https://github.com/pingcap/tidb/issues/32656) - - Fix the issue that the result of NATURAL JOIN might include unexpected columns [#24981](https://github.com/pingcap/tidb/issues/29481) + - Fix the issue that the result of NATURAL JOIN might include unexpected columns [#29481](https://github.com/pingcap/tidb/issues/29481) - Fix the issue that using `ORDER BY` and `LIMIT` together in one statement might output wrong results if a prefix-column index is used to query data 
[#29711](https://github.com/pingcap/tidb/issues/29711) - Fix the issue that the DOUBLE type auto-increment column might be changed when the optimistic transaction retries [#29892](https://github.com/pingcap/tidb/issues/29892) - Fix the issue that the STR_TO_DATE function cannot handle the preceding zero of the microsecond part correctly [#30078](https://github.com/pingcap/tidb/issues/30078) @@ -165,7 +165,7 @@ TiDB version: 5.2.4 + TiCDC - Fix the issue that default values cannot be replicated [#3793](https://github.com/pingcap/tiflow/issues/3793) - - Fix a bug that sequence is incorrectly replicated in some cases [#4563](https://github.com/pingcap/tiflow/issues/4552) + - Fix a bug that sequence is incorrectly replicated in some cases [#4552](https://github.com/pingcap/tiflow/issues/4552) - Fix a bug that a TiCDC node exits abnormally when a PD leader is killed [#4248](https://github.com/pingcap/tiflow/issues/4248) - Fix a bug that MySQL sink generates duplicated `replace` SQL statements when `batch-replace-enable` is disabled [#4501](https://github.com/pingcap/tiflow/issues/4501) - Fix the issue of panic and data inconsistency that occurs when outputting the default column value [#3929](https://github.com/pingcap/tiflow/issues/3929) diff --git a/releases/release-5.3.1.md b/releases/release-5.3.1.md index 09b3c1cf7b342..8f1a1da8e4974 100644 --- a/releases/release-5.3.1.md +++ b/releases/release-5.3.1.md @@ -20,7 +20,7 @@ TiDB version: 5.3.1 - TiDB - - Optimize the mapping logic of user login mode to make the logging more MySQL-compatible [#30450](https://github.com/pingcap/tidb/issues/32648) + - Optimize the mapping logic of user login mode to make the logging more MySQL-compatible [#32648](https://github.com/pingcap/tidb/issues/32648) - TiKV diff --git a/releases/release-5.3.2.md b/releases/release-5.3.2.md index bc1d7ce8aa835..2cfe4d3f85c8c 100644 --- a/releases/release-5.3.2.md +++ b/releases/release-5.3.2.md @@ -145,7 +145,7 @@ TiDB version: 5.3.2 - Fix the 
issue that TiCDC fails to start when the first PD set in `--pd` is not available after TLS is enabled [#4777](https://github.com/pingcap/tiflow/issues/4777) - Fix a bug that querying status through open API may be blocked when the PD node is abnormal [#4778](https://github.com/pingcap/tiflow/issues/4778) - Fix a stability problem in workerpool used by Unified Sorter [#4447](https://github.com/pingcap/tiflow/issues/4447) - - Fix a bug that sequence is incorrectly replicated in some cases [#4563](https://github.com/pingcap/tiflow/issues/4552) + - Fix a bug that sequence is incorrectly replicated in some cases [#4552](https://github.com/pingcap/tiflow/issues/4552) + TiDB Data Migration (DM) diff --git a/releases/release-5.4.1.md b/releases/release-5.4.1.md index 259be6362a7e5..6d14a10ccc17e 100644 --- a/releases/release-5.4.1.md +++ b/releases/release-5.4.1.md @@ -145,7 +145,7 @@ TiDB v5.4.1 does not introduce any compatibility changes in product design. But - Fix incorrect metrics caused by owner changes [#4774](https://github.com/pingcap/tiflow/issues/4774) - Fix the TiCDC panic issue that might occur because `Canal-JSON` does not support nil [#4736](https://github.com/pingcap/tiflow/issues/4736) - Fix a stability problem in workerpool used by Unified Sorter [#4447](https://github.com/pingcap/tiflow/issues/4447) - - Fix a bug that sequence is incorrectly replicated in some cases [#4563](https://github.com/pingcap/tiflow/issues/4552) + - Fix a bug that sequence is incorrectly replicated in some cases [#4552](https://github.com/pingcap/tiflow/issues/4552) - Fix the TiCDC panic issue that might occur when `Canal-JSON` incorrectly handles `string` [#4635](https://github.com/pingcap/tiflow/issues/4635) - Fix a bug that a TiCDC node exits abnormally when a PD leader is killed [#4248](https://github.com/pingcap/tiflow/issues/4248) - Fix a bug that MySQL sink generates duplicated `replace` SQL statements when `batch-replace-enable` is disabled 
[#4501](https://github.com/pingcap/tiflow/issues/4501) diff --git a/releases/release-5.4.3.md b/releases/release-5.4.3.md index e71ac559742f5..01dd37d348833 100644 --- a/releases/release-5.4.3.md +++ b/releases/release-5.4.3.md @@ -75,7 +75,7 @@ TiDB version: 5.4.3 + TiDB Lightning - - Fix the issue that an auto-increment column of the `BIGINT` type might be out of range [#27397](https://github.com/pingcap/tidb/issues/27937) + - Fix the issue that an auto-increment column of the `BIGINT` type might be out of range [#27937](https://github.com/pingcap/tidb/issues/27937) - Fix the issue that de-duplication might cause TiDB Lightning to panic in extreme cases [#34163](https://github.com/pingcap/tidb/issues/34163) - Fix the issue that TiDB Lightning does not support columns starting with slash, number, or non-ascii characters in Parquet files [#36980](https://github.com/pingcap/tidb/issues/36980) - Fix the issue that TiDB Lightning fails to connect to TiDB when TiDB uses an IPv6 host [#35880](https://github.com/pingcap/tidb/issues/35880) diff --git a/releases/release-6.0.0-dmr.md b/releases/release-6.0.0-dmr.md index 73198ae4f28a6..826bed2523789 100644 --- a/releases/release-6.0.0-dmr.md +++ b/releases/release-6.0.0-dmr.md @@ -764,7 +764,7 @@ TiDB v6.0.0 is a DMR, and its version is 6.0.0-DMR. 
- Fix a bug that a TiCDC node exits abnormally when a PD leader is killed [#4248](https://github.com/pingcap/tiflow/issues/4248) - Fix the error `Unknown system variable 'transaction_isolation'` for some MySQL versions [#4504](https://github.com/pingcap/tiflow/issues/4504) - Fix the TiCDC panic issue that might occur when `Canal-JSON` incorrectly handles `string` [#4635](https://github.com/pingcap/tiflow/issues/4635) - - Fix a bug that sequence is incorrectly replicated in some cases [#4563](https://github.com/pingcap/tiflow/issues/4552) + - Fix a bug that sequence is incorrectly replicated in some cases [#4552](https://github.com/pingcap/tiflow/issues/4552) - Fix the TiCDC panic issue that might occur because `Canal-JSON` does not support nil [#4736](https://github.com/pingcap/tiflow/issues/4736) - Fix the wrong data mapping for avro codec of type `Enum/Set` and `TinyText/MediumText/Text/LongText` [#4454](https://github.com/pingcap/tiflow/issues/4454) - Fix a bug that Avro converts a `NOT NULL` column to a nullable field [#4818](https://github.com/pingcap/tiflow/issues/4818) diff --git a/releases/release-6.1.0.md b/releases/release-6.1.0.md index 0ff55dd92562f..7f58ad1021752 100644 --- a/releases/release-6.1.0.md +++ b/releases/release-6.1.0.md @@ -418,7 +418,7 @@ In 6.1.0, the key new features or improvements are as follows: + TiDB Data Migration (DM) - - Fix the `start-time` time zone issue and change DM behavior from using the downstream time zone to using the upstream time zone [#5271](https://github.com/pingcap/tiflow/issues/5471) + - Fix the `start-time` time zone issue and change DM behavior from using the downstream time zone to using the upstream time zone [#5471](https://github.com/pingcap/tiflow/issues/5471) - Fix the issue that DM occupies more disk space after the task automatically resumes [#3734](https://github.com/pingcap/tiflow/issues/3734) [#5344](https://github.com/pingcap/tiflow/issues/5344) - Fix the problem that checkpoint flush may cause the 
data of failed rows to be skipped [#5279](https://github.com/pingcap/tiflow/issues/5279) - Fix the issue that in some cases manually executing the filtered DDL in the downstream might cause task resumption failure [#5272](https://github.com/pingcap/tiflow/issues/5272) @@ -434,5 +434,5 @@ In 6.1.0, the key new features or improvements are as follows: - Fix the issue that the precheck does not check local disk resources and cluster availability [#34213](https://github.com/pingcap/tidb/issues/34213) - Fix the issue of incorrect routing for schemas [#33381](https://github.com/pingcap/tidb/issues/33381) - Fix the issue that the PD configuration is not restored correctly when TiDB Lightning panics [#31733](https://github.com/pingcap/tidb/issues/31733) - - Fix the issue of Local-backend import failure caused by out-of-bounds data in the `auto_increment` column [#29737](https://github.com/pingcap/tidb/issues/27937) + - Fix the issue of Local-backend import failure caused by out-of-bounds data in the `auto_increment` column [#27937](https://github.com/pingcap/tidb/issues/27937) - Fix the issue of local backend import failure when the `auto_random` or `auto_increment` column is null [#34208](https://github.com/pingcap/tidb/issues/34208) diff --git a/releases/release-6.2.0.md b/releases/release-6.2.0.md index 73893616d5c4a..37ca32dd9c3f4 100644 --- a/releases/release-6.2.0.md +++ b/releases/release-6.2.0.md @@ -328,7 +328,7 @@ Since TiDB v6.2.0, backing up and restoring RawKV using BR is deprecated. 
- Support the `SHOW COUNT(*) WARNINGS` and `SHOW COUNT(*) ERRORS` statements [#25068](https://github.com/pingcap/tidb/issues/25068) @[likzn](https://github.com/likzn) - Add validation check for some system variables [#35048](https://github.com/pingcap/tidb/issues/35048) @[morgo](https://github.com/morgo) - - Optimize the error messages for some type conversions [#32447](https://github.com/pingcap/tidb/issues/32744) @[fanrenhoo](https://github.com/fanrenhoo) + - Optimize the error messages for some type conversions [#32744](https://github.com/pingcap/tidb/issues/32744) @[fanrenhoo](https://github.com/fanrenhoo) - The `KILL` command now supports DDL operations [#24144](https://github.com/pingcap/tidb/issues/24144) @[morgo](https://github.com/morgo) - Make the output of `SHOW TABLES/DATABASES LIKE …` more MySQL-compatible. The column names in the output contain the `LIKE` value [#35116](https://github.com/pingcap/tidb/issues/35116) @[likzn](https://github.com/likzn) - Improve the performance of JSON-related functions [#35859](https://github.com/pingcap/tidb/issues/35859) @[wjhuang2016](https://github.com/wjhuang2016) diff --git a/releases/release-6.5.2.md b/releases/release-6.5.2.md index 53a6eded2e74a..9bab3dacafb18 100644 --- a/releases/release-6.5.2.md +++ b/releases/release-6.5.2.md @@ -65,7 +65,7 @@ Quick access: [Quick start](https://docs.pingcap.com/tidb/v6.5/quick-start-with- - Fix the issue that full index scans might cause errors when prepared plan cache is enabled [#42150](https://github.com/pingcap/tidb/issues/42150) @[fzzf678](https://github.com/fzzf678) - Fix the issue that IndexMerge might produce incorrect results when prepare plan cache is enabled [#41828](https://github.com/pingcap/tidb/issues/41828) @[qw4990](https://github.com/qw4990) - Fix the issue that the configuration of `max_prepared_stmt_count` does not take effect [#39735](https://github.com/pingcap/tidb/issues/39735) @[xuyifangreeneyes](https://github.com/xuyifangreeneyes) - - Fix the 
issue that IndexMerge might produce incorrect results when prepare plan cache is enabled [#41828](https://github.com/pingcap/tidb/issues/41828) @[qw4990](https://github.com/qw4990) @[XuHuaiyu](https://github.com/XuHuaiyu) + - Fix the issue that global memory control might incorrectly kill SQL statements with memory usage less than `tidb_server_memory_limit_sess_min_size` [#42662](https://github.com/pingcap/tidb/issues/42662) @[XuHuaiyu](https://github.com/XuHuaiyu) - Fix the issue that Index Join might cause panic in dynamic trimming mode of partition tables [#40596](https://github.com/pingcap/tidb/issues/40596) @[tiancaiamao](https://github.com/tiancaiamao) + TiKV diff --git a/releases/release-6.5.5.md b/releases/release-6.5.5.md index 35a7bef961e7e..16246a2af6c76 100644 --- a/releases/release-6.5.5.md +++ b/releases/release-6.5.5.md @@ -56,7 +56,7 @@ Quick access: [Quick start](https://docs.pingcap.com/tidb/v6.5/quick-start-with- - Fix the issue that restoring implicit primary keys by PITR might lead to conflicts [#46520](https://github.com/pingcap/tidb/issues/46520) @[3pointer](https://github.com/3pointer) - Fix the issue that an error occurs when PITR recovers the meta-kv [#46578](https://github.com/pingcap/tidb/issues/46578) @[Leavrth](https://github.com/Leavrth) - - Fix an error in BR integration test cases [#45561](https://github.com/pingcap/tidb/issues/46561) @[purelind](https://github.com/purelind) + - Fix an error in BR integration test cases [#46561](https://github.com/pingcap/tidb/issues/46561) @[purelind](https://github.com/purelind) - Fix the issue that PITR fails to restore data from GCS [#47022](https://github.com/pingcap/tidb/issues/47022) @[Leavrth](https://github.com/Leavrth) + TiCDC diff --git a/releases/release-6.5.6.md b/releases/release-6.5.6.md index 560445846dc73..12913b3af00ed 100644 --- a/releases/release-6.5.6.md +++ b/releases/release-6.5.6.md @@ -141,7 +141,7 @@ Quick access: [Quick 
start](https://docs.pingcap.com/tidb/v6.5/quick-start-with- - Fix the issue that the log backup might get stuck in some scenarios when backing up large wide tables [#15714](https://github.com/tikv/tikv/issues/15714) @[YuJuncen](https://github.com/YuJuncen) - Fix the issue that frequent flushes cause log backup to get stuck [#15602](https://github.com/tikv/tikv/issues/15602) @[3pointer](https://github.com/3pointer) - - Fix the issue that the retry after an EC2 metadata connection reset cause degraded backup and restore performance [#46750](https://github.com/pingcap/tidb/issues/47650) @[Leavrth](https://github.com/Leavrth) + - Fix the issue that the retry after an EC2 metadata connection reset causes degraded backup and restore performance [#47650](https://github.com/pingcap/tidb/issues/47650) @[Leavrth](https://github.com/Leavrth) - Fix the issue that running PITR multiple times within 1 minute might cause data loss [#15483](https://github.com/tikv/tikv/issues/15483) @[YuJuncen](https://github.com/YuJuncen) - Fix the issue that the default values for BR SQL commands and CLI are different, which might cause OOM issues [#48000](https://github.com/pingcap/tidb/issues/48000) @[YuJuncen](https://github.com/YuJuncen) - Fix the issue that log backup might panic when the PD owner is transferred [#47533](https://github.com/pingcap/tidb/issues/47533) @[YuJuncen](https://github.com/YuJuncen) diff --git a/releases/release-6.5.8.md b/releases/release-6.5.8.md index 9d9b74cb4557e..4724e94b804a1 100644 --- a/releases/release-6.5.8.md +++ b/releases/release-6.5.8.md @@ -39,7 +39,7 @@ Quick access: [Quick start](https://docs.pingcap.com/tidb/v6.5/quick-start-with- - Fix the issue that histogram statistics might not be parsed into readable strings when the histogram boundary contains `NULL` [#49823](https://github.com/pingcap/tidb/issues/49823) @[AilinKid](https://github.com/AilinKid) - Fix the issue that hints cannot be used in `REPLACE INTO` statements
[#34325](https://github.com/pingcap/tidb/issues/34325) @[YangKeao](https://github.com/YangKeao) - Fix the issue that query results are incorrect due to `STREAM_AGG()` incorrectly handling CI [#49902](https://github.com/pingcap/tidb/issues/49902) @[wshwsh12](https://github.com/wshwsh12) - - Fix the issue that the query result of a range partitioned table is incorrect in some cases due to wrong partition pruning [#50082](https://github.com/pingcap/tidb/issues/49823) @[Defined2014](https://github.com/Defined2014) + - Fix the issue that the query result of a range partitioned table is incorrect in some cases due to wrong partition pruning [#50082](https://github.com/pingcap/tidb/issues/50082) @[Defined2014](https://github.com/Defined2014) - Fix the issue that the auto-increment ID allocation reports an error due to concurrent conflicts when using an auto-increment column with `AUTO_ID_CACHE=1` [#50519](https://github.com/pingcap/tidb/issues/50519) @[tiancaiamao](https://github.com/tiancaiamao) - Mitigate the issue that TiDB nodes might encounter OOM errors when dealing with a large number of tables or partitions [#50077](https://github.com/pingcap/tidb/issues/50077) @[zimulala](https://github.com/zimulala) - Fix the issue that data is inconsistent under the TiDB Distributed eXecution Framework (DXF) when executing `ADD INDEX` after the DDL Owner is network isolated [#49773](https://github.com/pingcap/tidb/issues/49773) @[tangenta](https://github.com/tangenta) diff --git a/releases/release-6.6.0.md b/releases/release-6.6.0.md index 8a48bd5586cb8..3791470727703 100644 --- a/releases/release-6.6.0.md +++ b/releases/release-6.6.0.md @@ -168,7 +168,7 @@ In v6.6.0-DMR, the key new features and improvements are as follows: For more information, see [documentation](/placement-rules-in-sql.md#specify-survival-preferences). 
-* Support rolling back DDL operations via the `FLASHBACK CLUSTER TO TIMESTAMP` statement [#14088](https://github.com/tikv/tikv/pull/14088) @[Defined2014](https://github.com/Defined2014) @[JmPotato](https://github.com/JmPotato) +* Support rolling back DDL operations via the `FLASHBACK CLUSTER TO TIMESTAMP` statement [#14045](https://github.com/tikv/tikv/issues/14045) @[Defined2014](https://github.com/Defined2014) @[JmPotato](https://github.com/JmPotato) The [`FLASHBACK CLUSTER TO TIMESTAMP`](/sql-statements/sql-statement-flashback-cluster.md) statement supports restoring the entire cluster to a specified point in time within the Garbage Collection (GC) lifetime. In TiDB v6.6.0, this feature adds support for rolling back DDL operations. This can be used to quickly undo a DML or DDL misoperation on a cluster, roll back a cluster within minutes, and roll back a cluster multiple times on the timeline to determine when specific data changes occurred. @@ -318,7 +318,7 @@ In v6.6.0-DMR, the key new features and improvements are as follows: For more information, see [documentation](/enable-tls-between-components.md). -* TiDB Lightning supports accessing Amazon S3 data via AWS IAM role keys and session tokens [#4075](https://github.com/pingcap/tidb/issues/40750) @[okJiang](https://github.com/okJiang) +* TiDB Lightning supports accessing Amazon S3 data via AWS IAM role keys and session tokens [#40750](https://github.com/pingcap/tidb/issues/40750) @[okJiang](https://github.com/okJiang) Before v6.6.0, TiDB Lightning only supports accessing S3 data via AWS IAM **user's access keys** (each access key consists of an access key ID and a secret access key) so you cannot use a temporary session token to access S3 data. Starting from v6.6.0, TiDB Lightning supports accessing S3 data via AWS IAM **role's access keys + session tokens** as well to improve the data security. 
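For the `FLASHBACK CLUSTER TO TIMESTAMP` feature described in the release-6.6.0.md change above, a minimal usage sketch may help (the timestamp below is illustrative; the target time must be later than the cluster's current GC safe point):

```sql
-- Check the current GC safe point; the flashback target must be later than this.
SELECT * FROM mysql.tidb WHERE variable_name = 'tikv_gc_safe_point';

-- Roll the entire cluster back to the specified point in time,
-- undoing both DML and (starting from v6.6.0) DDL operations performed after it.
FLASHBACK CLUSTER TO TIMESTAMP '2023-02-20 10:00:00';
```

If the specified time falls outside the GC lifetime, the statement returns an error instead of partially rolling back.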
diff --git a/releases/release-7.1.2.md b/releases/release-7.1.2.md index 755ddda1d6e0d..02d3ba937f0f8 100644 --- a/releases/release-7.1.2.md +++ b/releases/release-7.1.2.md @@ -104,7 +104,7 @@ Quick access: [Quick start](https://docs.pingcap.com/tidb/v7.1/quick-start-with- - Fix the issue that inserting data into a partitioned table might fail after exchanging partitions between the partition table and a table with placement policies [#45791](https://github.com/pingcap/tidb/issues/45791) @[mjonss](https://github.com/mjonss) - Fix the issue of encoding time fields with incorrect timezone information [#46033](https://github.com/pingcap/tidb/issues/46033) @[tangenta](https://github.com/tangenta) - Fix the issue that DDL statements that fast add indexes would get stuck when the `tmp` directory does not exist [#45456](https://github.com/pingcap/tidb/issues/45456) @[tangenta](https://github.com/tangenta) - - Fix the issue that upgrading multiple TiDB instances simultaneously might block the upgrade process [#46288](https://github.com/pingcap/tidb/issues/46228) @[zimulala](https://github.com/zimulala) + - Fix the issue that upgrading multiple TiDB instances simultaneously might block the upgrade process [#46228](https://github.com/pingcap/tidb/issues/46228) @[zimulala](https://github.com/zimulala) - Fix the issue of uneven Region scattering caused by incorrect parameters used in splitting Regions [#46135](https://github.com/pingcap/tidb/issues/46135) @[zimulala](https://github.com/zimulala) - Fix the issue that DDL operations might get stuck after TiDB is restarted [#46751](https://github.com/pingcap/tidb/issues/46751) @[wjhuang2016](https://github.com/wjhuang2016) - Prohibit split table operations on non-integer clustered indexes [#47350](https://github.com/pingcap/tidb/issues/47350) @[tangenta](https://github.com/tangenta) @@ -165,7 +165,7 @@ Quick access: [Quick start](https://docs.pingcap.com/tidb/v7.1/quick-start-with- - Fix the issue that PITR fails to recover data 
from GCS [#47022](https://github.com/pingcap/tidb/issues/47022) @[Leavrth](https://github.com/Leavrth) - Fix the potential error in fine-grained backup phase in RawKV mode [#37085](https://github.com/pingcap/tidb/issues/37085) @[pingyu](https://github.com/pingyu) - Fix the issue that recovering meta-kv using PITR might cause errors [#46578](https://github.com/pingcap/tidb/issues/46578) @[Leavrth](https://github.com/Leavrth) - - Fix the errors in BR integration test cases [#45561](https://github.com/pingcap/tidb/issues/46561) @[purelind](https://github.com/purelind) + - Fix the errors in BR integration test cases [#46561](https://github.com/pingcap/tidb/issues/46561) @[purelind](https://github.com/purelind) - Fix the issue of restore failures by increasing the default values of the global parameters `TableColumnCountLimit` and `IndexLimit` used by BR to their maximum values [#45793](https://github.com/pingcap/tidb/issues/45793) @[Leavrth](https://github.com/Leavrth) - Fix the issue that the br CLI client gets stuck when scanning restored data [#45476](https://github.com/pingcap/tidb/issues/45476) @[3pointer](https://github.com/3pointer) - Fix the issue that PITR might skip restoring the `CREATE INDEX` DDL statement [#47482](https://github.com/pingcap/tidb/issues/47482) @[Leavrth](https://github.com/Leavrth) diff --git a/releases/release-7.1.3.md b/releases/release-7.1.3.md index 9f89eb7361901..e76716decfff2 100644 --- a/releases/release-7.1.3.md +++ b/releases/release-7.1.3.md @@ -135,7 +135,7 @@ Quick access: [Quick start](https://docs.pingcap.com/tidb/v7.1/quick-start-with- - Fix the issue that the default values for BR SQL commands and CLI are different, which might cause OOM issues [#48000](https://github.com/pingcap/tidb/issues/48000) @[YuJuncen](https://github.com/YuJuncen) - Fix the issue that the log backup might get stuck in some scenarios when backing up large wide tables [#15714](https://github.com/tikv/tikv/issues/15714) 
@[YuJuncen](https://github.com/YuJuncen) - Fix the issue that BR generates incorrect URIs for external storage files [#48452](https://github.com/pingcap/tidb/issues/48452) @[3AceShowHand](https://github.com/3AceShowHand) - - Fix the issue that the retry after an EC2 metadata connection reset causes degraded backup and restore performance [#46750](https://github.com/pingcap/tidb/issues/47650) @[Leavrth](https://github.com/Leavrth) + - Fix the issue that the retry after an EC2 metadata connection reset causes degraded backup and restore performance [#47650](https://github.com/pingcap/tidb/issues/47650) @[Leavrth](https://github.com/Leavrth) - Fix the issue that the log backup task can start but does not work properly if failing to connect to PD during task initialization [#16056](https://github.com/tikv/tikv/issues/16056) @[YuJuncen](https://github.com/YuJuncen) + TiCDC diff --git a/releases/release-7.1.4.md b/releases/release-7.1.4.md new file mode 100644 index 0000000000000..1189244d47691 --- /dev/null +++ b/releases/release-7.1.4.md @@ -0,0 +1,184 @@ +--- +title: TiDB 7.1.4 Release Notes +summary: Learn about the compatibility changes, improvements, and bug fixes in TiDB 7.1.4. 
+--- + +# TiDB 7.1.4 Release Notes + +Release date: March 11, 2024 + +TiDB version: 7.1.4 + +Quick access: [Quick start](https://docs.pingcap.com/tidb/v7.1/quick-start-with-tidb) | [Production deployment](https://docs.pingcap.com/tidb/v7.1/production-deployment-using-tiup) | [Installation packages](https://www.pingcap.com/download/?version=v7.1.4#version-list) + +## Compatibility changes + +- To reduce the overhead of log printing, TiFlash changes the default value of `logger.level` from `"debug"` to `"info"` [#8641](https://github.com/pingcap/tiflash/issues/8641) @[JaySon-Huang](https://github.com/JaySon-Huang) +- Introduce the TiKV configuration item [`gc.num-threads`](https://docs.pingcap.com/tidb/v6.5/tikv-configuration-file#num-threads-new-in-v658) to set the number of GC threads when `enable-compaction-filter` is `false` [#16101](https://github.com/tikv/tikv/issues/16101) @[tonyxuqqi](https://github.com/tonyxuqqi) + +## Improvements + ++ TiDB + + - Enhance the ability to convert `OUTER JOIN` to `INNER JOIN` in specific scenarios [#49616](https://github.com/pingcap/tidb/issues/49616) @[qw4990](https://github.com/qw4990) + - When `force-init-stats` is set to `true`, TiDB waits for statistics initialization to finish before providing services during TiDB startup. 
This setting no longer blocks the startup of HTTP servers, which enables users to continue monitoring [#50854](https://github.com/pingcap/tidb/issues/50854) @[hawkingrei](https://github.com/hawkingrei) + ++ TiKV + + - When TiKV detects the existence of corrupted SST files, it logs the specific reasons for the corruption [#16308](https://github.com/tikv/tikv/issues/16308) @[overvenus](https://github.com/overvenus) + ++ PD + + - Improve the speed of PD automatically updating cluster status when the backup cluster is disconnected [#6883](https://github.com/tikv/pd/issues/6883) @[disksing](https://github.com/disksing) + ++ TiFlash + + - Reduce the impact of background GC tasks on read and write task latency [#8650](https://github.com/pingcap/tiflash/issues/8650) @[JaySon-Huang](https://github.com/JaySon-Huang) + - Reduce the impact of disk performance jitter on read latency [#8583](https://github.com/pingcap/tiflash/issues/8583) @[JaySon-Huang](https://github.com/JaySon-Huang) + ++ Tools + + + Backup & Restore (BR) + + - Support creating databases in batch during data restore [#50767](https://github.com/pingcap/tidb/issues/50767) @[Leavrth](https://github.com/Leavrth) + - Improve the table creation performance of the `RESTORE` statement in scenarios with large datasets [#48301](https://github.com/pingcap/tidb/issues/48301) @[Leavrth](https://github.com/Leavrth) + - Improve the speed of merging SST files during data restore by using a more efficient algorithm [#50613](https://github.com/pingcap/tidb/issues/50613) @[Leavrth](https://github.com/Leavrth) + - Support ingesting SST files in batch during data restore [#16267](https://github.com/tikv/tikv/issues/16267) @[3pointer](https://github.com/3pointer) + - Print the information of the slowest Region that affects global checkpoint advancement in logs and metrics during log backups [#51046](https://github.com/pingcap/tidb/issues/51046) @[YuJuncen](https://github.com/YuJuncen) + - Remove an outdated compatibility check 
when using Google Cloud Storage (GCS) as the external storage [#50533](https://github.com/pingcap/tidb/issues/50533) @[lance6716](https://github.com/lance6716) + - Implement a lock mechanism to avoid executing multiple log backup truncation tasks (`br log truncate`) simultaneously [#49414](https://github.com/pingcap/tidb/issues/49414) @[YuJuncen](https://github.com/YuJuncen) + + + TiCDC + + - When the downstream is Kafka, the topic expression allows `schema` to be optional and supports specifying a topic name directly [#9763](https://github.com/pingcap/tiflow/issues/9763) @[3AceShowHand](https://github.com/3AceShowHand) + - Support [querying the downstream synchronization status of a changefeed](https://docs.pingcap.com/tidb/v7.1/ticdc-open-api-v2#query-whether-a-specific-replication-task-is-completed), which helps you determine whether the upstream data changes received by TiCDC have been synchronized to the downstream system completely [#10289](https://github.com/pingcap/tiflow/issues/10289) @[hongyunyan](https://github.com/hongyunyan) + - Support searching TiCDC logs in the TiDB Dashboard [#10263](https://github.com/pingcap/tiflow/issues/10263) @[CharlesCheung96](https://github.com/CharlesCheung96) + + + TiDB Lightning + + - Improve the performance in scenarios where multiple tables are imported by removing the lock operation when executing `ALTER TABLE` [#50105](https://github.com/pingcap/tidb/issues/50105) @[D3Hunter](https://github.com/D3Hunter) + +## Bug fixes + ++ TiDB + + - Fix the issue that the `DELETE` and `UPDATE` statements using index lookup might report an error when `tidb_multi_statement_mode` mode is enabled [#50012](https://github.com/pingcap/tidb/issues/50012) @[tangenta](https://github.com/tangenta) + - Fix the issue that CTE queries might report an error `type assertion for CTEStorageMap failed` during the retry process [#46522](https://github.com/pingcap/tidb/issues/46522) @[tiancaiamao](https://github.com/tiancaiamao) + - Fix the issue of 
excessive statistical error in constructing statistics caused by Golang's implicit conversion algorithm [#49801](https://github.com/pingcap/tidb/issues/49801) @[qw4990](https://github.com/qw4990) + - Fix the issue that errors might be returned during the concurrent merging of global statistics for partitioned tables [#48713](https://github.com/pingcap/tidb/issues/48713) @[hawkingrei](https://github.com/hawkingrei) + - Fix the issue of wrong query results due to TiDB incorrectly eliminating constant values in `group by` [#38756](https://github.com/pingcap/tidb/issues/38756) @[hi-rustin](https://github.com/hi-rustin) + - Fix the issue that `BIT` type columns might cause query errors due to decode failures when they are involved in calculations of some functions [#49566](https://github.com/pingcap/tidb/issues/49566) [#50850](https://github.com/pingcap/tidb/issues/50850) [#50855](https://github.com/pingcap/tidb/issues/50855) @[jiyfhust](https://github.com/jiyfhust) + - Fix the issue that `LIMIT` in multi-level nested `UNION` queries might become ineffective [#49874](https://github.com/pingcap/tidb/issues/49874) @[Defined2014](https://github.com/Defined2014) + - Fix the issue that the auto-increment ID allocation reports an error due to concurrent conflicts when using an auto-increment column with `AUTO_ID_CACHE=1` [#50519](https://github.com/pingcap/tidb/issues/50519) @[tiancaiamao](https://github.com/tiancaiamao) + - Fix the `Column ... 
in from clause is ambiguous` error that might occur when a query uses `NATURAL JOIN` [#32044](https://github.com/pingcap/tidb/issues/32044) @[AilinKid](https://github.com/AilinKid) + - Fix the issue that enforced sorting might become ineffective when a query uses optimizer hints (such as `STREAM_AGG()`) that enforce sorting and its execution plan contains `IndexMerge` [#49605](https://github.com/pingcap/tidb/issues/49605) @[AilinKid](https://github.com/AilinKid) + - Fix the issue that query results are incorrect due to `STREAM_AGG()` incorrectly handling CI [#49902](https://github.com/pingcap/tidb/issues/49902) @[wshwsh12](https://github.com/wshwsh12) + - Fix the goroutine leak issue that might occur when the `HashJoin` operator fails to spill to disk [#50841](https://github.com/pingcap/tidb/issues/50841) @[wshwsh12](https://github.com/wshwsh12) + - Fix the issue that hints cannot be used in `REPLACE INTO` statements [#34325](https://github.com/pingcap/tidb/issues/34325) @[YangKeao](https://github.com/YangKeao) + - Fix the issue that executing queries containing the `GROUP_CONCAT(ORDER BY)` syntax might return errors [#49986](https://github.com/pingcap/tidb/issues/49986) @[AilinKid](https://github.com/AilinKid) + - Fix the issue that using a multi-valued index to access an empty JSON array might return incorrect results [#50125](https://github.com/pingcap/tidb/issues/50125) @[YangKeao](https://github.com/YangKeao) + - Fix the goroutine leak issue that occurs when the memory usage of CTE queries exceeds limits [#50337](https://github.com/pingcap/tidb/issues/50337) @[guo-shaoge](https://github.com/guo-shaoge) + - Fix the issue that using old interfaces might cause inconsistent metadata for tables [#49751](https://github.com/pingcap/tidb/issues/49751) @[hawkingrei](https://github.com/hawkingrei) + - Fix the issue that executing `UNIQUE` index lookup with an `ORDER BY` clause might cause an error [#49920](https://github.com/pingcap/tidb/issues/49920) 
@[jackysp](https://github.com/jackysp) + - Fix the issue that common hints do not take effect in `UNION ALL` statements [#50068](https://github.com/pingcap/tidb/issues/50068) @[hawkingrei](https://github.com/hawkingrei) + - Fix the issue that a query containing the IndexHashJoin operator gets stuck when memory exceeds `tidb_mem_quota_query` [#49033](https://github.com/pingcap/tidb/issues/49033) @[XuHuaiyu](https://github.com/XuHuaiyu) + - Fix the issue that `UPDATE` or `DELETE` statements containing `WITH RECURSIVE` CTEs might produce incorrect results [#48969](https://github.com/pingcap/tidb/issues/48969) @[winoros](https://github.com/winoros) + - Fix the issue that histogram statistics might not be parsed into readable strings when the histogram boundary contains `NULL` [#49823](https://github.com/pingcap/tidb/issues/49823) @[AilinKid](https://github.com/AilinKid) + - Fix the issue that TiDB might panic when a query contains the Apply operator and the `fatal error: concurrent map writes` error occurs [#50347](https://github.com/pingcap/tidb/issues/50347) @[SeaRise](https://github.com/SeaRise) + - Fix the `Can't find column ...` error that might occur when aggregate functions are used for group calculations [#50926](https://github.com/pingcap/tidb/issues/50926) @[qw4990](https://github.com/qw4990) + - Fix the issue that TiDB returns wrong query results when processing `ENUM` or `SET` types by constant propagation [#49440](https://github.com/pingcap/tidb/issues/49440) @[winoros](https://github.com/winoros) + - Fix the issue that the completion times of two DDL tasks with dependencies are incorrectly sequenced [#49498](https://github.com/pingcap/tidb/issues/49498) @[tangenta](https://github.com/tangenta) + - Fix the issue that TiDB might panic when using the `EXECUTE` statement to execute `PREPARE STMT` after the `tidb_enable_prepared_plan_cache` system variable is enabled and then disabled [#49344](https://github.com/pingcap/tidb/issues/49344) 
@[qw4990](https://github.com/qw4990) + - Fix the issue that `LIMIT` and `ORDER BY` might be invalid in nested `UNION` queries [#49377](https://github.com/pingcap/tidb/issues/49377) @[AilinKid](https://github.com/AilinKid) + - Fix the issue that the `LEADING` hint does not take effect in `UNION ALL` statements [#50067](https://github.com/pingcap/tidb/issues/50067) @[hawkingrei](https://github.com/hawkingrei) + - Fix the issue that the `COMMIT` or `ROLLBACK` operation executed through `COM_STMT_EXECUTE` fails to terminate transactions that have timed out [#49151](https://github.com/pingcap/tidb/issues/49151) @[zyguan](https://github.com/zyguan) + - Fix the issue that illegal optimizer hints might cause valid hints to be ineffective [#49308](https://github.com/pingcap/tidb/issues/49308) @[hawkingrei](https://github.com/hawkingrei) + - Fix the issue that Daylight Saving Time is displayed incorrectly in some time zones [#49586](https://github.com/pingcap/tidb/issues/49586) @[overvenus](https://github.com/overvenus) + - Fix the issue that executing `SELECT INTO OUTFILE` using the `PREPARE` method incorrectly returns a success message instead of an error [#49166](https://github.com/pingcap/tidb/issues/49166) @[qw4990](https://github.com/qw4990) + - Fix the issue that TiDB might panic when performing a rolling upgrade using `tiup cluster upgrade/start` due to an interaction issue with PD [#50152](https://github.com/pingcap/tidb/issues/50152) @[zimulala](https://github.com/zimulala) + - Fix the issue that the expected optimization does not take effect when adding an index to an empty table [#49682](https://github.com/pingcap/tidb/issues/49682) @[zimulala](https://github.com/zimulala) + - Fix the issue that TiDB might OOM when a large number of tables or partitions are created [#50077](https://github.com/pingcap/tidb/issues/50077) @[zimulala](https://github.com/zimulala) + - Fix the issue that adding an index might cause inconsistent index data when the network is unstable
[#49773](https://github.com/pingcap/tidb/issues/49773) @[tangenta](https://github.com/tangenta) + - Fix the execution order of DDL jobs to prevent TiCDC from receiving out-of-order DDLs [#49498](https://github.com/pingcap/tidb/issues/49498) @[tangenta](https://github.com/tangenta) + - Fix the issue that the `tidb_gogc_tuner_threshold` system variable is not adjusted accordingly after the `tidb_server_memory_limit` variable is modified [#48180](https://github.com/pingcap/tidb/issues/48180) @[hawkingrei](https://github.com/hawkingrei) + - Fix the issue that the query result of a range partitioned table is incorrect in some cases due to wrong partition pruning [#50082](https://github.com/pingcap/tidb/issues/50082) @[Defined2014](https://github.com/Defined2014) + - Fix the issue that DDL operations such as renaming tables are stuck when the `CREATE TABLE` statement contains specific partitions or constraints [#50972](https://github.com/pingcap/tidb/issues/50972) @[lcwangchao](https://github.com/lcwangchao) + - Fix the issue that getting the default value of a column returns an error if the column default value is dropped [#50043](https://github.com/pingcap/tidb/issues/50043) [#51324](https://github.com/pingcap/tidb/issues/51324) @[crazycs520](https://github.com/crazycs520) + - Fix the issue that the monitoring metric `tidb_statistics_auto_analyze_total` on Grafana is not displayed as an integer [#51051](https://github.com/pingcap/tidb/issues/51051) @[hawkingrei](https://github.com/hawkingrei) + - Fix the issue that the `tidb_merge_partition_stats_concurrency` variable does not take effect when `auto analyze` is processing partitioned tables [#47594](https://github.com/pingcap/tidb/issues/47594) @[hawkingrei](https://github.com/hawkingrei) + - Fix the issue that the `index out of range` error might occur when a query involves JOIN operations [#42588](https://github.com/pingcap/tidb/issues/42588) @[AilinKid](https://github.com/AilinKid) + - Fix the issue that wrong 
results might be returned when TiFlash late materialization processes associated columns [#49241](https://github.com/pingcap/tidb/issues/49241) [#51204](https://github.com/pingcap/tidb/issues/51204) @[Lloyd-Pottiger](https://github.com/Lloyd-Pottiger) + ++ TiKV + + - Fix the issue that hibernated Regions are not promptly awakened in exceptional circumstances [#16368](https://github.com/tikv/tikv/issues/16368) @[LykxSassinator](https://github.com/LykxSassinator) + - Fix the issue that the entire Region becomes unavailable when one replica is offline, by checking the last heartbeat time of all replicas of the Region before taking a node offline [#16465](https://github.com/tikv/tikv/issues/16465) @[tonyxuqqi](https://github.com/tonyxuqqi) + - Fix the issue that table properties stored in RocksDB might be inaccurate when Titan is enabled [#16319](https://github.com/tikv/tikv/issues/16319) @[hicqu](https://github.com/hicqu) + - Fix the issue that executing `tikv-ctl compact-cluster` fails when a cluster has TiFlash nodes [#16189](https://github.com/tikv/tikv/issues/16189) @[frew](https://github.com/frew) + - Fix the issue that TiKV might panic when gRPC threads are checking `is_shutdown` [#16236](https://github.com/tikv/tikv/issues/16236) @[pingyu](https://github.com/pingyu) + - Fix the issue that TiDB and TiKV might produce inconsistent results when processing `DECIMAL` arithmetic multiplication truncation [#16268](https://github.com/tikv/tikv/issues/16268) @[solotzg](https://github.com/solotzg) + - Fix the issue that `cast_duration_as_time` might return incorrect results [#16211](https://github.com/tikv/tikv/issues/16211) @[gengliqi](https://github.com/gengliqi) + - Fix the issue that TiKV converts the time zone incorrectly for Brazil and Egypt [#16220](https://github.com/tikv/tikv/issues/16220) @[overvenus](https://github.com/overvenus) + - Fix the issue that JSON integers greater than the maximum `INT64` value but less than the maximum `UINT64` value are parsed as 
`FLOAT64` by TiKV, resulting in inconsistency with TiDB [#16512](https://github.com/tikv/tikv/issues/16512) @[YangKeao](https://github.com/YangKeao) + ++ PD + + - Fix the issue that slots are not fully deleted in a resource group client, which causes the number of the allocated tokens to be less than the specified value [#7346](https://github.com/tikv/pd/issues/7346) @[guo-shaoge](https://github.com/guo-shaoge) + - Fix the issue that some TSO logs do not print the error cause [#7496](https://github.com/tikv/pd/issues/7496) @[CabinfeverB](https://github.com/CabinfeverB) + - Fix the issue that the default resource group accumulates unnecessary tokens when `BURSTABLE` is enabled [#7206](https://github.com/tikv/pd/issues/7206) @[CabinfeverB](https://github.com/CabinfeverB) + - Fix the issue that there is no output when the `evict-leader-scheduler` interface is called [#7672](https://github.com/tikv/pd/issues/7672) @[CabinfeverB](https://github.com/CabinfeverB) + - Fix the memory leak issue that occurs when `watch etcd` is not turned off correctly [#7807](https://github.com/tikv/pd/issues/7807) @[rleungx](https://github.com/rleungx) + - Fix the issue that data race occurs when the `MergeLabels` function is called [#7535](https://github.com/tikv/pd/issues/7535) @[lhy1024](https://github.com/lhy1024) + - Fix the issue that TiDB Dashboard fails to get the TiKV profile when TLS is enabled [#7561](https://github.com/tikv/pd/issues/7561) @[Connor1996](https://github.com/Connor1996) + - Fix the issue that the orphan peer is deleted when the number of replicas does not meet the requirements [#7584](https://github.com/tikv/pd/issues/7584) @[bufferflies](https://github.com/bufferflies) + - Fix the issue that `available_stores` is calculated incorrectly for clusters adopting the Data Replication Auto Synchronous (DR Auto-Sync) mode [#7221](https://github.com/tikv/pd/issues/7221) @[disksing](https://github.com/disksing) + - Fix the issue that `canSync` and `hasMajority` might be 
calculated incorrectly for clusters adopting the Data Replication Auto Synchronous (DR Auto-Sync) mode when the configuration of Placement Rules is complex [#7201](https://github.com/tikv/pd/issues/7201) @[disksing](https://github.com/disksing) + - Fix the issue that the primary AZ cannot add TiKV nodes when the secondary AZ is down for clusters adopting the Data Replication Auto Synchronous (DR Auto-Sync) mode [#7218](https://github.com/tikv/pd/issues/7218) @[disksing](https://github.com/disksing) + - Fix the issue that querying resource groups in batch might cause PD to panic [#7206](https://github.com/tikv/pd/issues/7206) @[nolouch](https://github.com/nolouch) + - Fix the issue that querying a Region without a leader using `pd-ctl` might cause PD to panic [#7630](https://github.com/tikv/pd/issues/7630) @[rleungx](https://github.com/rleungx) + - Fix the issue that the PD monitoring item `learner-peer-count` does not synchronize the old value after a leader switch [#7728](https://github.com/tikv/pd/issues/7728) @[CabinfeverB](https://github.com/CabinfeverB) + - Fix the issue that PD cannot read resource limitations when it is started with `systemd` [#7628](https://github.com/tikv/pd/issues/7628) @[bufferflies](https://github.com/bufferflies) + ++ TiFlash + + - Fix the issue that TiFlash might panic due to unstable network connections with PD during replica migration [#8323](https://github.com/pingcap/tiflash/issues/8323) @[JaySon-Huang](https://github.com/JaySon-Huang) + - Fix the issue that TiFlash incorrectly handles `ENUM` when the `ENUM` value is 0 [#8311](https://github.com/pingcap/tiflash/issues/8311) @[solotzg](https://github.com/solotzg) + - Fix the random invalid memory access issue that might occur with `GREATEST` or `LEAST` functions containing constant string parameters [#8604](https://github.com/pingcap/tiflash/issues/8604) @[windtalker](https://github.com/windtalker) + - Fix the issue that the `lowerUTF8` and `upperUTF8` functions do not allow 
characters in different cases to occupy different bytes [#8484](https://github.com/pingcap/tiflash/issues/8484) @[gengliqi](https://github.com/gengliqi) + - Fix the issue that short queries executed successfully print excessive info logs [#8592](https://github.com/pingcap/tiflash/issues/8592) @[windtalker](https://github.com/windtalker) + - Fix the issue that the memory usage increases significantly due to slow queries [#8564](https://github.com/pingcap/tiflash/issues/8564) @[JinheLin](https://github.com/JinheLin) + - Fix the issue that TiFlash panics after executing `ALTER TABLE ... MODIFY COLUMN ... NOT NULL`, which changes nullable columns to non-nullable [#8419](https://github.com/pingcap/tiflash/issues/8419) @[JaySon-Huang](https://github.com/JaySon-Huang) + - Fix the issue that after terminating a query, TiFlash crashes due to concurrent data conflicts when a large number of tasks on TiFlash are canceled at the same time [#7432](https://github.com/pingcap/tiflash/issues/7432) @[SeaRise](https://github.com/SeaRise) + - Fix the issue that TiFlash might crash during remote reads [#8685](https://github.com/pingcap/tiflash/issues/8685) @[zanmato1984](https://github.com/zanmato1984) + - Fix the issue that TiFlash Anti Semi Join might return incorrect results when the join includes non-equivalent conditions [#8791](https://github.com/pingcap/tiflash/issues/8791) @[windtalker](https://github.com/windtalker) + ++ Tools + + + Backup & Restore (BR) + + - Fix the issue that stopping a log backup task causes TiDB to crash [#50839](https://github.com/pingcap/tidb/issues/50839) @[YuJuncen](https://github.com/YuJuncen) + - Fix the issue that data restore is slowed down due to absence of a leader on a TiKV node [#50566](https://github.com/pingcap/tidb/issues/50566) @[Leavrth](https://github.com/Leavrth) + - Fix the issue that log backup gets stuck after changing the TiKV IP address on the same node [#50445](https://github.com/pingcap/tidb/issues/50445) 
@[3pointer](https://github.com/3pointer) + - Fix the issue that BR cannot retry when encountering an error while reading file content from S3 [#49942](https://github.com/pingcap/tidb/issues/49942) @[Leavrth](https://github.com/Leavrth) + - Fix the issue that the `Unsupported collation` error is reported when you restore data from backups of an old version [#49466](https://github.com/pingcap/tidb/issues/49466) @[3pointer](https://github.com/3pointer) + + + TiCDC + + - Fix the issue that the changefeed reports an error after `TRUNCATE PARTITION` is executed on the upstream table [#10522](https://github.com/pingcap/tiflow/issues/10522) @[sdojjy](https://github.com/sdojjy) + - Fix the issue that the changefeed `resolved ts` does not advance in extreme cases [#10157](https://github.com/pingcap/tiflow/issues/10157) @[sdojjy](https://github.com/sdojjy) + - Fix the issue that the Syncpoint table might be incorrectly replicated [#10576](https://github.com/pingcap/tiflow/issues/10576) @[asddongmen](https://github.com/asddongmen) + - Fix the issue that after filtering out `add table partition` events is configured in `ignore-event`, TiCDC does not replicate other types of DML changes for related partitions to the downstream [#10524](https://github.com/pingcap/tiflow/issues/10524) @[CharlesCheung96](https://github.com/CharlesCheung96) + - Fix the issue that the file sequence number generated by the storage service might not increment correctly when using the storage sink [#10352](https://github.com/pingcap/tiflow/issues/10352) @[CharlesCheung96](https://github.com/CharlesCheung96) + - Fix the issue that TiCDC returns the `ErrChangeFeedAlreadyExists` error when concurrently creating multiple changefeeds [#10430](https://github.com/pingcap/tiflow/issues/10430) @[CharlesCheung96](https://github.com/CharlesCheung96) + - Fix the issue that `snapshot lost caused by GC` is not reported in time when resuming a changefeed and the `checkpoint-ts` of the changefeed is smaller than the GC 
safepoint of TiDB [#10463](https://github.com/pingcap/tiflow/issues/10463) @[sdojjy](https://github.com/sdojjy) + - Fix the issue that TiCDC fails to validate `TIMESTAMP` type checksum due to time zone mismatch after data integrity validation for single-row data is enabled [#10573](https://github.com/pingcap/tiflow/issues/10573) @[3AceShowHand](https://github.com/3AceShowHand) + + + TiDB Data Migration (DM) + + - Fix the issue that a wrong binlog event type in the task configuration causes upgrade failures [#10282](https://github.com/pingcap/tiflow/issues/10282) @[GMHDBJD](https://github.com/GMHDBJD) + - Fix the issue that a table with `shard_row_id_bits` causes the schema tracker to fail to initialize [#10308](https://github.com/pingcap/tiflow/issues/10308) @[GMHDBJD](https://github.com/GMHDBJD) + + + TiDB Lightning + + - Fix the issue that TiDB Lightning reports an error when encountering invalid symbolic link files during file scanning [#49423](https://github.com/pingcap/tidb/issues/49423) @[lance6716](https://github.com/lance6716) + - Fix the issue that TiDB Lightning fails to correctly parse date values containing `0` when `NO_ZERO_IN_DATE` is not included in `sql_mode` [#50757](https://github.com/pingcap/tidb/issues/50757) @[GMHDBJD](https://github.com/GMHDBJD) diff --git a/releases/release-7.2.0.md b/releases/release-7.2.0.md index b41bb0363ca23..ea85bb347a999 100644 --- a/releases/release-7.2.0.md +++ b/releases/release-7.2.0.md @@ -245,7 +245,7 @@ Quick access: [Quick start](https://docs.pingcap.com/tidb/v7.2/quick-start-with- + TiDB Lightning - - Optimize the retry mechanism during import to avoid errors caused by leader switching [#44478](https://github.com/pingcap/tidb/pull/44478) @[lance6716](https://github.com/lance6716) + - Optimize the retry mechanism during import to avoid errors caused by leader switching [#44263](https://github.com/pingcap/tidb/issues/44263) @[lance6716](https://github.com/lance6716) - Verify checksum through SQL after the import 
to improve stability of verification [#41941](https://github.com/pingcap/tidb/issues/41941) @[GMHDBJD](https://github.com/GMHDBJD) - Optimize TiDB Lightning OOM issues when importing wide tables [#43853](https://github.com/pingcap/tidb/issues/43853) @[D3Hunter](https://github.com/D3Hunter) diff --git a/releases/release-7.4.0.md b/releases/release-7.4.0.md index c9bc9287b1f1e..cfe015548dcdc 100644 --- a/releases/release-7.4.0.md +++ b/releases/release-7.4.0.md @@ -185,9 +185,9 @@ Quick access: [Quick start](https://docs.pingcap.com/tidb/v7.4/quick-start-with- For more information, see [documentation](/tidb-resource-control.md#manage-background-tasks). -* Enhance the ability to lock statistics [#46351](https://github.com/pingcap/tidb/issues/46351) @[hi-rustin](https://github.com/hi-rustin) +* Lock statistics becomes generally available (GA) [#46351](https://github.com/pingcap/tidb/issues/46351) @[hi-rustin](https://github.com/hi-rustin) - In v7.4.0, TiDB has enhanced the ability to [lock statistics](/statistics.md#lock-statistics). Now, to ensure operational security, locking and unlocking statistics require the same privileges as collecting statistics. In addition, TiDB supports locking and unlocking statistics for specific partitions, providing greater flexibility. If you are confident in queries and execution plans in the database and want to prevent any changes from occurring, you can lock statistics to enhance stability. + In v7.4.0, [lock statistics](/statistics.md#lock-statistics) becomes generally available. Now, to ensure operational security, locking and unlocking statistics require the same privileges as collecting statistics. In addition, TiDB supports locking and unlocking statistics for specific partitions, providing greater flexibility. If you are confident in queries and execution plans in the database and want to prevent any changes from occurring, you can lock statistics to enhance stability. 
For more information, see [documentation](/statistics.md#lock-statistics). @@ -447,7 +447,7 @@ Quick access: [Quick start](https://docs.pingcap.com/tidb/v7.4/quick-start-with- - Fix an issue that the misleading error message `resolve lock timeout` covers up the actual error when backup fails [#43236](https://github.com/pingcap/tidb/issues/43236) @[YuJuncen](https://github.com/YuJuncen) - Fix the issue that recovering implicit primary keys using PITR might cause conflicts [#46520](https://github.com/pingcap/tidb/issues/46520) @[3pointer](https://github.com/3pointer) - Fix the issue that recovering meta-kv using PITR might cause errors [#46578](https://github.com/pingcap/tidb/issues/46578) @[Leavrth](https://github.com/Leavrth) - - Fix the errors in BR integration test cases [#45561](https://github.com/pingcap/tidb/issues/46561) @[purelind](https://github.com/purelind) + - Fix the errors in BR integration test cases [#46561](https://github.com/pingcap/tidb/issues/46561) @[purelind](https://github.com/purelind) + TiCDC diff --git a/releases/release-7.5.1.md b/releases/release-7.5.1.md new file mode 100644 index 0000000000000..6a975f4df1186 --- /dev/null +++ b/releases/release-7.5.1.md @@ -0,0 +1,227 @@ +--- +title: TiDB 7.5.1 Release Notes +summary: Learn about the compatibility changes, improvements, and bug fixes in TiDB 7.5.1. 
+--- + +# TiDB 7.5.1 Release Notes + +Release date: February 29, 2024 + +TiDB version: 7.5.1 + +Quick access: [Quick start](https://docs.pingcap.com/tidb/v7.5/quick-start-with-tidb) | [Production deployment](https://docs.pingcap.com/tidb/v7.5/production-deployment-using-tiup) | [Installation packages](https://www.pingcap.com/download/?version=v7.5.1#version-list) + +## Compatibility changes + +- Prohibit setting [`require_secure_transport`](https://docs.pingcap.com/tidb/v7.5/system-variables#require_secure_transport-new-in-v610) to `ON` in Security Enhanced Mode (SEM) to prevent potential connectivity issues for users [#47665](https://github.com/pingcap/tidb/issues/47665) @[tiancaiamao](https://github.com/tiancaiamao) +- To reduce the overhead of log printing, TiFlash changes the default value of `logger.level` from `"debug"` to `"info"` [#8641](https://github.com/pingcap/tiflash/issues/8641) @[JaySon-Huang](https://github.com/JaySon-Huang) +- Introduce the TiKV configuration item [`gc.num-threads`](https://docs.pingcap.com/tidb/v7.5/tikv-configuration-file#num-threads-new-in-v658-and-v751) to set the number of GC threads when `enable-compaction-filter` is `false` [#16101](https://github.com/tikv/tikv/issues/16101) @[tonyxuqqi](https://github.com/tonyxuqqi) +- TiCDC Changefeed introduces the following new configuration items: + - [`compression`](/ticdc/ticdc-changefeed-config.md): enables you to configure the compression behavior of redo log files [#10176](https://github.com/pingcap/tiflow/issues/10176) @[sdojjy](https://github.com/sdojjy) + - [`sink.cloud-storage-config`](/ticdc/ticdc-changefeed-config.md): enables you to set the automatic cleanup of historical data when replicating data to object storage [#10109](https://github.com/pingcap/tiflow/issues/10109) @[CharlesCheung96](https://github.com/CharlesCheung96) + - [`consistent.flush-concurrency`](/ticdc/ticdc-changefeed-config.md): enables you to set the concurrency for uploading a single redo file 
[#10226](https://github.com/pingcap/tiflow/issues/10226) @[sdojjy](https://github.com/sdojjy) + +## Improvements + ++ TiDB + + - Use `tikv_client_read_timeout` during the DDL schema reload process to reduce the impact of Meta Region Leader read unavailability on the cluster [#48124](https://github.com/pingcap/tidb/issues/48124) @[cfzjywxk](https://github.com/cfzjywxk) + - Enhance observability related to resource control [#49318](https://github.com/pingcap/tidb/issues/49318) @[glorv](https://github.com/glorv) @[bufferflies](https://github.com/bufferflies) @[nolouch](https://github.com/nolouch) + + As more and more users use resource groups to isolate application workloads, Resource Control provides enhanced data based on resource groups. This helps you monitor resource group workloads and settings, ensuring that you can quickly identify and accurately diagnose problems, including: + + - [Slow Queries](/identify-slow-queries.md): add the resource group name, resource unit (RU) consumption, and time for waiting for resources. + - [Statement Summary Tables](/statement-summary-tables.md): add the resource group name, RU consumption, and time for waiting for resources. + - In the system variable [`tidb_last_query_info`](/system-variables.md#tidb_last_query_info-new-in-v4014), add a new entry `ru_consumption` to indicate the consumed [RU](/tidb-resource-control.md#what-is-request-unit-ru) by SQL statements. You can use this variable to get the resource consumption of the last statement in the session. + - Add database metrics based on resource groups: QPS/TPS, execution time (P999/P99/P95), number of failures, and number of connections. 
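The new `ru_consumption` entry arrives inside the JSON string that `tidb_last_query_info` returns. A minimal Python sketch of pulling it out client-side (the sample JSON value below is hypothetical, not captured from a real cluster; the other fields are illustrative only):

```python
import json

# Hypothetical value of @@tidb_last_query_info fetched after a statement,
# e.g. via: SELECT @@tidb_last_query_info;
sample = '{"txn_scope":"global","start_ts":447559442,"ru_consumption":12.5}'

def ru_of_last_query(info_json: str) -> float:
    """Extract the RU consumed by the previous statement in this session."""
    info = json.loads(info_json)
    # ru_consumption is a new entry; builds that predate it omit the key.
    return float(info.get("ru_consumption", 0.0))

print(ru_of_last_query(sample))  # 12.5
```

Falling back to `0.0` when the key is absent keeps the same client code working against clusters that do not yet report RU consumption.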
+ + - Modify the `CANCEL IMPORT JOB` statement to a synchronous statement [#48736](https://github.com/pingcap/tidb/issues/48736) @[D3Hunter](https://github.com/D3Hunter) + - Support the [`FLASHBACK CLUSTER TO TSO`](https://docs.pingcap.com/tidb/v7.5/sql-statement-flashback-cluster) syntax [#48372](https://github.com/pingcap/tidb/issues/48372) @[BornChanger](https://github.com/BornChanger) + - Optimize the TiDB implementation when handling some type conversions and fix related issues [#47945](https://github.com/pingcap/tidb/issues/47945) [#47864](https://github.com/pingcap/tidb/issues/47864) [#47829](https://github.com/pingcap/tidb/issues/47829) [#47816](https://github.com/pingcap/tidb/issues/47816) @[YangKeao](https://github.com/YangKeao) @[lcwangchao](https://github.com/lcwangchao) + - When a non-binary collation is set and the query includes `LIKE`, the optimizer generates an `IndexRangeScan` to improve the execution efficiency [#48181](https://github.com/pingcap/tidb/issues/48181) [#49138](https://github.com/pingcap/tidb/issues/49138) @[time-and-fate](https://github.com/time-and-fate) + - Enhance the ability to convert `OUTER JOIN` to `INNER JOIN` in specific scenarios [#49616](https://github.com/pingcap/tidb/issues/49616) @[qw4990](https://github.com/qw4990) + - Support multiple accelerated `ADD INDEX` DDL tasks to be queued for execution, instead of falling back to normal `ADD INDEX` tasks [#47758](https://github.com/pingcap/tidb/issues/47758) @[tangenta](https://github.com/tangenta) + - Improve the speed of adding indexes to empty tables [#49682](https://github.com/pingcap/tidb/issues/49682) @[zimulala](https://github.com/zimulala) + ++ TiFlash + + - Improve the calculation method for [Request Unit (RU)](/tidb-resource-control.md#what-is-request-unit-ru) to make RU values more stable [#8391](https://github.com/pingcap/tiflash/issues/8391) @[guo-shaoge](https://github.com/guo-shaoge) + - Reduce the impact of disk performance jitter on read latency 
[#8583](https://github.com/pingcap/tiflash/issues/8583) @[JaySon-Huang](https://github.com/JaySon-Huang) + - Reduce the impact of background GC tasks on read and write task latency [#8650](https://github.com/pingcap/tiflash/issues/8650) @[JaySon-Huang](https://github.com/JaySon-Huang) + ++ Tools + + + Backup & Restore (BR) + + - Improve the speed of merging SST files during data restore by using a more efficient algorithm [#50613](https://github.com/pingcap/tidb/issues/50613) @[Leavrth](https://github.com/Leavrth) + - Support creating databases in batch during data restore [#50767](https://github.com/pingcap/tidb/issues/50767) @[Leavrth](https://github.com/Leavrth) + - Support ingesting SST files in batch during data restore [#16267](https://github.com/tikv/tikv/issues/16267) @[3pointer](https://github.com/3pointer) + - Print the information of the slowest Region that affects global checkpoint advancement in logs and metrics during log backups [#51046](https://github.com/pingcap/tidb/issues/51046) @[YuJuncen](https://github.com/YuJuncen) + - Improve the table creation performance of the `RESTORE` statement in scenarios with large datasets [#48301](https://github.com/pingcap/tidb/issues/48301) @[Leavrth](https://github.com/Leavrth) + - BR can pause Region merging by setting the `merge-schedule-limit` configuration to `0` [#7148](https://github.com/tikv/pd/issues/7148) @[BornChanger](https://github.com/BornChanger) + - Refactor the BR exception handling mechanism to increase tolerance for unknown errors [#47656](https://github.com/pingcap/tidb/issues/47656) @[3pointer](https://github.com/3pointer) + + + TiCDC + + - Support searching TiCDC logs in the TiDB Dashboard [#10263](https://github.com/pingcap/tiflow/issues/10263) @[CharlesCheung96](https://github.com/CharlesCheung96) + - Support [querying the downstream synchronization status of a changefeed](https://docs.pingcap.com/tidb/v7.5/ticdc-open-api-v2#query-whether-a-specific-replication-task-is-completed), which helps
you determine whether the upstream data changes received by TiCDC have been synchronized to the downstream system completely [#10289](https://github.com/pingcap/tiflow/issues/10289) @[hongyunyan](https://github.com/hongyunyan) + - Improve the performance of TiCDC replicating data to object storage by increasing parallelism [#10098](https://github.com/pingcap/tiflow/issues/10098) @[CharlesCheung96](https://github.com/CharlesCheung96) + + + TiDB Lightning + + - Improve the performance of `ALTER TABLE` when importing a large number of small tables [#50105](https://github.com/pingcap/tidb/issues/50105) @[D3Hunter](https://github.com/D3Hunter) + +## Bug fixes + ++ TiDB + + - Fix the issue that setting the system variable `tidb_service_scope` does not take effect [#49245](https://github.com/pingcap/tidb/issues/49245) @[ywqzzy](https://github.com/ywqzzy) + - Fix the issue that the communication protocol cannot handle packets larger than or equal to 16 MB when compression is enabled [#47157](https://github.com/pingcap/tidb/issues/47157) [#47161](https://github.com/pingcap/tidb/issues/47161) @[dveeden](https://github.com/dveeden) + - Fix the issue that the `approx_percentile` function might cause TiDB panic [#40463](https://github.com/pingcap/tidb/issues/40463) @[xzhangxian1008](https://github.com/xzhangxian1008) + - Fix the issue that TiDB might implicitly insert the `from_binary` function when the argument of a string function is a `NULL` constant, causing some expressions unable to be pushed down to TiFlash [#49526](https://github.com/pingcap/tidb/issues/49526) @[YangKeao](https://github.com/YangKeao) + - Fix the goroutine leak issue that might occur when the `HashJoin` operator fails to spill to disk [#50841](https://github.com/pingcap/tidb/issues/50841) @[wshwsh12](https://github.com/wshwsh12) + - Fix the issue that `BIT` type columns might cause query errors due to decode failures when they are involved in calculations of some functions 
[#49566](https://github.com/pingcap/tidb/issues/49566) [#50850](https://github.com/pingcap/tidb/issues/50850) [#50855](https://github.com/pingcap/tidb/issues/50855) @[jiyfhust](https://github.com/jiyfhust) + - Fix the goroutine leak issue that occurs when the memory usage of CTE queries exceeds the limit [#50337](https://github.com/pingcap/tidb/issues/50337) @[guo-shaoge](https://github.com/guo-shaoge) + - Fix the issue that wrong results might be returned when TiFlash late materialization processes associated columns [#49241](https://github.com/pingcap/tidb/issues/49241) [#51204](https://github.com/pingcap/tidb/issues/51204) @[Lloyd-Pottiger](https://github.com/Lloyd-Pottiger) + - Fix the issue that the background job thread of TiDB might panic when TiDB records historical statistics [#49076](https://github.com/pingcap/tidb/issues/49076) @[hawkingrei](https://github.com/hawkingrei) + - Fix the error that might occur when TiDB merges histograms of global statistics for partitioned tables [#49023](https://github.com/pingcap/tidb/issues/49023) @[hawkingrei](https://github.com/hawkingrei) + - Fix the issue that the historical statistics of the `stats_meta` table are not updated after a partition is dropped [#49334](https://github.com/pingcap/tidb/issues/49334) @[hi-rustin](https://github.com/hi-rustin) + - Fix the issue of incorrect query results caused by multi-valued indexes mistakenly selected as the `Index Join` probe side [#50382](https://github.com/pingcap/tidb/issues/50382) @[AilinKid](https://github.com/AilinKid) + - Fix the issue that the `USE_INDEX_MERGE` hint does not take effect on multi-valued indexes [#50553](https://github.com/pingcap/tidb/issues/50553) @[AilinKid](https://github.com/AilinKid) + - Fix the issue that users might get errors when querying the `INFORMATION_SCHEMA.ANALYZE_STATUS` system table [#48835](https://github.com/pingcap/tidb/issues/48835) @[hi-rustin](https://github.com/hi-rustin) + - Fix the issue of wrong query results due to TiDB
incorrectly eliminating constant values in `group by` [#38756](https://github.com/pingcap/tidb/issues/38756) @[hi-rustin](https://github.com/hi-rustin) + - Fix the issue that the `processed_rows` of the `ANALYZE` task on a table might exceed the total number of rows in that table [#50632](https://github.com/pingcap/tidb/issues/50632) @[hawkingrei](https://github.com/hawkingrei) + - Fix the issue that TiDB might panic when using the `EXECUTE` statement to execute `PREPARE STMT` after the `tidb_enable_prepared_plan_cache` system variable is enabled and then disabled [#49344](https://github.com/pingcap/tidb/issues/49344) @[qw4990](https://github.com/qw4990) + - Fix the `Column ... in from clause is ambiguous` error that might occur when a query uses `NATURAL JOIN` [#32044](https://github.com/pingcap/tidb/issues/32044) @[AilinKid](https://github.com/AilinKid) + - Fix the issue that using a multi-valued index to access an empty JSON array might return incorrect results [#50125](https://github.com/pingcap/tidb/issues/50125) @[YangKeao](https://github.com/YangKeao) + - Fix the `Can't find column ...` error that might occur when aggregate functions are used for group calculations [#50926](https://github.com/pingcap/tidb/issues/50926) @[qw4990](https://github.com/qw4990) + - Fix the issue that the control of `SET_VAR` for variables of the string type might become invalid [#50507](https://github.com/pingcap/tidb/issues/50507) @[qw4990](https://github.com/qw4990) + - Fix the issue that high CPU usage of TiDB occurs due to long-term memory pressure caused by `tidb_server_memory_limit` [#48741](https://github.com/pingcap/tidb/issues/48741) @[XuHuaiyu](https://github.com/XuHuaiyu) + - Fix the issue that the completion times of two DDL tasks with dependencies are incorrectly sequenced [#49498](https://github.com/pingcap/tidb/issues/49498) @[tangenta](https://github.com/tangenta) + - Fix the issue that illegal optimizer hints might cause valid hints to be ineffective 
[#49308](https://github.com/pingcap/tidb/issues/49308) @[hawkingrei](https://github.com/hawkingrei) + - Fix the issue that DDL statements with the `CHECK` constraint are stuck [#47632](https://github.com/pingcap/tidb/issues/47632) @[jiyfhust](https://github.com/jiyfhust) + - Fix the issue that the behavior of the `ENFORCED` option in the `CHECK` constraint is inconsistent with MySQL 8.0 [#47567](https://github.com/pingcap/tidb/issues/47567) [#47631](https://github.com/pingcap/tidb/issues/47631) @[jiyfhust](https://github.com/jiyfhust) + - Fix the issue that CTE queries might report an error `type assertion for CTEStorageMap failed` during the retry process [#46522](https://github.com/pingcap/tidb/issues/46522) @[tiancaiamao](https://github.com/tiancaiamao) + - Fix the issue that the `DELETE` and `UPDATE` statements using index lookup might report an error when `tidb_multi_statement_mode` mode is enabled [#50012](https://github.com/pingcap/tidb/issues/50012) @[tangenta](https://github.com/tangenta) + - Fix the issue that `UPDATE` or `DELETE` statements containing `WITH RECURSIVE` CTEs might produce incorrect results [#48969](https://github.com/pingcap/tidb/issues/48969) @[winoros](https://github.com/winoros) + - Fix the issue that the optimizer incorrectly converts TiFlash selection path to the DUAL table in specific scenarios [#49285](https://github.com/pingcap/tidb/issues/49285) @[AilinKid](https://github.com/AilinKid) + - Fix the issue that the same query plan has different `PLAN_DIGEST` values in some cases [#47634](https://github.com/pingcap/tidb/issues/47634) @[King-Dylan](https://github.com/King-Dylan) + - Fix the issue that after the time window for automatic statistics updates is configured, statistics might still be updated outside that time window [#49552](https://github.com/pingcap/tidb/issues/49552) @[hawkingrei](https://github.com/hawkingrei) + - Fix the issue that the query result is incorrect when an `ENUM` type column is used as the join key 
[#48991](https://github.com/pingcap/tidb/issues/48991) @[winoros](https://github.com/winoros) + - Fix the issue that executing `UNIQUE` index lookup with an `ORDER BY` clause might cause an error [#49920](https://github.com/pingcap/tidb/issues/49920) @[jackysp](https://github.com/jackysp) + - Fix the issue that `LIMIT` in multi-level nested `UNION` queries might become ineffective [#49874](https://github.com/pingcap/tidb/issues/49874) @[Defined2014](https://github.com/Defined2014) + - Fix the issue that the result of `COUNT(INT)` calculated by MPP might be incorrect [#48643](https://github.com/pingcap/tidb/issues/48643) @[AilinKid](https://github.com/AilinKid) + - Fix the issue that parsing invalid values of `ENUM` or `SET` types would directly cause SQL statement errors [#49487](https://github.com/pingcap/tidb/issues/49487) @[winoros](https://github.com/winoros) + - Fix the issue that TiDB panics and reports an error `invalid memory address or nil pointer dereference` [#42739](https://github.com/pingcap/tidb/issues/42739) @[CbcWestwolf](https://github.com/CbcWestwolf) + - Fix the issue that executing `UNION ALL` with the DUAL table as the first subnode might cause an error [#48755](https://github.com/pingcap/tidb/issues/48755) @[winoros](https://github.com/winoros) + - Fix the issue that common hints do not take effect in `UNION ALL` statements [#50068](https://github.com/pingcap/tidb/issues/50068) @[hawkingrei](https://github.com/hawkingrei) + - Fix the issue that TiDB server might panic during graceful shutdown [#36793](https://github.com/pingcap/tidb/issues/36793) @[bb7133](https://github.com/bb7133) + - Fix the issue that Daylight Saving Time is displayed incorrectly in some time zones [#49586](https://github.com/pingcap/tidb/issues/49586) @[overvenus](https://github.com/overvenus) + - Fix the issue that static `CALIBRATE RESOURCE` relies on the Prometheus data [#49174](https://github.com/pingcap/tidb/issues/49174) @[glorv](https://github.com/glorv) + - Fix 
the issue that hints cannot be used in `REPLACE INTO` statements [#34325](https://github.com/pingcap/tidb/issues/34325) @[YangKeao](https://github.com/YangKeao) + - Fix the issue that executing queries containing the `GROUP_CONCAT(ORDER BY)` syntax might return errors [#49986](https://github.com/pingcap/tidb/issues/49986) @[AilinKid](https://github.com/AilinKid) + - Fix the issue that TiDB server might consume a significant amount of resources when the enterprise plugin for audit logging is used [#49273](https://github.com/pingcap/tidb/issues/49273) @[lcwangchao](https://github.com/lcwangchao) + - Fix the issue that using old interfaces might cause inconsistent metadata for tables [#49751](https://github.com/pingcap/tidb/issues/49751) @[hawkingrei](https://github.com/hawkingrei) + - Fix the issue that disabling `tidb_enable_collect_execution_info` causes the coprocessor cache to panic [#48212](https://github.com/pingcap/tidb/issues/48212) @[you06](https://github.com/you06) + - Fix the issue that executing `ALTER TABLE ... 
LAST PARTITION` fails when the partition column type is `DATETIME` [#48814](https://github.com/pingcap/tidb/issues/48814) @[crazycs520](https://github.com/crazycs520) + - Fix the issue that the `COMMIT` or `ROLLBACK` operation executed through `COM_STMT_EXECUTE` fails to terminate transactions that have timed out [#49151](https://github.com/pingcap/tidb/issues/49151) @[zyguan](https://github.com/zyguan) + - Fix the issue that histogram statistics might not be parsed into readable strings when the histogram boundary contains `NULL` [#49823](https://github.com/pingcap/tidb/issues/49823) @[AilinKid](https://github.com/AilinKid) + - Fix the issue that queries containing common table expressions (CTEs) unexpectedly get stuck when the memory limit is exceeded [#49096](https://github.com/pingcap/tidb/issues/49096) @[AilinKid](https://github.com/AilinKid) + - Fix the issue that data is inconsistent under the TiDB Distributed eXecution Framework (DXF) when executing `ADD INDEX` after the DDL Owner is network isolated [#49773](https://github.com/pingcap/tidb/issues/49773) @[tangenta](https://github.com/tangenta) + - Fix the issue that the auto-increment ID allocation reports an error due to concurrent conflicts when using an auto-increment column with `AUTO_ID_CACHE=1` [#50519](https://github.com/pingcap/tidb/issues/50519) @[tiancaiamao](https://github.com/tiancaiamao) + - Fix the issue that TiDB might panic when a query contains the Apply operator and the `fatal error: concurrent map writes` error occurs [#50347](https://github.com/pingcap/tidb/issues/50347) @[SeaRise](https://github.com/SeaRise) + - Fix the TiDB node panic issue that occurs when DDL `jobID` is restored to 0 [#46296](https://github.com/pingcap/tidb/issues/46296) @[jiyfhust](https://github.com/jiyfhust) + - Fix the issue that query results are incorrect due to `STREAM_AGG()` incorrectly handling CI (case-insensitive) collations [#49902](https://github.com/pingcap/tidb/issues/49902) @[wshwsh12](https://github.com/wshwsh12) + - Mitigate
the issue that TiDB nodes might encounter OOM errors when dealing with a large number of tables or partitions [#50077](https://github.com/pingcap/tidb/issues/50077) @[zimulala](https://github.com/zimulala) + - Fix the issue that the `LEADING` hint does not take effect in `UNION ALL` statements [#50067](https://github.com/pingcap/tidb/issues/50067) @[hawkingrei](https://github.com/hawkingrei) + - Fix the issue that `LIMIT` and `ORDER BY` might be invalid in nested `UNION` queries [#49377](https://github.com/pingcap/tidb/issues/49377) @[AilinKid](https://github.com/AilinKid) + - Fix the issue that a query containing the IndexHashJoin operator gets stuck when memory exceeds `tidb_mem_quota_query` [#49033](https://github.com/pingcap/tidb/issues/49033) @[XuHuaiyu](https://github.com/XuHuaiyu) + - Fix the issue that TiDB returns wrong query results when processing `ENUM` or `SET` types by constant propagation [#49440](https://github.com/pingcap/tidb/issues/49440) @[winoros](https://github.com/winoros) + - Fix the issue that executing `SELECT INTO OUTFILE` using the `PREPARE` method incorrectly returns a success message instead of an error [#49166](https://github.com/pingcap/tidb/issues/49166) @[qw4990](https://github.com/qw4990) + - Fix the issue that enforced sorting might become ineffective when a query uses optimizer hints (such as `STREAM_AGG()`) that enforce sorting and its execution plan contains `IndexMerge` [#49605](https://github.com/pingcap/tidb/issues/49605) @[AilinKid](https://github.com/AilinKid) + - Fix the issue that tables with `AUTO_ID_CACHE=1` might lead to gRPC client leaks when there are a large number of tables [#48869](https://github.com/pingcap/tidb/issues/48869) @[tiancaiamao](https://github.com/tiancaiamao) + - Fix the issue that in non-strict mode (`sql_mode = ''`), truncation during `INSERT` execution still reports an error [#49369](https://github.com/pingcap/tidb/issues/49369) @[tiancaiamao](https://github.com/tiancaiamao) + - Fix the issue
that using the `_` wildcard in `LIKE` when the data contains trailing spaces can result in incorrect query results [#48983](https://github.com/pingcap/tidb/issues/48983) @[time-and-fate](https://github.com/time-and-fate) + - Fix the issue that executing `ADMIN CHECK` after updating the `tidb_mem_quota_query` system variable returns `ERROR 8175` [#49258](https://github.com/pingcap/tidb/issues/49258) @[tangenta](https://github.com/tangenta) + - Fix the issue of excessive statistical error in constructing statistics caused by Golang's implicit conversion algorithm [#49801](https://github.com/pingcap/tidb/issues/49801) @[qw4990](https://github.com/qw4990) + - Fix the issue that queries containing CTEs report `runtime error: index out of range [32] with length 32` when `tidb_max_chunk_size` is set to a small value [#48808](https://github.com/pingcap/tidb/issues/48808) @[guo-shaoge](https://github.com/guo-shaoge) + ++ TiKV + + - Fix the issue that enabling `tidb_enable_row_level_checksum` might cause TiKV to panic [#16371](https://github.com/tikv/tikv/issues/16371) @[cfzjywxk](https://github.com/cfzjywxk) + - Fix the issue that TiKV might panic when gRPC threads are checking `is_shutdown` [#16236](https://github.com/tikv/tikv/issues/16236) @[pingyu](https://github.com/pingyu) + - Fix the issue that TiKV converts the time zone incorrectly for Brazil and Egypt [#16220](https://github.com/tikv/tikv/issues/16220) @[overvenus](https://github.com/overvenus) + - Fix the issue that `blob-run-mode` in Titan cannot be updated online [#15978](https://github.com/tikv/tikv/issues/15978) @[tonyxuqqi](https://github.com/tonyxuqqi) + - Fix the issue that TiDB and TiKV might produce inconsistent results when processing `DECIMAL` arithmetic multiplication truncation [#16268](https://github.com/tikv/tikv/issues/16268) @[solotzg](https://github.com/solotzg) + - Fix the issue that Flashback might get stuck when encountering `notLeader` or `regionNotFound` 
[#15712](https://github.com/tikv/tikv/issues/15712) @[HuSharp](https://github.com/HuSharp) + - Fix the issue that damaged SST files might be spread to other TiKV nodes [#15986](https://github.com/tikv/tikv/issues/15986) @[Connor1996](https://github.com/Connor1996) + - Fix the issue that if TiKV runs extremely slowly, it might panic after Region merge [#16111](https://github.com/tikv/tikv/issues/16111) @[overvenus](https://github.com/overvenus) + - Fix the issue that the joint state of DR Auto-Sync might time out when scaling out [#15817](https://github.com/tikv/tikv/issues/15817) @[Connor1996](https://github.com/Connor1996) + - Fix the issue that Resolved TS might be blocked for two hours [#11847](https://github.com/tikv/tikv/issues/11847) [#15520](https://github.com/tikv/tikv/issues/15520) [#39130](https://github.com/pingcap/tidb/issues/39130) @[overvenus](https://github.com/overvenus) + - Fix the issue that `cast_duration_as_time` might return incorrect results [#16211](https://github.com/tikv/tikv/issues/16211) @[gengliqi](https://github.com/gengliqi) + ++ PD + + - Fix the issue that querying resource groups in batch might cause PD to panic [#7206](https://github.com/tikv/pd/issues/7206) @[nolouch](https://github.com/nolouch) + - Fix the issue that PD cannot read resource limitations when it is started with `systemd` [#7628](https://github.com/tikv/pd/issues/7628) @[bufferflies](https://github.com/bufferflies) + - Fix the issue that continuous jitter in PD disk latency might cause PD to fail to select a new leader [#7251](https://github.com/tikv/pd/issues/7251) @[HuSharp](https://github.com/HuSharp) + - Fix the issue that a network partition in PD might cause scheduling not to be started immediately [#7016](https://github.com/tikv/pd/issues/7016) @[HuSharp](https://github.com/HuSharp) + - Fix the issue that the PD monitoring item `learner-peer-count` does not synchronize the old value after a leader switch [#7728](https://github.com/tikv/pd/issues/7728) 
@[CabinfeverB](https://github.com/CabinfeverB) + - Fix the issue that when the PD leader is transferred and there is a network partition between the new leader and the PD client, the PD client fails to update the information of the leader [#7416](https://github.com/tikv/pd/issues/7416) @[CabinfeverB](https://github.com/CabinfeverB) + - Fix some security issues by upgrading the version of Gin Web Framework from v1.8.1 to v1.9.1 [#7438](https://github.com/tikv/pd/issues/7438) @[niubell](https://github.com/niubell) + - Fix the issue that the orphan peer is deleted when the number of replicas does not meet the requirements [#7584](https://github.com/tikv/pd/issues/7584) @[bufferflies](https://github.com/bufferflies) + - Fix the issue that querying a Region without a leader using `pd-ctl` might cause PD to panic [#7630](https://github.com/tikv/pd/issues/7630) @[rleungx](https://github.com/rleungx) + ++ TiFlash + + - Fix the issue that TiFlash might panic due to unstable network connections with PD during replica migration [#8323](https://github.com/pingcap/tiflash/issues/8323) @[JaySon-Huang](https://github.com/JaySon-Huang) + - Fix the issue that removing and then re-adding TiFlash replicas might lead to data corruption in TiFlash [#8695](https://github.com/pingcap/tiflash/issues/8695) @[JaySon-Huang](https://github.com/JaySon-Huang) + - Fix a potential issue that `FLASHBACK TABLE` or `RECOVER TABLE` might fail to recover data of some TiFlash replicas if `DROP TABLE` is executed immediately after data insertion [#8395](https://github.com/pingcap/tiflash/issues/8395) @[JaySon-Huang](https://github.com/JaySon-Huang) + - Fix incorrect display of maximum percentile time for some panels in Grafana [#8076](https://github.com/pingcap/tiflash/issues/8076) @[JaySon-Huang](https://github.com/JaySon-Huang) + - Fix the issue that TiFlash might crash during remote reads [#8685](https://github.com/pingcap/tiflash/issues/8685) @[guo-shaoge](https://github.com/guo-shaoge) + - Fix the 
issue that TiFlash incorrectly handles `ENUM` when the `ENUM` value is 0 [#8311](https://github.com/pingcap/tiflash/issues/8311) @[solotzg](https://github.com/solotzg) + - Fix the issue that short queries executed successfully print excessive info logs [#8592](https://github.com/pingcap/tiflash/issues/8592) @[windtalker](https://github.com/windtalker) + - Fix the issue that the memory usage increases significantly due to slow queries [#8564](https://github.com/pingcap/tiflash/issues/8564) @[JinheLin](https://github.com/JinheLin) + - Fix the issue that the `lowerUTF8` and `upperUTF8` functions do not allow characters in different cases to occupy different bytes [#8484](https://github.com/pingcap/tiflash/issues/8484) @[gengliqi](https://github.com/gengliqi) + - Fix the potential OOM issue that might occur when scanning multiple partitioned tables during stream read [#8505](https://github.com/pingcap/tiflash/issues/8505) @[gengliqi](https://github.com/gengliqi) + - Fix the issue of memory leak when TiFlash encounters memory limitation during query [#8447](https://github.com/pingcap/tiflash/issues/8447) @[JinheLin](https://github.com/JinheLin) + - Fix the TiFlash panic issue when TiFlash encounters conflicts during concurrent DDL execution [#8578](https://github.com/pingcap/tiflash/issues/8578) @[JaySon-Huang](https://github.com/JaySon-Huang) + - Fix the issue that TiFlash panics after executing `ALTER TABLE ... MODIFY COLUMN ... 
NOT NULL`, which changes nullable columns to non-nullable [#8419](https://github.com/pingcap/tiflash/issues/8419) @[JaySon-Huang](https://github.com/JaySon-Huang) + - Fix the issue that query results are incorrect when querying with filtering conditions like `ColumnRef in (Literal, Func...)` [#8631](https://github.com/pingcap/tiflash/issues/8631) @[Lloyd-Pottiger](https://github.com/Lloyd-Pottiger) + - Fix the issue that data of TiFlash replicas would still be garbage collected after executing `FLASHBACK DATABASE` [#8450](https://github.com/pingcap/tiflash/issues/8450) @[JaySon-Huang](https://github.com/JaySon-Huang) + - Fix the issue that TiFlash might not be able to select the GC owner of object storage data under the disaggregated storage and compute architecture [#8519](https://github.com/pingcap/tiflash/issues/8519) @[JaySon-Huang](https://github.com/JaySon-Huang) + - Fix the random invalid memory access issue that might occur with `GREATEST` or `LEAST` functions containing constant string parameters [#8604](https://github.com/pingcap/tiflash/issues/8604) @[windtalker](https://github.com/windtalker) + - Fix the issue that TiFlash replica data might be accidentally deleted after performing point-in-time recovery (PITR) or executing `FLASHBACK CLUSTER TO`, which might result in data anomalies [#8777](https://github.com/pingcap/tiflash/issues/8777) @[JaySon-Huang](https://github.com/JaySon-Huang) + - Fix the issue that TiFlash Anti Semi Join might return incorrect results when the join includes non-equivalent conditions [#8791](https://github.com/pingcap/tiflash/issues/8791) @[windtalker](https://github.com/windtalker) ++ Tools + + + Backup & Restore (BR) + + - Fix the issue that data restore is slowed down due to absence of a leader on a TiKV node [#50566](https://github.com/pingcap/tidb/issues/50566) @[Leavrth](https://github.com/Leavrth) + - Fix the issue that full restore still requires the target cluster to be empty after the `--filter` option is specified 
[#51009](https://github.com/pingcap/tidb/issues/51009) @[3pointer](https://github.com/3pointer) + - Fix the issue that when resuming from a checkpoint after data restore fails, an error `the target cluster is not fresh` occurs [#50232](https://github.com/pingcap/tidb/issues/50232) @[Leavrth](https://github.com/Leavrth) + - Fix the issue that stopping a log backup task causes TiDB to crash [#50839](https://github.com/pingcap/tidb/issues/50839) @[YuJuncen](https://github.com/YuJuncen) + - Fix the issue that the `Unsupported collation` error is reported when you restore data from backups of an old version [#49466](https://github.com/pingcap/tidb/issues/49466) @[3pointer](https://github.com/3pointer) + - Fix the issue that the log backup task can start but does not work properly if failing to connect to PD during task initialization [#16056](https://github.com/tikv/tikv/issues/16056) @[YuJuncen](https://github.com/YuJuncen) + - Fix the issue that BR generates incorrect URIs for external storage files [#48452](https://github.com/pingcap/tidb/issues/48452) @[3AceShowHand](https://github.com/3AceShowHand) + - Fix the issue that log backup gets stuck after changing the TiKV IP address on the same node [#50445](https://github.com/pingcap/tidb/issues/50445) @[3pointer](https://github.com/3pointer) + - Fix the issue that BR cannot retry when encountering an error while reading file content from S3 [#49942](https://github.com/pingcap/tidb/issues/49942) @[Leavrth](https://github.com/Leavrth) + + + TiCDC + + - Fix the issue that the sink module fails to restart correctly after encountering an error when Syncpoint is enabled (`enable-sync-point = true`) [#10091](https://github.com/pingcap/tiflow/issues/10091) @[hicqu](https://github.com/hicqu) + - Fix the issue that the file sequence number generated by the storage service might not increment correctly when using the storage sink [#10352](https://github.com/pingcap/tiflow/issues/10352) 
@[CharlesCheung96](https://github.com/CharlesCheung96) + - Fix the issue that the Syncpoint table might be incorrectly replicated [#10576](https://github.com/pingcap/tiflow/issues/10576) @[asddongmen](https://github.com/asddongmen) + - Fix the issue that OAuth2.0, TLS, and mTLS cannot be enabled properly when using Apache Pulsar as the downstream [#10602](https://github.com/pingcap/tiflow/issues/10602) @[asddongmen](https://github.com/asddongmen) + - Fix the issue that TiCDC returns the `ErrChangeFeedAlreadyExists` error when concurrently creating multiple changefeeds [#10430](https://github.com/pingcap/tiflow/issues/10430) @[CharlesCheung96](https://github.com/CharlesCheung96) + - Fix the issue that the changefeed `resolved ts` does not advance in extreme cases [#10157](https://github.com/pingcap/tiflow/issues/10157) @[sdojjy](https://github.com/sdojjy) + - Fix the issue that TiCDC mistakenly closes the connection with TiKV in certain special scenarios [#10239](https://github.com/pingcap/tiflow/issues/10239) @[hicqu](https://github.com/hicqu) + - Fix the issue that the TiCDC server might panic when replicating data to an object storage service [#10137](https://github.com/pingcap/tiflow/issues/10137) @[sdojjy](https://github.com/sdojjy) + - Fix the issue that the changefeed reports an error after `TRUNCATE PARTITION` is executed on the upstream table [#10522](https://github.com/pingcap/tiflow/issues/10522) @[sdojjy](https://github.com/sdojjy) + - Fix the issue that after filtering out `add table partition` events is configured in `ignore-event`, TiCDC does not replicate other types of DML changes for related partitions to the downstream [#10524](https://github.com/pingcap/tiflow/issues/10524) @[CharlesCheung96](https://github.com/CharlesCheung96) + - Fix the potential data race issue during `kv-client` initialization [#10095](https://github.com/pingcap/tiflow/issues/10095) @[3AceShowHand](https://github.com/3AceShowHand) + + + TiDB Data Migration (DM) + + - Fix the 
issue that a migration task error occurs when the downstream table structure contains `shard_row_id_bits` [#10308](https://github.com/pingcap/tiflow/issues/10308) @[GMHDBJD](https://github.com/GMHDBJD) + - Fix the issue that DM encounters "event type truncate not valid" error that causes the upgrade to fail [#10282](https://github.com/pingcap/tiflow/issues/10282) @[GMHDBJD](https://github.com/GMHDBJD) diff --git a/releases/release-7.6.0.md b/releases/release-7.6.0.md index c7a98e069a5a1..b5bee4048eb99 100644 --- a/releases/release-7.6.0.md +++ b/releases/release-7.6.0.md @@ -111,9 +111,9 @@ Quick access: [Quick start](https://docs.pingcap.com/tidb/v7.6/quick-start-with- * Improve the performance of creating tables by 10 times (experimental) [#49752](https://github.com/pingcap/tidb/issues/49752) @[gmhdbjd](https://github.com/gmhdbjd) - In previous versions, when migrating tens of thousands of tables from the upstream database to TiDB, it is time-consuming and inefficient for TiDB to create these tables. Starting from v7.6.0, TiDB introduces a new TiDB DDL V2 architecture. You can enable it by configuring the system variable [`tidb_ddl_version`](/system-variables.md#tidb_ddl_version-new-in-v760). Compared with previous versions, the new version of the DDL improves the performance of creating batch tables by 10 times, and significantly reduces time for creating tables. + In previous versions, when migrating tens of thousands of tables from the upstream database to TiDB, it is time-consuming and inefficient for TiDB to create these tables. Starting from v7.6.0, TiDB introduces a new TiDB DDL V2 architecture. You can enable it by configuring the system variable [`tidb_ddl_version`](https://docs.pingcap.com/tidb/v7.6/system-variables#tidb_ddl_version-new-in-v760). Compared with previous versions, the new version of the DDL improves the performance of creating batch tables by 10 times, and significantly reduces time for creating tables. 
- For more information, see [documentation](/ddl-v2.md). + For more information, see [documentation](https://docs.pingcap.com/tidb/v7.6/ddl-v2). * Support periodic full compaction (experimental) [#12729](https://github.com/tikv/tikv/issues/12729) @[afeinberg](https://github.com/afeinberg) @@ -269,7 +269,7 @@ Quick access: [Quick start](https://docs.pingcap.com/tidb/v7.6/quick-start-with- | [`tidb_auto_analyze_partition_batch_size`](/system-variables.md#tidb_auto_analyze_partition_batch_size-new-in-v640) | Modified | Changes the default value from `1` to `128` after further tests. | | [`tidb_sysproc_scan_concurrency`](/system-variables.md#tidb_sysproc_scan_concurrency-new-in-v650) | Modified | In a large-scale cluster, the concurrency of `scan` operations can be adjusted higher to meet the needs of `ANALYZE`. Therefore, change the maximum value from `256` to `4294967295`. | | [`tidb_analyze_distsql_scan_concurrency`](/system-variables.md#tidb_analyze_distsql_scan_concurrency-new-in-v760) | Newly added | Sets the concurrency of the `scan` operation when executing the `ANALYZE` operation. The default value is `4`. | -| [`tidb_ddl_version`](/system-variables.md#tidb_ddl_version-new-in-v760) | Newly added | Controls whether to enable [TiDB DDL V2](/ddl-v2.md). Set the value to `2` to enable it and `1` to disable it. The default value is `1`. When TiDB DDL V2 is enabled, DDL statements will be executed using TiDB DDL V2. The execution speed of DDL statements for creating tables is increased by 10 times compared with TiDB DDL V1. | +| [`tidb_ddl_version`](https://docs.pingcap.com/tidb/v7.6/system-variables#tidb_ddl_version-new-in-v760) | Newly added | Controls whether to enable [TiDB DDL V2](https://docs.pingcap.com/tidb/v7.6/ddl-v2). Set the value to `2` to enable it and `1` to disable it. The default value is `1`. When TiDB DDL V2 is enabled, DDL statements will be executed using TiDB DDL V2. 
The execution speed of DDL statements for creating tables is increased by 10 times compared with TiDB DDL V1. | | [`tidb_enable_global_index`](/system-variables.md#tidb_enable_global_index-new-in-v760) | Newly added | Controls whether to support creating `Global indexes` for partitioned tables. The default value is `OFF`. `Global index` is currently in the development stage. **It is not recommended to modify the value of this system variable**. | | [`tidb_idle_transaction_timeout`](/system-variables.md#tidb_idle_transaction_timeout-new-in-v760) | Newly added | Controls the idle timeout for transactions in a user session. When a user session is in a transactional state and remains idle for a duration exceeding the value of this variable, TiDB will terminate the session. The default value `0` means unlimited. | | [`tidb_opt_enable_fuzzy_binding`](/system-variables.md#tidb_opt_enable_fuzzy_binding-new-in-v760) | Newly added | Controls whether to enable the cross-database binding feature. The default value `OFF` means cross-database binding is disabled. | @@ -284,7 +284,7 @@ Quick access: [Quick start](https://docs.pingcap.com/tidb/v7.6/quick-start-with- | TiKV | [`blob-file-compression`](/tikv-configuration-file.md#blob-file-compression) | Modified | The algorithm used for compressing values in Titan, which takes value as the unit. Starting from TiDB v7.6.0, the default compression algorithm is `zstd`. | | TiKV | [`rocksdb.defaultcf.titan.min-blob-size`](/tikv-configuration-file.md#min-blob-size) | Modified | Starting from TiDB v7.6.0, the default value for new clusters is `32KB`. For existing clusters upgrading to v7.6.0, the default value `1KB` remains unchanged. | | TiKV | [`rocksdb.titan.enabled`](/tikv-configuration-file.md#enabled) | Modified | Enables or disables Titan. For v7.5.0 and earlier versions, the default value is `false`. Starting from v7.6.0, the default value is `true` for only new clusters. 
Existing clusters upgraded to v7.6.0 or later versions will retain the original configuration. | -| TiKV | [`gc.num-threads`](/tikv-configuration-file.md#num-threads-new-in-v658-and-v760) | Newly added | When `enable-compaction-filter` is set to `false`, this parameter controls the number of GC threads. The default value is `1`. | +| TiKV | [`gc.num-threads`](/tikv-configuration-file.md#num-threads-new-in-v658-v714-v751-and-v760) | Newly added | When `enable-compaction-filter` is set to `false`, this parameter controls the number of GC threads. The default value is `1`. | | TiKV | [`raftstore.periodic-full-compact-start-times`](/tikv-configuration-file.md#periodic-full-compact-start-times-new-in-v760) | Newly added | Sets the specific times that TiKV initiates periodic full compaction. The default value `[]` means periodic full compaction is disabled. | | TiKV | [`raftstore.periodic-full-compact-start-max-cpu`](/tikv-configuration-file.md#periodic-full-compact-start-max-cpu-new-in-v760) | Newly added | Limits the maximum CPU usage rate for TiKV periodic full compaction. The default value is `0.1`. | | TiKV | [`zstd-dict-size`](/tikv-configuration-file.md#zstd-dict-size) | Newly added | Specifies the `zstd` dictionary compression size. The default value is `"0KB"`, which means to disable the `zstd` dictionary compression. 
| diff --git a/releases/release-notes.md b/releases/release-notes.md index 29b8db0f8b366..0d18efd07d047 100644 --- a/releases/release-notes.md +++ b/releases/release-notes.md @@ -11,6 +11,7 @@ aliases: ['/docs/dev/releases/release-notes/','/docs/dev/releases/rn/'] ## 7.5 +- [7.5.1](/releases/release-7.5.1.md): 2024-02-29 - [7.5.0](/releases/release-7.5.0.md): 2023-12-01 ## 7.4 @@ -27,6 +28,7 @@ aliases: ['/docs/dev/releases/release-notes/','/docs/dev/releases/rn/'] ## 7.1 +- [7.1.4](/releases/release-7.1.4.md): 2024-03-11 - [7.1.3](/releases/release-7.1.3.md): 2023-12-21 - [7.1.2](/releases/release-7.1.2.md): 2023-10-25 - [7.1.1](/releases/release-7.1.1.md): 2023-07-24 diff --git a/releases/release-timeline.md b/releases/release-timeline.md index 736de96273f97..32ef079de9802 100644 --- a/releases/release-timeline.md +++ b/releases/release-timeline.md @@ -9,6 +9,8 @@ This document shows all the released TiDB versions in reverse chronological orde | Version | Release Date | | :--- | :--- | +| [7.1.4](/releases/release-7.1.4.md) | 2024-03-11 | +| [7.5.1](/releases/release-7.5.1.md) | 2024-02-29 | | [6.5.8](/releases/release-6.5.8.md) | 2024-02-02 | | [7.6.0-DMR](/releases/release-7.6.0.md) | 2024-01-25 | | [6.5.7](/releases/release-6.5.7.md) | 2024-01-08 | diff --git a/security-compatibility-with-mysql.md b/security-compatibility-with-mysql.md index e96c74c830afa..ba6a5b0972cc8 100644 --- a/security-compatibility-with-mysql.md +++ b/security-compatibility-with-mysql.md @@ -108,9 +108,9 @@ The implementation mechanisms are consistent between TiDB and MySQL. Both use th ## Authentication plugin status -TiDB supports multiple authentication methods. These methods can be specified on a per user basis using [`CREATE USER`](/sql-statements/sql-statement-create-user.md) and [`ALTER USER`](/sql-statements/sql-statement-create-user.md). These methods are compatible with the authentication methods of MySQL with the same names. +TiDB supports multiple authentication methods. 
These methods can be specified on a per user basis using [`CREATE USER`](/sql-statements/sql-statement-create-user.md) and [`ALTER USER`](/sql-statements/sql-statement-alter-user.md). These methods are compatible with the authentication methods of MySQL with the same names. -You can use one of the following supported authentication methods in the table. To specify a default method that the server advertises when the client-server connection is being established, set the [`default_authentication_plugin`](/system-variables.md#default_authentication_plugin) variable. `tidb_sm3_password` is the SM3 authentication method only supported in TiDB. Therefore, to authenticate using this method, you must connect to TiDB using [TiDB-JDBC](https://github.com/pingcap/mysql-connector-j/tree/release/8.0-sm3). `tidb_auth_token` is a JSON Web Token (JWT) based authentication method used only in TiDB Cloud. +You can use one of the following supported authentication methods in the table. To specify a default method that the server advertises when the client-server connection is being established, set the [`default_authentication_plugin`](/system-variables.md#default_authentication_plugin) variable. `tidb_sm3_password` is the SM3 authentication method only supported in TiDB. Therefore, to authenticate using this method, you must connect to TiDB using [TiDB-JDBC](https://github.com/pingcap/mysql-connector-j/tree/release/8.0-sm3). `tidb_auth_token` is a JSON Web Token (JWT)-based authentication method used in TiDB Cloud, and you can also configure it for use in TiDB Self-Hosted. @@ -140,3 +140,122 @@ The support for TLS authentication is configured differently. For detailed infor | ed25519 (MariaDB) | No | | GSSAPI (MariaDB) | No | | FIDO | No | + +### `tidb_auth_token` + +`tidb_auth_token` is a passwordless authentication method based on [JSON Web Token (JWT)](https://datatracker.ietf.org/doc/html/rfc7519). In v6.4.0, `tidb_auth_token` is only used for user authentication in TiDB Cloud. 
Starting from v6.5.0, you can also configure `tidb_auth_token` as a user authentication method for TiDB Self-Hosted. Different from password-based authentication methods such as `mysql_native_password` and `caching_sha2_password`, when you create users using `tidb_auth_token`, there is no need to set or store custom passwords. To log into TiDB, users only need to use a signed token instead of a password, which simplifies the authentication process and improves security. + +#### JWT + +JWT consists of three parts: Header, Payload, and Signature. After being encoded using base64, they are concatenated into a string separated by dots (`.`) for transmission between the client and server. + +The Header describes the metadata of the JWT, including three parameters: + +* `alg`: the algorithm for signature, which is `RS256` by default. +* `typ`: the type of token, which is `JWT`. +* `kid`: the key ID for generating token signature. + +Here is an example for Header: + +```json +{ + "alg": "RS256", + "kid": "the-key-id-0", + "typ": "JWT" +} +``` + +Payload is the main part of JWT, which stores the user information. Each field in the Payload is called a claim. The claims required for TiDB user authentication are as follows: + +* `iss`: if `TOKEN_ISSUER` is not specified or set to empty when executing [`CREATE USER`](/sql-statements/sql-statement-create-user.md), this claim is not required; otherwise, `iss` should use the same value as `TOKEN_ISSUER`. +* `sub`: this claim is required to be the same as the username to be authenticated. +* `iat`: it means `issued at`, the timestamp when the token is issued. In TiDB, this value must not be later than the authentication time or earlier than 15 minutes before authentication. +* `exp`: the timestamp when the token expires. If it is earlier than the time of authentication, the authentication fails. +* `email`: the email can be specified when creating a user by `ATTRIBUTE '{"email": "xxxx@pingcap.com"}'`. 
If no email is specified when a user is created, this claim must be set as an empty string; otherwise, this claim must be the same as the specified value when the user is created. + +Here is an example for Payload: + +```json +{ + "email": "user@pingcap.com", + "exp": 1703305494, + "iat": 1703304594, + "iss": "issuer-abc", + "sub": "user@pingcap.com" +} +``` + +Signature is used to sign the Header and Payload data. + +> **Warning:** +> +> - The encoding of the Header and Payload in base64 is reversible. Do **not** attach any sensitive information to them. +> - The `tidb_auth_token` authentication method requires clients to support the [`mysql_clear_password`](https://dev.mysql.com/doc/refman/8.0/en/cleartext-pluggable-authentication.html) plugin to send the token to TiDB in plain text. Therefore, you need to [enable TLS between clients and servers](/enable-tls-between-clients-and-servers.md) before using `tidb_auth_token`. + +#### Usage + +To configure and use `tidb_auth_token` as the authentication method for TiDB Self-Hosted users, take the following steps: + +1. Configure [`auth-token-jwks`](/tidb-configuration-file.md#auth-token-jwks-new-in-v640) and [`auth-token-refresh-interval`](/tidb-configuration-file.md#auth-token-refresh-interval-new-in-v640) in the TiDB configuration file. + + For example, you can get an example JWKS using the following command: + + ```bash + wget https://raw.githubusercontent.com/CbcWestwolf/generate_jwt/master/JWKS.json + ``` + + Then, configure the path of the example JWKS in `config.toml`: + + ```toml + [security] + auth-token-jwks = "JWKS.json" + ``` + +2. Start `tidb-server` and periodically update and save the JWKS to the path specified by `auth-token-jwks`. + +3. Create a user with `tidb_auth_token`, and specify `iss` and `email` as needed using `REQUIRE TOKEN_ISSUER` and `ATTRIBUTE '{"email": "xxxx@pingcap.com"}'`. 
+ + For example, create a user `user@pingcap.com` with `tidb_auth_token`: + + ```sql + CREATE USER 'user@pingcap.com' IDENTIFIED WITH 'tidb_auth_token' REQUIRE TOKEN_ISSUER 'issuer-abc' ATTRIBUTE '{"email": "user@pingcap.com"}'; + ``` + +4. Generate and sign a token for authentication, and authenticate using the `mysql_clear_password` plugin of the MySQL client. + + Install the JWT generation tool via `go install github.com/cbcwestwolf/generate_jwt@latest` (this tool is only used for testing `tidb_auth_token`). For example: + + ```text + generate_jwt --kid "the-key-id-0" --sub "user@pingcap.com" --email "user@pingcap.com" --iss "issuer-abc" + ``` + + It prints the public key and token as follows: + + ```text + -----BEGIN PUBLIC KEY----- + MIIBCgKCAQEAq8G5n9XBidxmBMVJKLOBsmdOHrCqGf17y9+VUXingwDUZxRp2Xbu + LZLbJtLgcln1lC0L9BsogrWf7+pDhAzWovO6Ai4Aybu00tJ2u0g4j1aLiDdsy0gy + vSb5FBoL08jFIH7t/JzMt4JpF487AjzvITwZZcnsrB9a9sdn2E5B/aZmpDGi2+Is + f5osnlw0zvveTwiMo9ba416VIzjntAVEvqMFHK7vyHqXbfqUPAyhjLO+iee99Tg5 + AlGfjo1s6FjeML4xX7sAMGEy8FVBWNfpRU7ryTWoSn2adzyA/FVmtBvJNQBCMrrA + hXDTMJ5FNi8zHhvzyBKHU0kBTS1UNUbP9wIDAQAB + -----END PUBLIC KEY----- + + eyJhbGciOiJSUzI1NiIsImtpZCI6InRoZS1rZXktaWQtMCIsInR5cCI6IkpXVCJ9.eyJlbWFpbCI6InVzZXJAcGluZ2NhcC5jb20iLCJleHAiOjE3MDMzMDU0OTQsImlhdCI6MTcwMzMwNDU5NCwiaXNzIjoiaXNzdWVyLWFiYyIsInN1YiI6InVzZXJAcGluZ2NhcC5jb20ifQ.T4QPh2hTB5on5xCuvtWiZiDTuuKvckggNHtNaovm1F4RvwUv15GyOqj9yMstE-wSoV5eLEcPC2HgE6eN1C6yH_f4CU-A6n3dm9F1w-oLbjts7aYCl8OHycVYnq609fNnb8JLsQAmd1Zn9C0JW899-WSOQtvjLqVSPe9prH-cWaBVDQXzUJKxwywQzk9v-Z1Njt9H3Rn9vvwwJEEPI16VnaNK38I7YG-1LN4fAG9jZ6Zwvz7vb_s4TW7xccFf3dIhWTEwOQ5jDPCeYkwraRXU8NC6DPF_duSrYJc7d7Nu9Z2cr-E4i1Rt_IiRTuIIzzKlcQGg7jd9AGEfGe_SowsA-w + ``` + + Copy the preceding token in the last line for login: + + ```Shell + mycli -h 127.0.0.1 -P 4000 -u 'user@pingcap.com' -p '' + ``` + + Ensure that the MySQL client here supports the `mysql_clear_password` plugin. [mycli](https://www.mycli.net/) supports and enables this plugin by default. 
If you are using the [mysql command-line client](https://dev.mysql.com/doc/refman/8.0/en/mysql.html), you need to use the `--enable-cleartext-plugin` option to enable this plugin: + + ```Shell + mysql -h 127.0.0.1 -P 4000 -u 'user@pingcap.com' -p'' --enable-cleartext-plugin + ``` + + If an incorrect `--sub` is specified when the token is generated (such as `--sub "wronguser@pingcap.com"`), the authentication using this token would fail. + +You can encode and decode a token using the debugger provided by [jwt.io](https://jwt.io/). \ No newline at end of file diff --git a/smooth-upgrade-tidb.md b/smooth-upgrade-tidb.md index 3901bc018e8dd..c6fb1facc6559 100644 --- a/smooth-upgrade-tidb.md +++ b/smooth-upgrade-tidb.md @@ -75,7 +75,10 @@ When using the smooth upgrade feature, note the following limitations. ### Limitations on user operations -* Before the upgrade, if there is a canceling DDL job in the cluster, that is, an ongoing DDL job is being canceled by a user, because the job in the canceling state cannot be paused, TiDB will retry canceling the job. If the retry fails, an error is reported and the upgrade is exited. +* Before the upgrade, consider the following restrictions: + + * If there is a canceling DDL job in the cluster, that is, an ongoing DDL job is being canceled by a user, because the job in the canceling state cannot be paused, TiDB will retry canceling the job. If the retry fails, an error is reported and the upgrade is exited. + * If the TiDB Distributed eXecution Framework (DXF) is enabled, disable it by setting [`tidb_enable_dist_task`](/system-variables.md#tidb_enable_dist_task-new-in-v710) to `OFF` (its default value). Make sure that all ongoing distributed `ADD INDEX` and `IMPORT INTO` tasks are completed. Alternatively, you can cancel these tasks and wait until the upgrade is complete to restart them. Otherwise, the `ADD INDEX` operations during the upgrade might cause data index inconsistency. 
* In scenarios of using TiUP to upgrade TiDB, because TiUP upgrade has a timeout period, if the cluster has a large number of DDL jobs (more than 300) waiting in queues before the upgrade, the upgrade might fail. diff --git a/sql-non-prepared-plan-cache.md b/sql-non-prepared-plan-cache.md index 35eb61467fd5b..1378361f75c97 100644 --- a/sql-non-prepared-plan-cache.md +++ b/sql-non-prepared-plan-cache.md @@ -87,7 +87,7 @@ Due to the preceding risks and the fact that the execution plan cache only provi - Queries that filter on columns of `JSON`, `ENUM`, `SET`, or `BIT` type are not supported, such as `SELECT * FROM t WHERE json_col = '{}'`. - Queries that filter on `NULL` values are not supported, such as `SELECT * FROM t WHERE a is NULL`. - Queries with more than 200 parameters after parameterization are not supported by default, such as `SELECT * FROM t WHERE a in (1, 2, 3, ... 201)`. Starting from v7.3.0, you can modify this limit by setting the [`44823`](/optimizer-fix-controls.md#44823-new-in-v730) fix in the [`tidb_opt_fix_control`](/system-variables.md#tidb_opt_fix_control-new-in-v653-and-v710) system variable. -- Queries that access partitioned tables, virtual columns, temporary tables, views, or memory tables are not supported, such as `SELECT * FROM INFORMATION_SCHEMA.COLUMNS`, where `COLUMNS` is a TiDB memory table. +- Queries that access virtual columns, temporary tables, views, or memory tables are not supported, such as `SELECT * FROM INFORMATION_SCHEMA.COLUMNS`, where `COLUMNS` is a TiDB memory table. - Queries with hints or bindings are not supported. - DML statements or `SELECT` statements with the `FOR UPDATE` clause are not supported by default. To remove this restriction, you can execute `SET tidb_enable_non_prepared_plan_cache_for_dml = ON`. 
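As noted in the last restriction above, the DML limitation of the non-prepared plan cache can be lifted per session. A minimal sketch (the table `t` and its columns `a` and `b` are illustrative assumptions, not from the docs):

```sql
-- Allow the non-prepared plan cache to also cover DML statements
-- and SELECT ... FOR UPDATE in the current session.
SET tidb_enable_non_prepared_plan_cache_for_dml = ON;

-- Execute two statements that differ only in their constants ...
UPDATE t SET b = b + 1 WHERE a = 1;
UPDATE t SET b = b + 1 WHERE a = 2;

-- ... then check whether the last statement reused a cached plan.
SELECT @@last_plan_from_cache;
```

If the second `UPDATE` hits the cache, `@@last_plan_from_cache` returns `1`.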
diff --git a/sql-plan-management.md b/sql-plan-management.md index 7f8a6d2258e62..e402c92b4d7c6 100644 --- a/sql-plan-management.md +++ b/sql-plan-management.md @@ -236,7 +236,7 @@ To make the execution plan of a SQL statement fixed to a historical execution pl When using this feature, note the following: - The feature generates hints according to historical execution plans and uses the generated hints for binding. Because historical execution plans are stored in [Statement Summary Tables](/statement-summary-tables.md), before using this feature, you need to enable the [`tidb_enable_stmt_summary`](/system-variables.md#tidb_enable_stmt_summary-new-in-v304) system variable first. -- This feature does not support TiFlash queries, Join queries with three or more tables, and queries that contain subqueries. +- For TiFlash queries, Join queries with three or more tables, and queries that contain subqueries, the auto-generated hints are not adequate, which might result in the plan not being fully bound. In such cases, a warning will occur when creating a binding. - If a historical execution plan is for a SQL statement with hints, the hints will be added to the binding. For example, after executing `SELECT /*+ max_execution_time(1000) */ * FROM t`, the binding created with its `plan_digest` will include `max_execution_time(1000)`. The SQL statement of this binding method is as follows: @@ -484,6 +484,52 @@ SHOW binding_cache status; 1 row in set (0.00 sec) ``` +## Utilize the statement summary table to obtain queries that need to be bound + +[Statement summary](/statement-summary-tables.md) records recent SQL execution information, such as latency, execution times, and corresponding query plans. You can query statement summary tables to get qualified `plan_digest`, and then [create bindings according to these historical execution plans](/sql-plan-management.md#create-a-binding-according-to-a-historical-execution-plan). 
+ +The following example queries `SELECT` statements that have been executed more than 10 times in the past two weeks, and have multiple execution plans without SQL binding. It sorts the queries by the execution times, and binds the top 100 queries to their fastest plans. + +```sql +WITH stmts AS ( -- Gets all information + SELECT * FROM INFORMATION_SCHEMA.CLUSTER_STATEMENTS_SUMMARY + UNION ALL + SELECT * FROM INFORMATION_SCHEMA.CLUSTER_STATEMENTS_SUMMARY_HISTORY +), +best_plans AS ( + SELECT plan_digest, `digest`, avg_latency, + CONCAT('create global binding from history using plan digest "', plan_digest, '"') as binding_stmt + FROM stmts t1 + WHERE avg_latency = (SELECT min(avg_latency) FROM stmts t2 -- The plan with the lowest query latency + WHERE t2.`digest` = t1.`digest`) +) + +SELECT any_value(digest_text) as query, + SUM(exec_count) as exec_count, + plan_hint, binding_stmt +FROM stmts, best_plans +WHERE stmts.`digest` = best_plans.`digest` + AND summary_begin_time > DATE_SUB(NOW(), interval 14 day) -- Executed in the past 2 weeks + AND stmt_type = 'Select' -- Only consider select statements + AND schema_name NOT IN ('INFORMATION_SCHEMA', 'mysql') -- Not an internal query + AND plan_in_binding = 0 -- No binding yet +GROUP BY stmts.`digest` + HAVING COUNT(DISTINCT(stmts.plan_digest)) > 1 -- This query is unstable. It has more than 1 plan. + AND SUM(exec_count) > 10 -- High-frequency, and has been executed more than 10 times. +ORDER BY SUM(exec_count) DESC LIMIT 100; -- Top 100 high-frequency queries. +``` + +By applying certain filtering conditions to obtain queries that meet the criteria, you can then directly execute the statements in the corresponding `binding_stmt` column to create bindings. 
+ +``` ++---------------------------------------------+------------+-----------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------+ +| query | exec_count | plan_hint | binding_stmt | ++---------------------------------------------+------------+-----------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------+ +| select * from `t` where `a` = ? and `b` = ? | 401 | use_index(@`sel_1` `test`.`t` `a`), no_order_index(@`sel_1` `test`.`t` `a`) | create global binding from history using plan digest "0d6e97fb1191bbd08dddefa7bd007ec0c422b1416b152662768f43e64a9958a6" | +| select * from `t` where `b` = ? and `c` = ? | 104 | use_index(@`sel_1` `test`.`t` `b`), no_order_index(@`sel_1` `test`.`t` `b`) | create global binding from history using plan digest "80c2aa0aa7e6d3205755823aa8c6165092c8521fb74c06a9204b8d35fc037dd9" | ++---------------------------------------------+------------+-----------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------+ +``` + ## Cross-database binding Starting from v7.6.0, you can create cross-database bindings in TiDB by using the wildcard `*` to represent a database name in the binding creation syntax. Before creating cross-database bindings, you need to first enable the [`tidb_opt_enable_fuzzy_binding`](/system-variables.md#tidb_opt_enable_fuzzy_binding-new-in-v760) system variable. 
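A minimal sketch of the cross-database binding flow described above (the table name `t` and the index `idx_a` are illustrative assumptions; the exact binding syntax follows the binding creation syntax linked in the surrounding docs):

```sql
-- Enable cross-database binding first.
SET tidb_opt_enable_fuzzy_binding = ON;

-- `*.t` matches table `t` in any database, so a single binding can
-- cover queries against db1.t, db2.t, and so on.
CREATE GLOBAL BINDING USING SELECT /*+ use_index(t, idx_a) */ * FROM *.t;
```

This is useful for multi-tenant schemas where the same tables exist in many databases and you want one binding to stabilize the plan for all of them.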
diff --git a/sql-plan-replayer.md b/sql-plan-replayer.md index d849aafc9e3e3..c5a3bc5801b4f 100644 --- a/sql-plan-replayer.md +++ b/sql-plan-replayer.md @@ -143,7 +143,7 @@ With an existing `ZIP` file exported using `PLAN REPLAYER`, you can use the `PLA PLAN REPLAYER LOAD 'file_name'; ``` -In the statement above, `file_name` is the name of the `ZIP` file to be exported. +In the statement above, `file_name` is the name of the `ZIP` file to be imported. For example: @@ -153,6 +153,16 @@ For example: PLAN REPLAYER LOAD 'plan_replayer.zip'; ``` +> **Note:** +> +> You need to disable auto analyze. Otherwise the imported statistics will be overwritten by analyze. + +You can disable auto analyze by setting the [`tidb_enable_auto_analyze`](/system-variables.md#tidb_enable_auto_analyze-new-in-v610) system variable to `OFF`: + +```sql +set @@global.tidb_enable_auto_analyze = OFF; +``` + After the cluster information is imported, the TiDB cluster is loaded with the required table schema, statistics and other information that affects the construction of the execution plan. You can view the execution plan and verify statistics in the following way: ```sql diff --git a/sql-prepared-plan-cache.md b/sql-prepared-plan-cache.md index 2c3bb789993f3..07786ff8d2e05 100644 --- a/sql-prepared-plan-cache.md +++ b/sql-prepared-plan-cache.md @@ -20,7 +20,7 @@ TiDB also supports execution plan caching for some non-`PREPARE` statements, sim In the current version of TiDB, if a `Prepare` statement meets any of the following conditions, the query or the plan is not cached: - The query contains SQL statements other than `SELECT`, `UPDATE`, `INSERT`, `DELETE`, `Union`, `Intersect`, and `Except`. -- The query accesses partitioned tables or temporary tables, or a table that contains generated columns. +- The query accesses temporary tables, or a table that contains generated columns. 
- The query contains non-correlated sub-queries, such as `SELECT * FROM t1 WHERE t1.a > (SELECT 1 FROM t2 WHERE t2.b < 1)`. - The query contains correlated sub-queries with `PhysicalApply` operators in the execution plan, such as `SELECT * FROM t1 WHERE t1.a > (SELECT a FROM t2 WHERE t1.b > t2.b)`. - The query contains the `ignore_plan_cache` or `set_var` hint, such as `SELECT /*+ ignore_plan_cache() */ * FROM t` or `SELECT /*+ set_var(max_execution_time=1) */ * FROM t`. diff --git a/sql-statements/sql-statement-alter-index.md b/sql-statements/sql-statement-alter-index.md index 3224a68dfaa8a..f729fb3ef24d7 100644 --- a/sql-statements/sql-statement-alter-index.md +++ b/sql-statements/sql-statement-alter-index.md @@ -6,7 +6,7 @@ aliases: ['/docs/dev/sql-statements/sql-statement-alter-index/'] # ALTER INDEX -The `ALTER INDEX` statement is used to modify the visibility of the index to `Visible` or `Invisible`. Invisible indexes are maintained by DML statements, but will not be used by the query optimizer. This is useful in scenarios where you want to double-check before removing an index permanently. +The `ALTER INDEX` statement is used to modify the visibility of the index to `Visible` or `Invisible`. Invisible indexes are maintained by DML statements, but will not be used by the query optimizer. This is useful in scenarios where you want to double-check before removing an index permanently. Starting from TiDB v8.0.0, you can make the optimizer select invisible indexes by modifying the system variable [`tidb_opt_use_invisible_indexes`](/system-variables.md#tidb_opt_use_invisible_indexes-new-in-v800). ## Synopsis @@ -119,7 +119,6 @@ Query OK, 0 rows affected (0.02 sec) * Invisible indexes in TiDB are modeled on the equivalent feature from MySQL 8.0. * Similar to MySQL, TiDB does not permit `PRIMARY KEY` indexes to be made invisible. -* MySQL provides an optimizer switch `use_invisible_indexes=on` to make all invisible indexes _visible_ again.
This functionality is not available in TiDB. ## See also diff --git a/sql-statements/sql-statement-alter-user.md b/sql-statements/sql-statement-alter-user.md index 62cacea484759..3adddf1d8af80 100644 --- a/sql-statements/sql-statement-alter-user.md +++ b/sql-statements/sql-statement-alter-user.md @@ -20,6 +20,12 @@ UserSpecList ::= UserSpec ::= Username AuthOption +RequireClauseOpt ::= + ( 'REQUIRE' 'NONE' | 'REQUIRE' 'SSL' | 'REQUIRE' 'X509' | 'REQUIRE' RequireList )? + +RequireList ::= + ( "ISSUER" stringLit | "SUBJECT" stringLit | "CIPHER" stringLit | "SAN" stringLit | "TOKEN_ISSUER" stringLit )* + Username ::= StringName ('@' StringName | singleAtIdentifier)? | 'CURRENT_USER' OptionalBraces @@ -33,6 +39,10 @@ LockOption ::= ( 'ACCOUNT' 'LOCK' | 'ACCOUNT' 'UNLOCK' )? AttributeOption ::= ( 'COMMENT' CommentString | 'ATTRIBUTE' AttributeString )? ResourceGroupNameOption::= ( 'RESOURCE' 'GROUP' Identifier)? + +RequireClauseOpt ::= ('REQUIRE' ('NONE' | 'SSL' | 'X509' | RequireListElement ('AND'? RequireListElement)*))? + +RequireListElement ::= 'ISSUER' Issuer | 'SUBJECT' Subject | 'CIPHER' Cipher | 'SAN' SAN | 'TOKEN_ISSUER' TokenIssuer ``` ## Examples diff --git a/sql-statements/sql-statement-create-index.md b/sql-statements/sql-statement-create-index.md index 5749fe1efffce..2211011be0ddd 100644 --- a/sql-statements/sql-statement-create-index.md +++ b/sql-statements/sql-statement-create-index.md @@ -342,18 +342,19 @@ See [Index Selection - Use multi-valued indexes](/choose-index.md#use-multi-valu - Compared with normal indexes, DML operations will modify more index records for multi-valued indexes, so multi-valued indexes will have a greater performance impact than normal indexes. - Because multi-valued indexes are a special type of expression index, multi-valued indexes have the same limitations as expression indexes. 
- If a table uses multi-valued indexes, you cannot back up, replicate, or import the table using BR, TiCDC, or TiDB Lightning to a TiDB cluster earlier than v6.6.0. -- Since improvements to the statistics for multi-valued indexes are still ongoing, when a query hits multiple multi-valued indexes, TiDB might not be able to select the optimal index. In such cases, it is recommended to use the [`use_index_merge`](/optimizer-hints.md#use_index_merget1_name-idx1_name--idx2_name-) optimizer hint to enforce a fixed execution plan. For detailed steps, refer to [Use multi-valued indexes](/choose-index.md#use-multi-valued-indexes). - For a query with complex conditions, TiDB might not be able to select multi-valued indexes. For information on the condition patterns supported by multi-valued indexes, refer to [Use multi-valued indexes](/choose-index.md#use-multi-valued-indexes). ## Invisible index -Invisible indexes are indexes that are ignored by the query optimizer: +By default, invisible indexes are indexes that are ignored by the query optimizer: ```sql CREATE TABLE t1 (c1 INT, c2 INT, UNIQUE(c2)); CREATE UNIQUE INDEX c1 ON t1 (c1) INVISIBLE; ``` +Starting from TiDB v8.0.0, you can make the optimizer select invisible indexes by modifying the system variable [`tidb_opt_use_invisible_indexes`](/system-variables.md#tidb_opt_use_invisible_indexes-new-in-v800). + For details, see [`ALTER INDEX`](/sql-statements/sql-statement-alter-index.md). ## Associated system variables diff --git a/sql-statements/sql-statement-create-table.md b/sql-statements/sql-statement-create-table.md index b9651433d6b09..cc0106328bf86 100644 --- a/sql-statements/sql-statement-create-table.md +++ b/sql-statements/sql-statement-create-table.md @@ -133,7 +133,7 @@ The following *table_options* are supported. 
Other options such as `AVG_ROW_LENG | `AUTO_INCREMENT` | The initial value of the increment field | `AUTO_INCREMENT` = 5 | | [`SHARD_ROW_ID_BITS`](/shard-row-id-bits.md)| To set the number of bits for the implicit `_tidb_rowid` shards |`SHARD_ROW_ID_BITS` = 4| |`PRE_SPLIT_REGIONS`| To pre-split `2^(PRE_SPLIT_REGIONS)` Regions when creating a table |`PRE_SPLIT_REGIONS` = 4| -|`AUTO_ID_CACHE`| To set the auto ID cache size in a TiDB instance. By default, TiDB automatically changes this size according to allocation speed of auto ID |`AUTO_ID_CACHE` = 200. Note that this option is not available on [TiDB Serverless](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-serverless) clusters.| +|`AUTO_ID_CACHE`| To set the auto ID cache size in a TiDB instance. By default, TiDB automatically changes this size according to allocation speed of auto ID |`AUTO_ID_CACHE` = 200 | |`AUTO_RANDOM_BASE`| To set the initial incremental part value of auto_random. This option can be considered as a part of the internal interface. Users can ignore this parameter |`AUTO_RANDOM_BASE` = 0| | `CHARACTER SET` | To specify the [character set](/character-set-and-collation.md) for the table | `CHARACTER SET` = 'utf8mb4' | | `COMMENT` | The comment information | `COMMENT` = 'comment info' | diff --git a/sql-statements/sql-statement-create-user.md b/sql-statements/sql-statement-create-user.md index 76d0c78569c07..39ef112943a66 100644 --- a/sql-statements/sql-statement-create-user.md +++ b/sql-statements/sql-statement-create-user.md @@ -20,6 +20,12 @@ IfNotExists ::= UserSpecList ::= UserSpec ( ',' UserSpec )* +RequireClauseOpt ::= + ( 'REQUIRE' 'NONE' | 'REQUIRE' 'SSL' | 'REQUIRE' 'X509' | 'REQUIRE' RequireList )? + +RequireList ::= + ( "ISSUER" stringLit | "SUBJECT" stringLit | "CIPHER" stringLit | "SAN" stringLit | "TOKEN_ISSUER" stringLit )* + UserSpec ::= Username AuthOption @@ -37,6 +43,10 @@ LockOption ::= ( 'ACCOUNT' 'LOCK' | 'ACCOUNT' 'UNLOCK' )? 
AttributeOption ::= ( 'COMMENT' CommentString | 'ATTRIBUTE' AttributeString )? ResourceGroupNameOption::= ( 'RESOURCE' 'GROUP' Identifier)? + +RequireClauseOpt ::= ('REQUIRE' ('NONE' | 'SSL' | 'X509' | RequireListElement ('AND'? RequireListElement)*))? + +RequireListElement ::= 'ISSUER' Issuer | 'SUBJECT' Subject | 'CIPHER' Cipher | 'SAN' SAN | 'TOKEN_ISSUER' TokenIssuer ``` ## Examples diff --git a/sql-statements/sql-statement-explain.md b/sql-statements/sql-statement-explain.md index 029736828d8a1..e99e5b1d12598 100644 --- a/sql-statements/sql-statement-explain.md +++ b/sql-statements/sql-statement-explain.md @@ -6,7 +6,7 @@ aliases: ['/docs/dev/sql-statements/sql-statement-explain/','/docs/dev/reference # `EXPLAIN` -The `EXPLAIN` statement shows the execution plan for a query without executing it. It is complimented by `EXPLAIN ANALYZE` which will execute the query. If the output of `EXPLAIN` does not match the expected result, consider executing `ANALYZE TABLE` on each table in the query. +The `EXPLAIN` statement shows the execution plan for a query without executing it. It complements the `EXPLAIN ANALYZE` statement, which executes the query. If the output of `EXPLAIN` does not match the expected result, consider executing `ANALYZE TABLE` on each table in the query to make sure the table statistics are up to date. The statements `DESC` and `DESCRIBE` are aliases of this statement. The alternative usage of `EXPLAIN ` is documented under [`SHOW [FULL] COLUMNS FROM`](/sql-statements/sql-statement-show-columns-from.md). diff --git a/sql-statements/sql-statement-flashback-cluster.md b/sql-statements/sql-statement-flashback-cluster.md index a9152831a174f..e57553f1a6b20 100644 --- a/sql-statements/sql-statement-flashback-cluster.md +++ b/sql-statements/sql-statement-flashback-cluster.md @@ -8,12 +8,16 @@ aliases: ['/tidb/dev/sql-statement-flashback-to-timestamp'] TiDB v6.4.0 introduces the `FLASHBACK CLUSTER TO TIMESTAMP` syntax. 
You can use it to restore a cluster to a specific point in time. When specifying the timestamp, you can either set a datetime value or use a time function. The format of datetime is like '2016-10-08 16:45:26.999', with millisecond as the minimum time unit. But in most cases, specifying the timestamp with second as the time unit is sufficient, for example, '2016-10-08 16:45:26'. -Starting from v6.5.6, v7.1.3, and v7.6.0, TiDB introduces the `FLASHBACK CLUSTER TO TSO` syntax. This syntax enables you to use [TSO](/tso.md) to specify a more precise recovery point in time, thereby enhancing flexibility in data recovery. +Starting from v6.5.6, v7.1.3, v7.5.1, and v7.6.0, TiDB introduces the `FLASHBACK CLUSTER TO TSO` syntax. This syntax enables you to use [TSO](/tso.md) to specify a more precise recovery point in time, thereby enhancing flexibility in data recovery. > **Warning:** > > The `FLASHBACK CLUSTER TO [TIMESTAMP|TSO]` syntax is not applicable to [TiDB Serverless](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-serverless) clusters. To avoid unexpected results, do not execute this statement on TiDB Serverless clusters. +> **Warning:** +> +> When specifying a recovery point in time, make sure to check the validity of your target timestamp or TSO and avoid specifying a future time that exceeds the maximum TSO currently allocated by PD (see `Current TSO` on the Grafana PD panel). Otherwise, linear consistency and transaction isolation might be violated, leading to serious data correctness issues.
+ > **Warning:** diff --git a/sql-statements/sql-statement-grant-privileges.md b/sql-statements/sql-statement-grant-privileges.md index b8b36190047f6..adedcb211f1af 100644 --- a/sql-statements/sql-statement-grant-privileges.md +++ b/sql-statements/sql-statement-grant-privileges.md @@ -55,6 +55,10 @@ PrivLevel ::= UserSpecList ::= UserSpec ( ',' UserSpec )* + +RequireClauseOpt ::= ('REQUIRE' ('NONE' | 'SSL' | 'X509' | RequireListElement ('AND'? RequireListElement)*))? + +RequireListElement ::= 'ISSUER' Issuer | 'SUBJECT' Subject | 'CIPHER' Cipher | 'SAN' SAN | 'TOKEN_ISSUER' TokenIssuer ``` ## Examples diff --git a/sql-statements/sql-statement-import-into.md b/sql-statements/sql-statement-import-into.md index af32895c5f9a8..db0fc465d20c5 100644 --- a/sql-statements/sql-statement-import-into.md +++ b/sql-statements/sql-statement-import-into.md @@ -5,47 +5,53 @@ summary: An overview of the usage of IMPORT INTO in TiDB. # IMPORT INTO -The `IMPORT INTO` statement is used to import data in formats such as `CSV`, `SQL`, and `PARQUET` into an empty table in TiDB via the [Physical Import Mode](https://docs.pingcap.com/tidb/stable/tidb-lightning-physical-import-mode) of TiDB Lightning. +The `IMPORT INTO` statement lets you import data to TiDB via the [Physical Import Mode](https://docs.pingcap.com/tidb/stable/tidb-lightning-physical-import-mode) of TiDB Lightning. You can use `IMPORT INTO` in the following two ways: -> **Note:** -> -> This feature is not available on [TiDB Serverless](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-serverless) clusters. - -For TiDB Self-Hosted, `IMPORT INTO` supports importing data from files stored in Amazon S3, GCS, and the TiDB local storage. For [TiDB Dedicated](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-dedicated), `IMPORT INTO` supports importing data from files stored in Amazon S3 and GCS. +- `IMPORT INTO ... 
FROM FILE`: imports data files in formats such as `CSV`, `SQL`, and `PARQUET` into an empty table in TiDB. +- `IMPORT INTO ... FROM SELECT`: imports the query result of a `SELECT` statement into an empty table in TiDB. You can also use it to import historical data queried with [`AS OF TIMESTAMP`](/as-of-timestamp.md). -- For data files stored in Amazon S3 or GCS, `IMPORT INTO` supports running in the [TiDB Distributed eXecution Framework (DXF)](/tidb-distributed-execution-framework.md). - - - When this DXF is enabled ([tidb_enable_dist_task](/system-variables.md#tidb_enable_dist_task-new-in-v710) is `ON`), `IMPORT INTO` splits a data import job into multiple sub-jobs and distributes these sub-jobs to different TiDB nodes for execution to improve the import efficiency. - - When this DXF is disabled, `IMPORT INTO` only supports running on the TiDB node where the current user is connected. - -- For data files stored locally in TiDB, `IMPORT INTO` only supports running on the TiDB node where the current user is connected. Therefore, the data files need to be placed on the TiDB node where the current user is connected. If you access TiDB through a proxy or load balancer, you cannot import data files stored locally in TiDB. +> **Warning:** +> +> Currently, `IMPORT INTO ... FROM SELECT` is experimental. It is not recommended that you use it in production environments. This feature might be changed or removed without prior notice. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub. ## Restrictions -- For TiDB Self-Hosted, `IMPORT INTO` supports importing data within 10 TiB. For [TiDB Dedicated](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-dedicated), `IMPORT INTO` supports importing data within 50 GiB. - `IMPORT INTO` only supports importing data into existing empty tables in the database. - `IMPORT INTO` does not support transactions or rollback. 
Executing `IMPORT INTO` within an explicit transaction (`BEGIN`/`END`) will return an error. -- The execution of `IMPORT INTO` blocks the current connection until the import is completed. To execute the statement asynchronously, you can add the `DETACHED` option. - `IMPORT INTO` does not support working simultaneously with features such as [Backup & Restore](https://docs.pingcap.com/tidb/stable/backup-and-restore-overview), [`FLASHBACK CLUSTER`](/sql-statements/sql-statement-flashback-cluster.md), [acceleration of adding indexes](/system-variables.md#tidb_ddl_enable_fast_reorg-new-in-v630), data import using TiDB Lightning, data replication using TiCDC, or [Point-in-Time Recovery (PITR)](https://docs.pingcap.com/tidb/stable/br-log-architecture). -- Only one `IMPORT INTO` job can run on a cluster at a time. Although `IMPORT INTO` performs a precheck for running jobs, it is not a hard limit. Starting multiple import jobs might work when multiple clients execute `IMPORT INTO` simultaneously, but you need to avoid that because it might result in data inconsistency or import failures. - During the data import process, do not perform DDL or DML operations on the target table, and do not execute [`FLASHBACK DATABASE`](/sql-statements/sql-statement-flashback-database.md) for the target database. These operations can lead to import failures or data inconsistencies. In addition, it is **NOT** recommended to perform read operations during the import process, as the data being read might be inconsistent. Perform read and write operations only after the import is completed. - The import process consumes system resources significantly. For TiDB Self-Hosted, to get better performance, it is recommended to use TiDB nodes with at least 32 cores and 64 GiB of memory. 
TiDB writes sorted data to the TiDB [temporary directory](https://docs.pingcap.com/tidb/stable/tidb-configuration-file#temp-dir-new-in-v630) during import, so it is recommended to configure high-performance storage media for TiDB Self-Hosted, such as flash memory. For more information, see [Physical Import Mode limitations](https://docs.pingcap.com/tidb/stable/tidb-lightning-physical-import-mode#requirements-and-restrictions). - For TiDB Self-Hosted, the TiDB [temporary directory](https://docs.pingcap.com/tidb/stable/tidb-configuration-file#temp-dir-new-in-v630) is expected to have at least 90 GiB of available space. It is recommended to allocate storage space that is equal to or greater than the volume of data to be imported. -- One import job supports importing data into one target table only. To import data into multiple target tables, after the import for a target table is completed, you need to create a new job for the next target table. +- One import job supports importing data into one target table only. - `IMPORT INTO` is not supported during TiDB cluster upgrades. -- When the [Global Sort](/tidb-global-sort.md) feature is used for data import, the data size of a single row after encoding must not exceed 32 MiB. -- When the Global Sort feature is used for data import, if the target TiDB cluster is deleted before the import task is completed, temporary data used for global sorting might remain on Amazon S3. In this case, you need to delete the residual data manually to avoid increasing S3 storage costs. - Ensure that the data to be imported does not contain any records with primary key or non-null unique index conflicts. Otherwise, the conflicts can result in import task failures. -- If an `IMPORT INTO` task scheduled by the Distributed eXecution Framework (DXF) is already running, it cannot be scheduled to a new TiDB node. 
If the TiDB node that executes the data import task is restarted, it stops executing the task, and the task is transferred to another TiDB node to continue execution. However, if the imported data is from a local file, the task will not be transferred to another TiDB node to continue execution. - Known issue: the `IMPORT INTO` task might fail if the PD address in the TiDB node configuration file is inconsistent with the current PD topology of the cluster. This inconsistency can arise when, for example, PD was scaled in previously but the TiDB configuration file was not updated accordingly, or the TiDB node was not restarted after the configuration file update. +### `IMPORT INTO ... FROM FILE` restrictions + +- For TiDB Self-Hosted, each `IMPORT INTO` task supports importing data within 10 TiB. If you enable the [Global Sort](/tidb-global-sort.md) feature, each `IMPORT INTO` task supports importing data within 40 TiB. +- For [TiDB Dedicated](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-dedicated), if your data to be imported exceeds 500 GiB, it is recommended to use TiDB nodes with at least 16 cores and enable the [Global Sort](/tidb-global-sort.md) feature, then each `IMPORT INTO` task supports importing data within 40 TiB. If your data to be imported is within 500 GiB or if your TiDB nodes have fewer than 16 cores, it is not recommended to enable the [Global Sort](/tidb-global-sort.md) feature. +- The execution of `IMPORT INTO ... FROM FILE` blocks the current connection until the import is completed. To execute the statement asynchronously, you can add the `DETACHED` option. +- Up to 16 `IMPORT INTO` tasks can run simultaneously on each cluster (see [TiDB Distributed eXecution Framework (DXF) usage limitations](/tidb-distributed-execution-framework.md#limitation)). When a cluster lacks sufficient resources or reaches the maximum number of tasks, newly submitted import tasks are queued for execution.
+- When the [Global Sort](/tidb-global-sort.md) feature is used for data import, the value of the `THREAD` option must be at least `16`. +- When the [Global Sort](/tidb-global-sort.md) feature is used for data import, the data size of a single row after encoding must not exceed 32 MiB. +- All `IMPORT INTO` tasks that are created when [TiDB Distributed eXecution Framework (DXF)](/tidb-distributed-execution-framework.md) is not enabled run directly on the nodes where the tasks are submitted, and these tasks will not be scheduled for execution on other TiDB nodes even after DXF is enabled later. After DXF is enabled, only newly created `IMPORT INTO` tasks that import data from S3 or GCS are automatically scheduled or failed over to other TiDB nodes for execution. + +### `IMPORT INTO ... FROM SELECT` restrictions + +- `IMPORT INTO ... FROM SELECT` can only be executed on the TiDB node that the current user is connected to, and it blocks the current connection until the import is complete. +- `IMPORT INTO ... FROM SELECT` only supports two [import options](#withoptions): `THREAD` and `DISABLE_PRECHECK`. +- `IMPORT INTO ... FROM SELECT` does not support the task management statements such as `SHOW IMPORT JOB(s)` and `CANCEL IMPORT JOB `. +- The [temporary directory](https://docs.pingcap.com/tidb/stable/tidb-configuration-file#temp-dir-new-in-v630) of TiDB requires sufficient space to store the entire query result of the `SELECT` statement (configuring the `DISK_QUOTA` option is not supported currently). +- Importing historical data using [`tidb_snapshot`](/read-historical-data.md) is not supported. + ## Prerequisites for import Before using `IMPORT INTO` to import data, make sure the following requirements are met: - The target table to be imported is already created in TiDB and it is empty. - The target cluster has sufficient space to store the data to be imported. 
-- For TiDB Self-Hosted, the [temporary directory](https://docs.pingcap.com/tidb/stable/tidb-configuration-file#temp-dir-new-in-v630) of the TiDB node connected to the current session has at least 90 GiB of available space. If [`tidb_enable_dist_task`](/system-variables.md#tidb_enable_dist_task-new-in-v710) is enabled, also make sure that the temporary directory of each TiDB node in the cluster has sufficient disk space. +- For TiDB Self-Hosted, the [temporary directory](https://docs.pingcap.com/tidb/stable/tidb-configuration-file#temp-dir-new-in-v630) of the TiDB node connected to the current session has at least 90 GiB of available space. If [`tidb_enable_dist_task`](/system-variables.md#tidb_enable_dist_task-new-in-v710) is enabled and the data for import is from S3 or GCS, also make sure that the temporary directory of each TiDB node in the cluster has sufficient disk space. ## Required privileges @@ -56,10 +62,15 @@ Executing `IMPORT INTO` requires the `SELECT`, `UPDATE`, `INSERT`, `DELETE`, and ```ebnf+diagram ImportIntoStmt ::= 'IMPORT' 'INTO' TableName ColumnNameOrUserVarList? SetClause? FROM fileLocation Format? WithOptions? + | + 'IMPORT' 'INTO' TableName ColumnNameList? FROM SelectStatement WithOptions? ColumnNameOrUserVarList ::= '(' ColumnNameOrUserVar (',' ColumnNameOrUserVar)* ')' +ColumnNameList ::= + '(' ColumnName (',' ColumnName)* ')' + SetClause ::= 'SET' SetItem (',' SetItem)* @@ -103,13 +114,14 @@ It specifies the storage location of the data file, which can be an Amazon S3 or > > If [SEM](/system-variables.md#tidb_enable_enhanced_security) is enabled in the target cluster, the `fileLocation` cannot be specified as a local file path. -In the `fileLocation` parameter, you can specify a single file or use the `*` wildcard to match multiple files for import. Note that the wildcard can only be used in the file name, because it does not match directories or recursively match files in subdirectories. 
Taking files stored on Amazon S3 as examples, you can configure the parameter as follows: +In the `fileLocation` parameter, you can specify a single file, or use the `*` and `[]` wildcards to match multiple files for import. Note that wildcards can only be used in the file name, because they do not match directories or recursively match files in subdirectories. Taking files stored on Amazon S3 as an example, you can configure the parameter as follows:

- Import a single file: `s3:///path/to/data/foo.csv`
- Import all files in a specified path: `s3:///path/to/data/*`
- Import all files with the `.csv` suffix in a specified path: `s3:///path/to/data/*.csv`
- Import all files with the `foo` prefix in a specified path: `s3:///path/to/data/foo*`
- Import all files with the `foo` prefix and the `.csv` suffix in a specified path: `s3:///path/to/data/foo*.csv`
+- Import `1.csv` and `2.csv` in a specified path: `s3:///path/to/data/[12].csv`

### Format

@@ -117,11 +129,11 @@ The `IMPORT INTO` statement supports three data file formats: `CSV`, `SQL`, and

### WithOptions

-You can use `WithOptions` to specify import options and control the data import process. For example, to execute the import asynchronously in the backend, you can enable the `DETACHED` mode for the import by adding the `WITH DETACHED` option to the `IMPORT INTO` statement.
+You can use `WithOptions` to specify import options and control the data import process. For example, to execute the import of data files asynchronously in the backend, you can enable the `DETACHED` mode for the import by adding the `WITH DETACHED` option to the `IMPORT INTO` statement.

The supported options are described as follows:

-| Option name | Supported data formats | Description |
+| Option name | Supported data sources and formats | Description |
|:---|:---|:---|
| `CHARACTER_SET=''` | CSV | Specifies the character set of the data file. The default character set is `utf8mb4`.
The supported character sets include `binary`, `utf8`, `utf8mb4`, `gb18030`, `gbk`, `latin1`, and `ascii`. | | `FIELDS_TERMINATED_BY=''` | CSV | Specifies the field separator. The default separator is `,`. | @@ -131,17 +143,33 @@ The supported options are described as follows: | `LINES_TERMINATED_BY=''` | CSV | Specifies the line terminator. By default, `IMPORT INTO` automatically identifies `\n`, `\r`, or `\r\n` as line terminators. If the line terminator is one of these three, you do not need to explicitly specify this option. | | `SKIP_ROWS=` | CSV | Specifies the number of rows to skip. The default value is `0`. You can use this option to skip the header in a CSV file. If you use a wildcard to specify the source files for import, this option applies to all source files that are matched by the wildcard in `fileLocation`. | | `SPLIT_FILE` | CSV | Splits a single CSV file into multiple smaller chunks of around 256 MiB for parallel processing to improve import efficiency. This parameter only works for **non-compressed** CSV files and has the same usage restrictions as that of TiDB Lightning [`strict-format`](https://docs.pingcap.com/tidb/stable/tidb-lightning-data-source#strict-format). | -| `DISK_QUOTA=''` | All formats | Specifies the disk space threshold that can be used during data sorting. The default value is 80% of the disk space in the TiDB [temporary directory](https://docs.pingcap.com/tidb/stable/tidb-configuration-file#temp-dir-new-in-v630). If the total disk size cannot be obtained, the default value is 50 GiB. When specifying `DISK_QUOTA` explicitly, make sure that the value does not exceed 80% of the disk space in the TiDB temporary directory. | -| `DISABLE_TIKV_IMPORT_MODE` | All formats | Specifies whether to disable switching TiKV to import mode during the import process. By default, switching TiKV to import mode is not disabled. 
If there are ongoing read-write operations in the cluster, you can enable this option to avoid impact from the import process. | -| `THREAD=` | All formats | Specifies the concurrency for import. The default value is 50% of the CPU cores, with a minimum value of 1. You can explicitly specify this option to control the resource usage, but make sure that the value does not exceed the number of CPU cores. To import data into a new cluster without any data, it is recommended to increase this concurrency appropriately to improve import performance. If the target cluster is already used in a production environment, it is recommended to adjust this concurrency according to your application requirements. | -| `MAX_WRITE_SPEED=''` | All formats | Controls the write speed to a TiKV node. By default, there is no speed limit. For example, you can specify this option as `1MiB` to limit the write speed to 1 MiB/s. | -| `CHECKSUM_TABLE=''` | All formats | Configures whether to perform a checksum check on the target table after the import to validate the import integrity. The supported values include `"required"` (default), `"optional"`, and `"off"`. `"required"` means performing a checksum check after the import. If the checksum check fails, TiDB will return an error and the import will exit. `"optional"` means performing a checksum check after the import. If an error occurs, TiDB will return a warning and ignore the error. `"off"` means not performing a checksum check after the import. | -| `DETACHED` | All Formats | Controls whether to execute `IMPORT INTO` asynchronously. When this option is enabled, executing `IMPORT INTO` immediately returns the information of the import job (such as the `Job_ID`), and the job is executed asynchronously in the backend. | -| `CLOUD_STORAGE_URI` | All formats | Specifies the target address where encoded KV data for [Global Sort](/tidb-global-sort.md) is stored. 
When `CLOUD_STORAGE_URI` is not specified, `IMPORT INTO` determines whether to use Global Sort based on the value of the system variable [`tidb_cloud_storage_uri`](/system-variables.md#tidb_cloud_storage_uri-new-in-v740). If this system variable specifies a target storage address, `IMPORT INTO` uses this address for Global Sort. When `CLOUD_STORAGE_URI` is specified with a non-empty value, `IMPORT INTO` uses that value as the target storage address. When `CLOUD_STORAGE_URI` is specified with an empty value, local sorting is enforced. Currently, the target storage address only supports S3. For details about the URI configuration, see [Amazon S3 URI format](/external-storage-uri.md#amazon-s3-uri-format). When this feature is used, all TiDB nodes must have read and write access for the target S3 bucket. | +| `DISK_QUOTA=''` | All file formats | Specifies the disk space threshold that can be used during data sorting. The default value is 80% of the disk space in the TiDB [temporary directory](https://docs.pingcap.com/tidb/stable/tidb-configuration-file#temp-dir-new-in-v630). If the total disk size cannot be obtained, the default value is 50 GiB. When specifying `DISK_QUOTA` explicitly, make sure that the value does not exceed 80% of the disk space in the TiDB temporary directory. | +| `DISABLE_TIKV_IMPORT_MODE` | All file formats | Specifies whether to disable switching TiKV to import mode during the import process. By default, switching TiKV to import mode is not disabled. If there are ongoing read-write operations in the cluster, you can enable this option to avoid impact from the import process. | +| `THREAD=` | All file formats and query results of `SELECT` | Specifies the concurrency for import. For `IMPORT INTO ... FROM FILE`, the default value of `THREAD` is 50% of the number of CPU cores on the TiDB node, the minimum value is `1`, and the maximum value is the number of CPU cores. For `IMPORT INTO ... 
FROM SELECT`, the default value of `THREAD` is `2`, the minimum value is `1`, and the maximum value is two times the number of CPU cores on the TiDB node. To import data into a new cluster without any data, it is recommended to increase this concurrency appropriately to improve import performance. If the target cluster is already used in a production environment, it is recommended to adjust this concurrency according to your application requirements. | +| `MAX_WRITE_SPEED=''` | All file formats | Controls the write speed to a TiKV node. By default, there is no speed limit. For example, you can specify this option as `1MiB` to limit the write speed to 1 MiB/s. | +| `CHECKSUM_TABLE=''` | All file formats | Configures whether to perform a checksum check on the target table after the import to validate the import integrity. The supported values include `"required"` (default), `"optional"`, and `"off"`. `"required"` means performing a checksum check after the import. If the checksum check fails, TiDB will return an error and the import will exit. `"optional"` means performing a checksum check after the import. If an error occurs, TiDB will return a warning and ignore the error. `"off"` means not performing a checksum check after the import. | +| `DETACHED` | All file formats | Controls whether to execute `IMPORT INTO` asynchronously. When this option is enabled, executing `IMPORT INTO` immediately returns the information of the import job (such as the `Job_ID`), and the job is executed asynchronously in the backend. | +| `CLOUD_STORAGE_URI` | All file formats | Specifies the target address where encoded KV data for [Global Sort](/tidb-global-sort.md) is stored. When `CLOUD_STORAGE_URI` is not specified, `IMPORT INTO` determines whether to use Global Sort based on the value of the system variable [`tidb_cloud_storage_uri`](/system-variables.md#tidb_cloud_storage_uri-new-in-v740). 
If this system variable specifies a target storage address, `IMPORT INTO` uses this address for Global Sort. When `CLOUD_STORAGE_URI` is specified with a non-empty value, `IMPORT INTO` uses that value as the target storage address. When `CLOUD_STORAGE_URI` is specified with an empty value, local sorting is enforced. Currently, the target storage address only supports S3. For details about the URI configuration, see [Amazon S3 URI format](/external-storage-uri.md#amazon-s3-uri-format). When this feature is used, all TiDB nodes must have read and write access for the target S3 bucket, including at least these permissions: `s3:ListBucket`, `s3:GetObject`, `s3:DeleteObject`, `s3:PutObject`, `s3:AbortMultipartUpload`. |
+| `DISABLE_PRECHECK` | All file formats and query results of `SELECT` | Setting this option disables pre-checks of non-critical items, such as checking whether there are CDC or PITR tasks. |
+
+## `IMPORT INTO ... FROM FILE` usage
+
+> **Note:**
+>
+> `IMPORT INTO ... FROM FILE` is not available on [TiDB Serverless](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-serverless) clusters.
+
+For TiDB Self-Hosted, `IMPORT INTO ... FROM FILE` supports importing data from files stored in Amazon S3, GCS, and the TiDB local storage. For [TiDB Dedicated](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-dedicated), `IMPORT INTO ... FROM FILE` supports importing data from files stored in Amazon S3 and GCS.
+
+- For data files stored in Amazon S3 or GCS, `IMPORT INTO ... FROM FILE` supports running in the [TiDB Distributed eXecution Framework (DXF)](/tidb-distributed-execution-framework.md).
+
+    - When the DXF is enabled ([`tidb_enable_dist_task`](/system-variables.md#tidb_enable_dist_task-new-in-v710) is `ON`), `IMPORT INTO` splits a data import job into multiple sub-jobs and distributes these sub-jobs to different TiDB nodes for execution to improve the import efficiency.
    - When the DXF is disabled, `IMPORT INTO ...
FROM FILE` only supports running on the TiDB node where the current user is connected. -## Compressed files +- For data files stored locally in TiDB, `IMPORT INTO ... FROM FILE` only supports running on the TiDB node where the current user is connected. Therefore, the data files need to be placed on the TiDB node where the current user is connected. If you access TiDB through a proxy or load balancer, you cannot import data files stored locally in TiDB. -`IMPORT INTO` supports importing compressed `CSV` and `SQL` files. It can automatically determine whether a file is compressed and the compression format based on the file extension: +### Compressed files + +`IMPORT INTO ... FROM FILE` supports importing compressed `CSV` and `SQL` files. It can automatically determine whether a file is compressed and the compression format based on the file extension: | Extension | Compression format | |:---|:---| @@ -154,13 +182,9 @@ The supported options are described as follows: > - The Snappy compressed file must be in the [official Snappy format](https://github.com/google/snappy). Other variants of Snappy compression are not supported. > - Because TiDB Lightning cannot concurrently decompress a single large compressed file, the size of the compressed file affects the import speed. It is recommended that a source file is no greater than 256 MiB after decompression. -## Global Sort +### Global Sort -> **Warning:** -> -> The Global Sort feature is experimental. It is not recommended to use it in production environments. - -`IMPORT INTO` splits the data import job of a source data file into multiple sub-jobs, each sub-job independently encoding and sorting data before importing. If the encoded KV ranges of these sub-jobs have significant overlap (to learn how TiDB encodes data to KV, see [TiDB computing](/tidb-computing.md)), TiKV needs to keep compaction during import, leading to a decrease in import performance and stability. +`IMPORT INTO ... 
FROM FILE` splits the data import job of a source data file into multiple sub-jobs, each sub-job independently encoding and sorting data before importing. If the encoded KV ranges of these sub-jobs have significant overlap (to learn how TiDB encodes data to KV, see [TiDB computing](/tidb-computing.md)), TiKV needs to keep compaction during import, leading to a decrease in import performance and stability. In the following scenarios, there can be significant overlap in KV ranges: @@ -168,13 +192,13 @@ In the following scenarios, there can be significant overlap in KV ranges: - `IMPORT INTO` splits sub-jobs based on the traversal order of data files, usually sorted by file name in lexicographic order. - If the target table has many indexes, or the index column values are scattered in the data file, the index KV generated by the encoding of each sub-job will also overlap. -When the [TiDB Distributed eXecution Framework (DXF)](/tidb-distributed-execution-framework.md) is enabled, you can enable [Global Sort](/tidb-global-sort.md) by specifying the `CLOUD_STORAGE_URI` option in the `IMPORT INTO` statement or by specifying the target storage address for encoded KV data using the system variable [`tidb_cloud_storage_uri`](/system-variables.md#tidb_cloud_storage_uri-new-in-v740). Note that currently, only S3 is supported as the Global Sort storage address. When Global Sort is enabled, `IMPORT INTO` writes encoded KV data to the cloud storage, performs Global Sort in the cloud storage, and then parallelly imports the globally sorted index and table data into TiKV. This prevents problems caused by KV overlap and enhances import stability. 
+When the [TiDB Distributed eXecution Framework (DXF)](/tidb-distributed-execution-framework.md) is enabled, you can enable [Global Sort](/tidb-global-sort.md) by specifying the `CLOUD_STORAGE_URI` option in the `IMPORT INTO` statement or by specifying the target storage address for encoded KV data using the system variable [`tidb_cloud_storage_uri`](/system-variables.md#tidb_cloud_storage_uri-new-in-v740). Currently, Global Sort supports using Amazon S3 as the storage address. When Global Sort is enabled, `IMPORT INTO` writes encoded KV data to the cloud storage, performs Global Sort in the cloud storage, and then imports the globally sorted index and table data into TiKV in parallel. This prevents problems caused by KV overlap and enhances import stability and performance.

Global Sort consumes a significant amount of memory resources. Before the data import, it is recommended to configure the [`tidb_server_memory_limit_gc_trigger`](/system-variables.md#tidb_server_memory_limit_gc_trigger-new-in-v640) and [`tidb_server_memory_limit`](/system-variables.md#tidb_server_memory_limit-new-in-v640) variables, which prevents the Golang GC from being triggered frequently and thus affecting the import efficiency.

```sql
-SET GLOBAL tidb_server_memory_limit_gc_trigger=0.99;
-SET GLOBAL tidb_server_memory_limit='88%';
+SET GLOBAL tidb_server_memory_limit_gc_trigger=1;
+SET GLOBAL tidb_server_memory_limit='75%';
```

> **Note:**
@@ -182,11 +206,11 @@ SET GLOBAL tidb_server_memory_limit='88%';
> - If the KV range overlap in a source data file is low, enabling Global Sort might decrease import performance. This is because when Global Sort is enabled, TiDB needs to wait for the completion of local sorting in all sub-jobs before proceeding with the Global Sort operations and subsequent import.
> - After an import job using Global Sort completes, the files stored in the cloud storage for Global Sort are cleaned up asynchronously in a background thread.
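+As an illustrative sketch, assuming DXF is enabled, the following statement imports a file with Global Sort by setting `CLOUD_STORAGE_URI` directly in the statement. The bucket names and credentials below are placeholders, not values from this document:
+
+```sql
+IMPORT INTO t FROM 's3://source-bucket/path/to/file.csv?access-key=XXX&secret-access-key=XXX'
+    WITH CLOUD_STORAGE_URI='s3://sort-bucket/tmp?access-key=XXX&secret-access-key=XXX', THREAD=8;
+```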
-## Output +### Output -When `IMPORT INTO` completes the import or when the `DETACHED` mode is enabled, `IMPORT INTO` will return the current job information in the output, as shown in the following examples. For the description of each field, see [`SHOW IMPORT JOB(s)`](/sql-statements/sql-statement-show-import-job.md). +When `IMPORT INTO ... FROM FILE` completes the import or when the `DETACHED` mode is enabled, TiDB returns the current job information in the output, as shown in the following examples. For the description of each field, see [`SHOW IMPORT JOB(s)`](/sql-statements/sql-statement-show-import-job.md). -When `IMPORT INTO` completes the import, the example output is as follows: +When `IMPORT INTO ... FROM FILE` completes the import, the example output is as follows: ```sql IMPORT INTO t FROM '/path/to/small.csv'; @@ -197,7 +221,7 @@ IMPORT INTO t FROM '/path/to/small.csv'; +--------+--------------------+--------------+----------+-------+----------+------------------+---------------+----------------+----------------------------+----------------------------+----------------------------+------------+ ``` -When the `DETACHED` mode is enabled, executing the `IMPORT INTO` statement will immediately return the job information in the output. From the output, you can see that the status of the job is `pending`, which means waiting for execution. +When the `DETACHED` mode is enabled, executing the `IMPORT INTO ... FROM FILE` statement will immediately return the job information in the output. From the output, you can see that the status of the job is `pending`, which means waiting for execution. 
```sql IMPORT INTO t FROM '/path/to/small.csv' WITH DETACHED; @@ -208,27 +232,27 @@ IMPORT INTO t FROM '/path/to/small.csv' WITH DETACHED; +--------+--------------------+--------------+----------+-------+---------+------------------+---------------+----------------+----------------------------+------------+----------+------------+ ``` -## View and manage import jobs +### View and manage import jobs For an import job with the `DETACHED` mode enabled, you can use [`SHOW IMPORT`](/sql-statements/sql-statement-show-import-job.md) to view its current job progress. After an import job is started, you can cancel it using [`CANCEL IMPORT JOB `](/sql-statements/sql-statement-cancel-import-job.md). -## Examples +### Examples -### Import a CSV file with headers +#### Import a CSV file with headers ```sql IMPORT INTO t FROM '/path/to/file.csv' WITH skip_rows=1; ``` -### Import a file asynchronously in the `DETACHED` mode +#### Import a file asynchronously in the `DETACHED` mode ```sql IMPORT INTO t FROM '/path/to/file.csv' WITH DETACHED; ``` -### Skip importing a specific field in your data file +#### Skip importing a specific field in your data file Assume that your data file is in the CSV format and its content is as follows: @@ -244,15 +268,21 @@ And assume that the target table schema for the import is `CREATE TABLE t(id int IMPORT INTO t(id, name, @1) FROM '/path/to/file.csv' WITH skip_rows=1; ``` -### Import multiple data files using the wildcard `*` +#### Import multiple data files using wildcards Assume that there are three files named `file-01.csv`, `file-02.csv`, and `file-03.csv` in the `/path/to/` directory. 
To import these three files into a target table `t` using `IMPORT INTO`, you can execute the following SQL statement: ```sql -IMPORT INTO t FROM '/path/to/file-*.csv' +IMPORT INTO t FROM '/path/to/file-*.csv'; +``` + +If you only need to import `file-01.csv` and `file-03.csv` into the target table, execute the following SQL statement: + +```sql +IMPORT INTO t FROM '/path/to/file-0[13].csv'; ``` -### Import data files from Amazon S3 or GCS +#### Import data files from Amazon S3 or GCS - Import data files from Amazon S3: @@ -268,7 +298,7 @@ IMPORT INTO t FROM '/path/to/file-*.csv' For details about the URI path configuration for Amazon S3 or GCS, see [URI Formats of External Storage Services](/external-storage-uri.md). -### Calculate column values using SetClause +#### Calculate column values using SetClause Assume that your data file is in the CSV format and its content is as follows: @@ -284,13 +314,13 @@ And assume that the target table schema for the import is `CREATE TABLE t(id int IMPORT INTO t(id, name, @1) SET val=@1*100 FROM '/path/to/file.csv' WITH skip_rows=1; ``` -### Import a data file in the SQL format +#### Import a data file in the SQL format ```sql IMPORT INTO t FROM '/path/to/file.sql' FORMAT 'sql'; ``` -### Limit the write speed to TiKV +#### Limit the write speed to TiKV To limit the write speed to a TiKV node to 10 MiB/s, execute the following SQL statement: @@ -298,6 +328,26 @@ To limit the write speed to a TiKV node to 10 MiB/s, execute the following SQL s IMPORT INTO t FROM 's3://bucket/path/to/file.parquet?access-key=XXX&secret-access-key=XXX' FORMAT 'parquet' WITH MAX_WRITE_SPEED='10MiB'; ``` +## `IMPORT INTO ... FROM SELECT` usage + +`IMPORT INTO ... FROM SELECT` lets you import the query result of a `SELECT` statement to an empty table in TiDB. You can also use it to import historical data queried with [`AS OF TIMESTAMP`](/as-of-timestamp.md). 
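+The examples in the following subsections assume that the target table `t` and the source tables `src` and `src2` already exist. As a minimal sketch (these schemas are illustrative assumptions, not taken from this document), the setup could be:
+
+```sql
+-- Hypothetical schemas for illustration only.
+CREATE TABLE src (id INT PRIMARY KEY, val VARCHAR(64));
+CREATE TABLE src2 (id INT PRIMARY KEY, val VARCHAR(64));
+-- The target table must be empty before the import.
+CREATE TABLE t (id INT PRIMARY KEY, val VARCHAR(64));
+```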
+ +### Import the query result of `SELECT` + +To import the `UNION` result to the target table `t`, with the import concurrency specified as `8` and precheck of non-critical items configured as disabled, execute the following SQL statement: + +```sql +IMPORT INTO t FROM SELECT * FROM src UNION SELECT * FROM src2 WITH THREAD = 8, DISABLE_PRECHECK; +``` + +### Import historical data at a specified time point + +To import historical data at a specified time point to the target table `t`, execute the following SQL statement: + +```sql +IMPORT INTO t FROM SELECT * FROM src AS OF TIMESTAMP '2024-02-27 11:38:00'; +``` + ## MySQL compatibility This statement is a TiDB extension to MySQL syntax. diff --git a/sql-statements/sql-statement-lock-stats.md b/sql-statements/sql-statement-lock-stats.md index a924a0b2c1a36..25057e7ba9476 100644 --- a/sql-statements/sql-statement-lock-stats.md +++ b/sql-statements/sql-statement-lock-stats.md @@ -7,10 +7,6 @@ summary: An overview of the usage of LOCK STATS for the TiDB database. `LOCK STATS` is used to lock the statistics of tables or partitions. When the statistics is locked, TiDB does not automatically update the statistics of the table or partition. For details on the behavior, see [Behaviors of locking statistics](/statistics.md#behaviors-of-locking-statistics). -> **Warning:** -> -> Locking statistics is an experimental feature for the current version. It is not recommended to use it in the production environment. - ## Synopsis ```ebnf+diagram diff --git a/sql-statements/sql-statement-revoke-privileges.md b/sql-statements/sql-statement-revoke-privileges.md index dc60a2658b236..532b601db0949 100644 --- a/sql-statements/sql-statement-revoke-privileges.md +++ b/sql-statements/sql-statement-revoke-privileges.md @@ -55,6 +55,10 @@ PrivLevel ::= UserSpecList ::= UserSpec ( ',' UserSpec )* + +RequireClauseOpt ::= ('REQUIRE' ('NONE' | 'SSL' | 'X509' | RequireListElement ('AND'? RequireListElement)*))? 
+ +RequireListElement ::= 'ISSUER' Issuer | 'SUBJECT' Subject | 'CIPHER' Cipher | 'SAN' SAN | 'TOKEN_ISSUER' TokenIssuer ``` ## Examples diff --git a/sql-statements/sql-statement-show-stats-locked.md b/sql-statements/sql-statement-show-stats-locked.md index 51e026bdcc8da..a3f9d66e41273 100644 --- a/sql-statements/sql-statement-show-stats-locked.md +++ b/sql-statements/sql-statement-show-stats-locked.md @@ -7,10 +7,6 @@ summary: An overview of the usage of SHOW STATS_LOCKED for the TiDB database. `SHOW STATS_LOCKED` shows the tables whose statistics are locked. -> **Warning:** -> -> Locking statistics is an experimental feature for the current version. It is not recommended to use it in the production environment. - ## Synopsis ```ebnf+diagram diff --git a/sql-statements/sql-statement-unlock-stats.md b/sql-statements/sql-statement-unlock-stats.md index 059229a2889be..cd6446a8bafaf 100644 --- a/sql-statements/sql-statement-unlock-stats.md +++ b/sql-statements/sql-statement-unlock-stats.md @@ -7,10 +7,6 @@ summary: An overview of the usage of UNLOCK STATS for the TiDB database. `UNLOCK STATS` is used to unlock the statistics of a table or tables. -> **Warning:** -> -> Locking statistics is an experimental feature for the current version. It is not recommended to use it in the production environment. - ## Synopsis ```ebnf+diagram diff --git a/statement-summary-tables.md b/statement-summary-tables.md index 0a4f814738941..13c1708c76198 100644 --- a/statement-summary-tables.md +++ b/statement-summary-tables.md @@ -87,7 +87,7 @@ The following is a sample output of querying `statements_summary`: > **Note:** > > - In TiDB, the time unit of fields in statement summary tables is nanosecond (ns), whereas in MySQL the time unit is picosecond (ps). 
-> - Starting from v7.6.0, for clusters with [resource control](/tidb-resource-control.md) enabled, `statements_summary` will be aggregated by resource group, for example, the same statements executed in different resource groups will be collected as different records.
+> - Starting from v7.5.1 and v7.6.0, for clusters with [resource control](/tidb-resource-control.md) enabled, `statements_summary` will be aggregated by resource group, that is, the same statements executed in different resource groups will be collected as different records.

## `statements_summary_history`

diff --git a/statistics.md b/statistics.md
index 6c4ff858a538c..7b6db2d637880 100644
--- a/statistics.md
+++ b/statistics.md
@@ -352,6 +352,7 @@ Three system variables related to automatic update of statistics are as follows:
| [`tidb_auto_analyze_start_time`](/system-variables.md#tidb_auto_analyze_start_time) | `00:00 +0000` | The start time in a day when TiDB can perform automatic update |
| [`tidb_auto_analyze_end_time`](/system-variables.md#tidb_auto_analyze_end_time) | `23:59 +0000` | The end time in a day when TiDB can perform automatic update |
| [`tidb_auto_analyze_partition_batch_size`](/system-variables.md#tidb_auto_analyze_partition_batch_size-new-in-v640) | `1` | The number of partitions that TiDB automatically analyzes when analyzing a partitioned table (that is, when automatically updating statistics on a partitioned table) |
+| [`tidb_enable_auto_analyze_priority_queue`](/system-variables.md#tidb_enable_auto_analyze_priority_queue-new-in-v800) | `ON` | Controls whether to enable the priority queue to schedule the tasks of automatically collecting statistics. When this variable is enabled, TiDB prioritizes collecting statistics for tables that are more valuable to collect, such as newly created indexes and partitioned tables with partition changes. Additionally, TiDB prioritizes tables with lower health scores, placing them at the front of the queue.
| When the ratio of the number of modified rows to the total number of rows of `tbl` in a table is greater than `tidb_auto_analyze_ratio`, and the current time is between `tidb_auto_analyze_start_time` and `tidb_auto_analyze_end_time`, TiDB executes the `ANALYZE TABLE tbl` statement in the background to automatically update the statistics on this table. @@ -807,10 +808,6 @@ LOAD STATS 'file_name' ## Lock statistics -> **Warning:** -> -> Locking statistics is an experimental feature for the current version. It is not recommended to use it in the production environment. - Starting from v6.5.0, TiDB supports locking statistics. After the statistics of a table or a partition are locked, the statistics of the table cannot be modified and the `ANALYZE` statement cannot be executed on the table. For example: Create table `t`, and insert data into it. When the statistics of table `t` are not locked, the `ANALYZE` statement can be successfully executed. diff --git a/status-variables.md b/status-variables.md index 4ecdf114fd93e..ec577d88c6173 100644 --- a/status-variables.md +++ b/status-variables.md @@ -90,3 +90,33 @@ Additionally, the [FLUSH STATUS](/sql-statements/sql-statement-flush-status.md) - Scope: SESSION | GLOBAL - Type: String - The UUID of the server. + +### tidb_gc_last_run_time + +- Scope: SESSION | GLOBAL +- Type: String +- The timestamp of the last run of [GC](/garbage-collection-overview.md). + +### tidb_gc_leader_desc + +- Scope: SESSION | GLOBAL +- Type: String +- Information about [GC](/garbage-collection-overview.md) leader, including the hostname and process id (pid). + +### tidb_gc_leader_lease + +- Scope: SESSION | GLOBAL +- Type: String +- The timestamp of the [GC](/garbage-collection-overview.md) lease. + +### tidb_gc_leader_uuid + +- Scope: SESSION | GLOBAL +- Type: String +- The UUID of the [GC](/garbage-collection-overview.md) leader. 
+ +### tidb_gc_safe_point + +- Scope: SESSION | GLOBAL +- Type: String +- The timestamp of the [GC](/garbage-collection-overview.md) safe point. diff --git a/storage-engine/titan-configuration.md b/storage-engine/titan-configuration.md index 5f17f5f641b79..5f5ca31268eb0 100644 --- a/storage-engine/titan-configuration.md +++ b/storage-engine/titan-configuration.md @@ -105,6 +105,10 @@ If you observe that the Titan GC thread is in full load for a long time from **T You can adjust [`rate-bytes-per-sec`](/tikv-configuration-file.md#rate-bytes-per-sec) to limit the I/O rate of RocksDB compaction, reducing its impact on foreground read and write performance during high traffic. +### `shared-blob-cache` (New in v8.0.0) + +You can control whether to enable the shared cache for Titan blob files and RocksDB block files through [`shared-blob-cache`](/tikv-configuration-file.md#shared-blob-cache-new-in-v800). The default value is `true`. When the shared cache is enabled, block files have higher priority. This means that TiKV prioritizes meeting the cache needs of block files and then uses the remaining cache for blob files. + ### Titan configuration example The following is an example of the Titan configuration file. You can either [use TiUP to modify the configuration](/maintain-tidb-using-tiup.md#modify-the-configuration) or [configure a TiDB cluster on Kubernetes](https://docs.pingcap.com/tidb-in-kubernetes/stable/configure-a-tidb-cluster). @@ -121,7 +125,6 @@ max-background-gc = 1 min-blob-size = "32KB" blob-file-compression = "zstd" zstd-dict-size = "16KB" -blob-cache-size = "0GB" discardable-ratio = 0.5 blob-run-mode = "normal" level-merge = false @@ -154,6 +157,10 @@ To fully disable Titan for all existing and future data, you can follow these st 2. (Optional) Perform a full compaction using tikv-ctl. This process will consume a large amount of I/O and CPU resources. 
+ > **Warning:**
+ >
+ > When disk space is insufficient, executing the following command might result in the entire cluster running out of available space and thus unable to write data.
+
   ```bash
   tikv-ctl --pd compact-cluster --bottommost force
   ```
diff --git a/sys-schema.md b/sys-schema.md
new file mode 100644
index 0000000000000..4c3beda035e2a
--- /dev/null
+++ b/sys-schema.md
@@ -0,0 +1,54 @@
+---
+title: sys Schema
+summary: Learn about the system tables in the `sys` schema.
+---
+
+# `sys` Schema
+
+Starting from v8.0.0, TiDB provides the `sys` schema. You can use the views in the `sys` schema to understand the data in the system tables, [`INFORMATION_SCHEMA`](/information-schema/information-schema.md), and [`PERFORMANCE_SCHEMA`](/performance-schema/performance-schema.md) of TiDB.
+
+## Manually create the `sys` schema and views
+
+For clusters upgraded from versions earlier than v8.0.0, the `sys` schema and the views in it are not created automatically. You can manually create them using the following SQL statements:
+
+```sql
+CREATE DATABASE IF NOT EXISTS sys;
+CREATE OR REPLACE VIEW sys.schema_unused_indexes AS
+  SELECT
+    table_schema as object_schema,
+    table_name as object_name,
+    index_name
+  FROM information_schema.cluster_tidb_index_usage
+  WHERE
+    table_schema not in ('sys', 'mysql', 'INFORMATION_SCHEMA', 'PERFORMANCE_SCHEMA') and
+    index_name != 'PRIMARY'
+  GROUP BY table_schema, table_name, index_name
+  HAVING
+    sum(last_access_time) is null;
+```
+
+## `schema_unused_indexes`
+
+`schema_unused_indexes` records indexes that have not been used since the last start of TiDB. It includes the following columns:
+
+- `OBJECT_SCHEMA`: The name of the database to which the table containing the index belongs.
+- `OBJECT_NAME`: The name of the table containing the index.
+- `INDEX_NAME`: The name of the index.
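+Because `sys.schema_unused_indexes` is a regular view, you can query it directly. For example, to list all unused indexes reported by the view:
+
+```sql
+SELECT * FROM sys.schema_unused_indexes;
+```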
+ +```sql +USE sys; +DESC schema_unused_indexes; +``` + +The output is as follows: + +```sql ++---------------+-------------+------+------+---------+-------+ +| Field | Type | Null | Key | Default | Extra | ++---------------+-------------+------+------+---------+-------+ +| object_schema | varchar(64) | YES | | NULL | | +| object_name | varchar(64) | YES | | NULL | | +| index_name | varchar(64) | YES | | NULL | | ++---------------+-------------+------+------+---------+-------+ +3 rows in set (0.00 sec) +``` \ No newline at end of file diff --git a/system-variables.md b/system-variables.md index f27cb58b8f76e..387983a80e371 100644 --- a/system-variables.md +++ b/system-variables.md @@ -443,7 +443,6 @@ mysql> SELECT * FROM t1; - Type: Enumeration - Default value: `mysql_native_password` - Possible values: `mysql_native_password`, `caching_sha2_password`, `tidb_sm3_password`, `tidb_auth_token`, `authentication_ldap_sasl`, and `authentication_ldap_simple`. -- The `tidb_auth_token` authentication method is used only for the internal operation of TiDB Cloud. **DO NOT** set the variable to this value. - This variable sets the authentication method that the server advertises when the server-client connection is being established. - To authenticate using the `tidb_sm3_password` method, you can connect to TiDB using [TiDB-JDBC](https://github.com/pingcap/mysql-connector-j/tree/release/8.0-sm3). @@ -507,6 +506,16 @@ For more possible values of this variable, see [Authentication plugin status](/s +### div_precision_increment New in v8.0.0 + +- Scope: SESSION | GLOBAL +- Persists to cluster: Yes +- Applies to hint [SET_VAR](/optimizer-hints.md#set_varvar_namevar_value): Yes +- Type: Integer +- Default value: `4` +- Range: `[0, 30]` +- This variable specifies the number of digits by which to increase the scale of the result of a division operation performed using the `/` operator. The behavior of this variable is consistent with that of MySQL.
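As a sketch of the effect (results follow MySQL's documented behavior for `div_precision_increment`; verify on your own cluster):

```sql
-- The scale of the division result is the scale of the dividend plus div_precision_increment.
SET div_precision_increment = 4;
SELECT 3/7;    -- returns 0.4286

SET div_precision_increment = 8;
SELECT 3/7;    -- returns 0.42857143
```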
+ ### error_count - Scope: SESSION @@ -1616,9 +1625,9 @@ mysql> SELECT job_info FROM mysql.analyze_jobs ORDER BY end_time DESC LIMIT 1; ### tidb_cloud_storage_uri New in v7.4.0 -> **Warning:** +> **Note:** > -> This feature is experimental. It is not recommended that you use it in the production environment. This feature might be changed or removed without prior notice. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub. +> Currently, the [Global Sort](/tidb-global-sort.md) process consumes a large amount of computing and memory resources of TiDB nodes. In scenarios such as adding indexes online while user business applications are running, it is recommended to add new TiDB nodes to the cluster and set the [`tidb_service_scope`](/system-variables.md#tidb_service_scope-new-in-v740) variable of these nodes to `"background"`. In this way, the distributed framework schedules tasks to these nodes, isolating the workload from other TiDB nodes to reduce the impact of executing backend tasks such as `ADD INDEX` and `IMPORT INTO` on user business applications. - Scope: GLOBAL - Persists to cluster: Yes @@ -1703,7 +1712,7 @@ mysql> SELECT job_info FROM mysql.analyze_jobs ORDER BY end_time DESC LIMIT 1; - Unit: Threads - This variable is used to set the concurrency of the DDL operation in the `re-organize` phase. -### `tidb_ddl_version` New in v7.6.0 +### `tidb_enable_fast_create_table` New in v8.0.0 > **Warning:** > @@ -1712,10 +1721,11 @@ mysql> SELECT job_info FROM mysql.analyze_jobs ORDER BY end_time DESC LIMIT 1; - Scope: GLOBAL - Persists to cluster: Yes - Applies to hint [SET_VAR](/optimizer-hints.md#set_varvar_namevar_value): No -- Default value: `1` -- Value range: `1` or `2` -- This variable is used to control whether to enable [TiDB DDL V2](/ddl-v2.md). Setting it to `2` enables this feature, and setting it to `1` disables it. The default value is `1`. After you enable it, TiDB uses TiDB DDL V2 to execute DDL statements. 
-- Starting from v7.6.0, TiDB only supports accelerating table creation by the [`CREATE TABLE`](/sql-statements/sql-statement-create-table.md) statement. +- Type: Boolean +- Default value: `OFF` +- This variable is used to control whether to enable [TiDB Accelerated Table Creation](/accelerated-table-creation.md). +- Starting from v8.0.0, TiDB supports accelerating table creation by the [`CREATE TABLE`](/sql-statements/sql-statement-create-table.md) statement using `tidb_enable_fast_create_table`. +- This variable is renamed from [`tidb_ddl_version`](https://docs.pingcap.com/tidb/v7.6/system-variables#tidb_ddl_version-new-in-v760), which was introduced in v7.6.0. Starting from v8.0.0, `tidb_ddl_version` no longer takes effect. ### tidb_default_string_match_selectivity New in v6.2.0 @@ -1735,7 +1745,7 @@ mysql> SELECT job_info FROM mysql.analyze_jobs ORDER BY end_time DESC LIMIT 1; > **Warning:** > -> Starting from TiDB v8.0.0, this variable will be deprecated, and TiDB will no longer support automatic retries of optimistic transactions. As an alternative, when encountering transaction conflicts, you can capture the error and retry transactions in your application, or use the [Pessimistic transaction mode](/pessimistic-transaction.md) instead. +> Starting from v8.0.0, this variable is deprecated, and TiDB no longer supports automatic retries of optimistic transactions. As an alternative, when encountering optimistic transaction conflicts, you can capture the error and retry transactions in your application, or use the [Pessimistic transaction mode](/pessimistic-transaction.md) instead. - Scope: SESSION | GLOBAL - Persists to cluster: Yes @@ -1797,6 +1807,31 @@ mysql> SELECT job_info FROM mysql.analyze_jobs ORDER BY end_time DESC LIMIT 1; > > Starting from v7.0.0, `tidb_dml_batch_size` no longer takes effect on the [`LOAD DATA` statement](/sql-statements/sql-statement-load-data.md).
+### tidb_dml_type New in v8.0.0 + +> **Warning:** +> +> The bulk DML execution mode (`tidb_dml_type = "bulk"`) is an experimental feature. It is not recommended that you use it in the production environment. This feature might be changed or removed without prior notice. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues). When TiDB performs large transactions using the bulk DML mode, it might affect the memory usage and execution efficiency of TiCDC, TiFlash, and the resolved-ts module of TiKV, and might cause OOM issues. Therefore, it is not recommended to use this mode when these components or features are enabled. + +- Scope: SESSION +- Applies to hint [SET_VAR](/optimizer-hints.md#set_varvar_namevar_value): Yes +- Type: String +- Default value: `"standard"` +- Value options: `"standard"`, `"bulk"` +- This variable controls the execution mode of DML statements. + - `"standard"` indicates the standard DML execution mode, where TiDB transactions are cached in memory before being committed. This mode is suitable for high-concurrency transaction scenarios with potential conflicts and is the default recommended execution mode. + - `"bulk"` indicates the bulk DML execution mode, which is suitable for scenarios where a large amount of data is written, causing excessive memory usage in TiDB. + - During the execution of TiDB transactions, the data is not fully cached in the TiDB memory, but is continuously written to TiKV to reduce memory usage and smooth the write pressure. + - Only `INSERT`, `UPDATE`, `REPLACE`, and `DELETE` statements are affected by the `"bulk"` mode. + - `"bulk"` mode is only suitable for scenarios where a large amount of **data is written without conflicts**. This mode is not efficient for handling write conflicts, as write-write conflicts might cause large transactions to fail and be rolled back. 
+ - `"bulk"` mode only takes effect on statements with auto-commit enabled, and requires the [`pessimistic-auto-commit`](https://docs.pingcap.com/tidb/stable/tidb-configuration-file#pessimistic-auto-commit-new-in-v600) configuration item to be set to `false`. + - When using the `"bulk"` mode to execute statements, ensure that the [metadata lock](/metadata-lock.md) remains enabled during the execution process. + - `"bulk"` mode cannot be used on [temporary tables](/temporary-tables.md) and [cached tables](/cached-tables.md). + - `"bulk"` mode cannot be used on tables containing foreign keys and tables referenced by foreign keys when the foreign key constraint check is enabled (`foreign_key_checks = ON`). + - When the environment does not support or is incompatible with the `"bulk"` mode, TiDB falls back to the `"standard"` mode and returns a warning message. To verify whether the `"bulk"` mode is used, you can check the `pipelined` field using [`tidb_last_txn_info`](#tidb_last_txn_info-new-in-v409). A `true` value indicates that the `"bulk"` mode is used. + - When executing large transactions in the `"bulk"` mode, the transaction duration might be long. For transactions in this mode, the maximum TTL of the transaction lock is the greater value between [`max-txn-ttl`](https://docs.pingcap.com/tidb/stable/tidb-configuration-file#max-txn-ttl) and 24 hours. Additionally, if the transaction execution time exceeds the value set by [`tidb_gc_max_wait_time`](#tidb_gc_max_wait_time-new-in-v610), the GC might force a rollback of the transaction, leading to its failure. + - This mode is implemented by the Pipelined DML feature. For detailed design and GitHub issues, see [Pipelined DML](https://github.com/pingcap/tidb/blob/master/docs/design/2024-01-09-pipelined-DML.md) and [#50215](https://github.com/pingcap/tidb/issues/50215).
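A minimal usage sketch (the `t` and `src` tables are hypothetical; the `pipelined` field of `tidb_last_txn_info` confirms whether the bulk mode actually took effect):

```sql
-- Enable the bulk DML execution mode for the current session.
SET tidb_dml_type = "bulk";

-- Run a large auto-commit DML statement; the bulk mode only applies to auto-commit statements.
INSERT INTO t SELECT * FROM src;

-- Check whether the last statement was executed in the bulk (pipelined) mode.
SELECT @@tidb_last_txn_info;
```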
+ ### tidb_enable_1pc New in v5.0 > **Note:** @@ -1863,6 +1898,15 @@ mysql> SELECT job_info FROM mysql.analyze_jobs ORDER BY end_time DESC LIMIT 1; - Determines whether TiDB automatically updates table statistics as a background operation. - This setting was previously a `tidb.toml` option (`performance.run-auto-analyze`), but changed to a system variable starting from TiDB v6.1.0. +### tidb_enable_auto_analyze_priority_queue New in v8.0.0 + +- Scope: GLOBAL +- Persists to cluster: Yes +- Applies to hint [SET_VAR](/optimizer-hints.md#set_varvar_namevar_value): No +- Type: Boolean +- Default value: `ON` +- This variable is used to control whether to enable the priority queue to schedule the tasks of automatically collecting statistics. When this variable is enabled, TiDB prioritizes collecting statistics for tables that are more valuable to collect, such as newly created indexes and partitioned tables with partition changes. Additionally, TiDB prioritizes tables with lower health scores, placing them at the front of the queue. + ### tidb_enable_auto_increment_in_generated - Scope: SESSION | GLOBAL @@ -1952,7 +1996,7 @@ mysql> SELECT job_info FROM mysql.analyze_jobs ORDER BY end_time DESC LIMIT 1; - Applies to hint [SET_VAR](/optimizer-hints.md#set_varvar_namevar_value): No - Type: Boolean - Default value: `ON` -- This variable controls whether to record the execution information of each operator in the slow query log. +- This variable controls whether to record the execution information of each operator in the slow query log and whether to record the [usage statistics of indexes](/information-schema/information-schema-tidb-index-usage.md). ### tidb_enable_column_tracking New in v5.4.0 @@ -1967,6 +2011,19 @@ mysql> SELECT job_info FROM mysql.analyze_jobs ORDER BY end_time DESC LIMIT 1; - Default value: `OFF` - This variable controls whether to enable TiDB to collect `PREDICATE COLUMNS`. 
After enabling the collection, if you disable it, the information of previously collected `PREDICATE COLUMNS` is cleared. For details, see [Collect statistics on some columns](/statistics.md#collect-statistics-on-some-columns). +### tidb_enable_concurrent_hashagg_spill New in v8.0.0 + +> **Warning:** +> +> Currently, the feature controlled by this variable is experimental. It is not recommended that you use it in production environments. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub. + +- Scope: SESSION | GLOBAL +- Persists to cluster: Yes +- Applies to hint [SET_VAR](/optimizer-hints.md#set_varvar_namevar_value): No +- Type: Boolean +- Default value: `ON` +- This variable controls whether TiDB supports disk spill for the concurrent HashAgg algorithm. When it is `ON`, disk spill can be triggered for the concurrent HashAgg algorithm. This variable will be deprecated when this feature is generally available in a future release. + ### tidb_enable_enhanced_security - Scope: NONE @@ -3357,6 +3414,17 @@ For a system upgraded to v5.0 from an earlier version, if you have not modified +### `tidb_load_binding_timeout` New in v8.0.0 + +- Scope: GLOBAL +- Persists to cluster: Yes +- Applies to hint [SET_VAR](/optimizer-hints.md#set_varvar_namevar_value): No +- Type: Integer +- Default value: `200` +- Range: `(0, 2147483647]` +- Unit: Milliseconds +- This variable is used to control the timeout of loading bindings. If the execution time of loading bindings exceeds this value, the loading will stop. + ### `tidb_lock_unchanged_keys` New in v7.1.1 and v7.3.0 - Scope: SESSION | GLOBAL @@ -3400,9 +3468,21 @@ For a system upgraded to v5.0 from an earlier version, if you have not modified - Applies to hint [SET_VAR](/optimizer-hints.md#set_varvar_namevar_value): No - Type: Boolean - Default value: `OFF` -- This variable is used to set whether to enable the low precision TSO feature. 
After this feature is enabled, new transactions use a timestamp updated every 2 seconds to read data. +- This variable is used to set whether to enable the low-precision TSO feature. After this feature is enabled, TiDB uses the cached timestamp to read data. The cached timestamp is updated every 2 seconds by default. Starting from v8.0.0, you can configure the update interval by [`tidb_low_resolution_tso_update_interval`](#tidb_low_resolution_tso_update_interval-new-in-v800). - The main applicable scenario is to reduce the overhead of acquiring TSO for small read-only transactions when reading old data is acceptable. +### `tidb_low_resolution_tso_update_interval` New in v8.0.0 + +- Scope: GLOBAL +- Persists to cluster: Yes +- Applies to hint [SET_VAR](/optimizer-hints.md#set_varvar_namevar_value): No +- Type: Integer +- Default value: `2000` +- Range: `[10, 60000]` +- Unit: Milliseconds +- This variable is used to set the update interval of the cached timestamp used in the low-precision TSO feature, in milliseconds. +- This variable is only available when [`tidb_low_resolution_tso`](#tidb_low_resolution_tso) is enabled. + ### tidb_max_auto_analyze_time New in v6.1.0 - Scope: GLOBAL @@ -4494,6 +4574,14 @@ SHOW WARNINGS; - Default value: `24.0` - Indicates the concurrency number of TiFlash computation. This variable is internally used in the Cost Model, and it is NOT recommended to modify its value. +### tidb_opt_use_invisible_indexes New in v8.0.0 + +- Scope: SESSION +- Applies to hint [SET_VAR](/optimizer-hints.md#set_varvar_namevar_value): Yes +- Type: Boolean +- Default value: `OFF` +- This variable controls whether the optimizer can select [invisible indexes](/sql-statements/sql-statement-create-index.md#invisible-index) for query optimization in the current session. Invisible indexes are maintained by DML statements, but will not be used by the query optimizer. This is useful in scenarios where you want to double-check before removing an index permanently.
When the variable is set to `ON`, the optimizer can select invisible indexes for query optimization in the session. + ### tidb_opt_write_row_id > **Note:** @@ -4583,8 +4671,8 @@ SHOW WARNINGS; - Scope: SESSION | GLOBAL - Persists to cluster: Yes - Applies to hint [SET_VAR](/optimizer-hints.md#set_varvar_namevar_value): Yes -- Default value: `2097152` (which is 2 MB) -- Range: `[0, 9223372036854775807]`, in bytes. The memory format with the units "KB|MB|GB|TB" is also supported. `0` means no limit. +- Default value: `2097152` (which is 2 MiB) +- Range: `[0, 9223372036854775807]`, in bytes. The memory format with the units "KiB|MiB|GiB|TiB" is also supported. `0` means no limit. - This variable controls the maximum size of a plan that can be cached in prepared or non-prepared plan cache. If the size of a plan exceeds this value, the plan will not be cached. For more details, see [Memory management of prepared plan cache](/sql-prepared-plan-cache.md#memory-management-of-prepared-plan-cache) and [Non-prepared plan cache](/sql-plan-management.md#usage). ### tidb_pprof_sql_cpu New in v4.0 @@ -4735,10 +4823,17 @@ SHOW WARNINGS; - Scope: SESSION | GLOBAL - Persists to cluster: Yes - Applies to hint [SET_VAR](/optimizer-hints.md#set_varvar_namevar_value): No -- Type: Boolean +- Type: Enumeration - Default value: `OFF` -- This variable controls whether to hide user information in the SQL statement being recorded into the TiDB log and slow log. -- When you set the variable to `1`, user information is hidden. For example, if the executed SQL statement is `insert into t values (1,2)`, the statement is recorded as `insert into t values (?,?)` in the log. +- Possible values: `OFF`, `ON`, `MARKER` +- This variable controls whether to hide the user information in the SQL statement being recorded into the TiDB log and slow log. +- The default value is `OFF`, which means that the user information is not processed in any way. 
+- When you set the variable to `ON`, the user information is hidden. For example, if the executed SQL statement is `INSERT INTO t VALUES (1,2)`, the statement is recorded as `INSERT INTO t VALUES (?,?)` in the log. +- When you set the variable to `MARKER`, the user information is wrapped in `‹ ›`. For example, if the executed SQL statement is `INSERT INTO t VALUES (1,2)`, the statement is recorded as `INSERT INTO t VALUES (‹1›,‹2›)` in the log. If the input has `‹`, it is escaped as `‹‹`, and `›` is escaped as `››`. Based on the marked logs, you can decide whether to desensitize the marked information when the logs are displayed. + +> **Warning:** +> +> The `MARKER` option is experimental. It is not recommended that you use it in the production environment. This feature might be changed or removed without prior notice. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub. ### tidb_regard_null_as_point New in v5.4.0 @@ -4866,6 +4961,20 @@ SHOW WARNINGS; - By default, Regions are split for a new table when it is being created in TiDB. After this variable is enabled, the newly split Regions are scattered immediately during the execution of the `CREATE TABLE` statement. This applies to the scenario where data need to be written in batches right after the tables are created in batches, because the newly split Regions can be scattered in TiKV beforehand and do not have to wait to be scheduled by PD. To ensure the continuous stability of writing data in batches, the `CREATE TABLE` statement returns success only after the Regions are successfully scattered. This makes the statement's execution time multiple times longer than that when you disable this variable. - Note that if `SHARD_ROW_ID_BITS` and `PRE_SPLIT_REGIONS` have been set when a table is created, the specified number of Regions are evenly split after the table creation. +### tidb_schema_cache_size New in v8.0.0 + +> **Warning:** +> +> This feature is experimental. 
It is not recommended that you use it in the production environment. This feature might be changed or removed without prior notice. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub. + +- Scope: GLOBAL +- Persists to cluster: Yes +- Applies to hint [SET_VAR](/optimizer-hints.md#set_varvar_namevar_value): Yes +- Type: Integer +- Default value: `0` +- Range: `[0, 9223372036854775807]` +- This variable controls the size of the schema cache in TiDB. The unit is byte. The default value is `0`, which means that the cache limit feature is not enabled. When this feature is enabled, TiDB uses the value you set as the maximum available memory limit, and uses the Least Recently Used (LRU) algorithm to cache the required tables, effectively reducing the memory occupied by the schema information. + ### tidb_schema_version_cache_limit New in v7.4.0 - Scope: GLOBAL @@ -4889,8 +4998,8 @@ SHOW WARNINGS; - Default value: `80%` - Range: - You can set the value in the percentage format, which means the percentage of the memory usage relative to the total memory. The value range is `[1%, 99%]`. - - You can also set the value in memory size. The value range is `0` and `[536870912, 9223372036854775807]` in bytes. The memory format with the units "KB|MB|GB|TB" is supported. `0` means no memory limit. - - If this variable is set to a memory size that is less than 512 MB but not `0`, TiDB uses 512 MB as the actual size. + - You can also set the value in memory size. The value range is `0` and `[536870912, 9223372036854775807]` in bytes. The memory format with the units "KiB|MiB|GiB|TiB" is supported. `0` means no memory limit. + - If this variable is set to a memory size that is less than 512 MiB but not `0`, TiDB uses 512 MiB as the actual size. - This variable specifies the memory limit for a TiDB instance. When the memory usage of TiDB reaches the limit, TiDB cancels the currently running SQL statement with the highest memory usage. 
After the SQL statement is successfully canceled, TiDB tries to call Golang GC to immediately reclaim memory to relieve memory stress as soon as possible. - Only the SQL statements with more memory usage than the [`tidb_server_memory_limit_sess_min_size`](/system-variables.md#tidb_server_memory_limit_sess_min_size-new-in-v640) limit are selected as the SQL statements to be canceled first. - Currently, TiDB cancels only one SQL statement at a time. After TiDB completely cancels a SQL statement and recovers resources, if the memory usage is still greater than the limit set by this variable, TiDB starts the next cancel operation. @@ -4917,8 +5026,8 @@ SHOW WARNINGS; - Scope: GLOBAL - Persists to cluster: Yes - Applies to hint [SET_VAR](/optimizer-hints.md#set_varvar_namevar_value): No -- Default value: `134217728` (which is 128 MB) -- Range: `[128, 9223372036854775807]`, in bytes. The memory format with the units "KB|MB|GB|TB" is also supported. +- Default value: `134217728` (which is 128 MiB) +- Range: `[128, 9223372036854775807]`, in bytes. The memory format with the units "KiB|MiB|GiB|TiB" is also supported. - After you enable the memory limit, TiDB will terminate the SQL statement with the highest memory usage on the current instance. This variable specifies the minimum memory usage of the SQL statement to be terminated. If the memory usage of a TiDB instance that exceeds the limit is caused by too many sessions with low memory usage, you can properly lower the value of this variable to allow more sessions to be canceled. ### tidb_service_scope New in v7.4.0 @@ -5683,7 +5792,7 @@ For details, see [Identify Slow Queries](/identify-slow-queries.md). -- This variable is used to control the batch size of transaction commit requests that TiDB sends to TiKV. If most of the transactions in the application workload have a large number of write operations, adjusting this variable to a larger value can improve the performance of batch processing. 
However, if this variable is set to too large a value and exceeds the limit of TiKV's maximum size of a single log (which is 8 MB by default), the commits might fail. +- This variable is used to control the batch size of transaction commit requests that TiDB sends to TiKV. If most of the transactions in the application workload have a large number of write operations, adjusting this variable to a larger value can improve the performance of batch processing. However, if this variable is set to too large a value and exceeds the limit of TiKV's maximum size of a single log (which is 8 MiB by default), the commits might fail. diff --git a/templates/template.tex b/templates/template.tex index cf372e561d9ee..acbf021b06e2f 100644 --- a/templates/template.tex +++ b/templates/template.tex @@ -8,6 +8,8 @@ \usepackage{setspace} \setstretch{$linestretch$} $endif$ +\usepackage{etoolbox} +\usepackage{xstring} \usepackage{amssymb,amsmath} \usepackage{ifxetex,ifluatex} \usepackage{fixltx2e} % provides \textsubscript @@ -68,6 +70,7 @@ \usepackage{xcolor} \usepackage{listings} \lstset{ + literate={_}{\_}1, basicstyle=\ttfamily, keywordstyle=\color[rgb]{0.13,0.29,0.53}\bfseries, stringstyle=\color[rgb]{0.31,0.60,0.02}, @@ -93,6 +96,14 @@ $endif$ $if(tables)$ \usepackage{longtable,booktabs} +% set table to left-aligned +\setlength\LTleft{0pt} +\setlength\LTright{0pt} +% fill the columns to page width +\makeatletter +\patchcmd\LT@array{\tabskip\z@}{\extracolsep{\fill}} +\makeatletter + $endif$ $if(graphics)$ \usepackage{graphicx} @@ -114,8 +125,8 @@ \fi \hypersetup{breaklinks=true, bookmarks=true, - pdfauthor={$author-meta$}, - pdftitle={$title-meta$}, + pdfauthor={$author$}, + pdftitle={$title$}, colorlinks=true, citecolor=$if(citecolor)$$citecolor$$else$blue$endif$, urlcolor=$if(urlcolor)$$urlcolor$$else$blue$endif$, diff --git a/ticdc/ticdc-alert-rules.md b/ticdc/ticdc-alert-rules.md index 91c4c8c54870d..10e9f0235271a 100644 --- a/ticdc/ticdc-alert-rules.md +++ 
b/ticdc/ticdc-alert-rules.md @@ -16,7 +16,7 @@ For critical alerts, you need to pay close attention to abnormal monitoring metr - Alert rule: - (time() - ticdc_owner_checkpoint_ts / 1000) > 600 + `(time() - ticdc_owner_checkpoint_ts / 1000) > 600` - Description: @@ -24,13 +24,13 @@ For critical alerts, you need to pay close attention to abnormal monitoring metr - Solution: - See [TiCDC Handle Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions). + See [TiCDC Handles Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions). ### `cdc_resolvedts_high_delay` - Alert rule: - (time() - ticdc_owner_resolved_ts / 1000) > 300 + `(time() - ticdc_owner_resolved_ts / 1000) > 300` - Description: @@ -38,7 +38,21 @@ For critical alerts, you need to pay close attention to abnormal monitoring metr - Solution: - See [TiCDC Handle Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions). + See [TiCDC Handles Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions). + +### `ticdc_changefeed_failed` + +- Alert rule: + + `(max_over_time(ticdc_owner_status[1m]) == 2) > 0` + +- Description: + + A replication task encounters an unrecoverable error and enters the failed state. + +- Solution: + + This alert is similar to replication interruption. See [TiCDC Handles Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions). ### `ticdc_processor_exit_with_error_count` @@ -52,7 +66,7 @@ For critical alerts, you need to pay close attention to abnormal monitoring metr - Solution: - See [TiCDC Handle Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions). + See [TiCDC Handles Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions). 
## Warning alerts @@ -98,7 +112,7 @@ Warning alerts are a reminder for an issue or error. - Solution: - See [TiCDC Handle Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions). + See [TiCDC Handles Replication Interruption](/ticdc/troubleshoot-ticdc.md#how-do-i-handle-replication-interruptions). ### `ticdc_puller_entry_sorter_sort_bucket` @@ -132,7 +146,7 @@ Warning alerts are a reminder for an issue or error. - Alert rule: - `changes(tikv_cdc_min_resolved_ts[1m]) < 1 and ON (instance) tikv_cdc_region_resolve_status{status="resolved"} > 0` + `changes(tikv_cdc_min_resolved_ts[1m]) < 1 and ON (instance) tikv_cdc_region_resolve_status{status="resolved"} > 0 and ON (instance) tikv_cdc_captured_region_total > 0` - Description: diff --git a/ticdc/ticdc-changefeed-config.md b/ticdc/ticdc-changefeed-config.md index 8bf5671154d39..64eb617a93c75 100644 --- a/ticdc/ticdc-changefeed-config.md +++ b/ticdc/ticdc-changefeed-config.md @@ -138,7 +138,7 @@ enable-table-across-nodes = false # ] # The protocol configuration item specifies the protocol format used for encoding messages. -# When the downstream is Kafka, the protocol can only be canal-json, avro, or open-protocol. +# When the downstream is Kafka, the protocol can be canal-json, avro, debezium, open-protocol, or simple. # When the downstream is Pulsar, the protocol can only be canal-json. # When the downstream is a storage service, the protocol can only be canal-json or csv. # Note: This configuration item only takes effect if the downstream is Kafka, Pulsar, or a storage service. @@ -161,7 +161,7 @@ delete-only-output-handle-key-columns = false # encoder-concurrency = 32 # Specifies whether to enable kafka-sink-v2 that uses the kafka-go sink library. -# Note: This configuration item only takes effect if the downstream is MQ. +# Note: This configuration item is experimental, and only takes effect if the downstream is MQ. # The default value is false. 
# enable-kafka-sink-v2 = false @@ -194,6 +194,28 @@ enable-partition-separator = true # The encoding method of binary data, which can be 'base64' or 'hex'. The default value is 'base64'. # binary-encoding-method = 'base64' +# Starting from v8.0.0, TiCDC supports the Simple message encoding protocol. The following are the configuration parameters for the Simple protocol. +# For more information about the protocol, see . +# The following configuration parameters control the sending behavior of bootstrap messages. +# send-bootstrap-interval-in-sec controls the time interval for sending bootstrap messages, in seconds. +# The default value is 120 seconds, which means that a bootstrap message is sent every 120 seconds for each table. +# send-bootstrap-interval-in-sec = 120 + +# send-bootstrap-in-msg-count controls the message interval for sending bootstrap, in message count. +# The default value is 10000, which means that a bootstrap message is sent every 10000 row changed messages for each table. +# send-bootstrap-in-msg-count = 10000 +# Note: If you want to disable the sending of bootstrap messages, set both send-bootstrap-interval-in-sec and send-bootstrap-in-msg-count to 0. + +# send-bootstrap-to-all-partition controls whether to send bootstrap messages to all partitions. +# The default value is true, which means that bootstrap messages are sent to all partitions of the corresponding table topic. +# Setting it to false means bootstrap messages are sent to only the first partition of the corresponding table topic. +# send-bootstrap-to-all-partition = true + +[sink.kafka-config.codec-config] +# encoding-format controls the encoding format of the Simple protocol messages. Currently, the Simple protocol message supports "json" and "avro" encoding formats. +# The default value is "json". +# encoding-format = "json" + # Specifies the replication consistency configurations for a changefeed when using the redo log. 
For more information, see https://docs.pingcap.com/tidb/stable/ticdc-sink-to-mysql#eventually-consistent-replication-in-disaster-scenarios. # Note: The consistency-related configuration items only take effect when the downstream is a database and the redo log feature is enabled. [consistent] diff --git a/ticdc/ticdc-debezium.md b/ticdc/ticdc-debezium.md new file mode 100644 index 0000000000000..253c3b9da6016 --- /dev/null +++ b/ticdc/ticdc-debezium.md @@ -0,0 +1,138 @@ +--- +title: TiCDC Debezium Protocol +summary: Learn the concept of the TiCDC Debezium Protocol and how to use it. +--- + +# TiCDC Debezium Protocol + +[Debezium](https://debezium.io/) is a tool for capturing database changes. It converts each captured database change into a message called an "event" and sends these events to Kafka. Starting from v8.0.0, TiCDC supports sending TiDB changes to Kafka using a Debezium-style output format, simplifying migration from MySQL databases for users who had previously been using Debezium's MySQL integration. + +## Use the Debezium message format + +When you use Kafka as the downstream sink, specify the `protocol` field as `debezium` in the `sink-uri` configuration. Then TiCDC encapsulates the Debezium messages based on the events and sends TiDB data change events to the downstream. + +Currently, the Debezium protocol only supports Row Changed events and directly ignores DDL events and WATERMARK events. A Row Changed event represents a data change in a row. When a row changes, the Row Changed event is sent, including relevant information about the row both before and after the change. A WATERMARK event marks the replication progress of a table, indicating that all events earlier than the watermark have been sent to the downstream.
+
+The following is a configuration example for using the Debezium message format:
+
+```shell
+cdc cli changefeed create --server=http://127.0.0.1:8300 --changefeed-id="kafka-debezium" --sink-uri="kafka://127.0.0.1:9092/topic-name?kafka-version=2.4.0&protocol=debezium"
+```
+
+The Debezium output format contains the schema information of the current row, so that downstream consumers can interpret its data structure. For scenarios where schema information is unnecessary, you can disable the schema output by setting the `debezium-disable-schema` parameter to `true` in the changefeed configuration file or in `sink-uri`.
+
+In addition, the original Debezium format does not include important TiDB-specific fields such as the unique transaction identifier `CommitTS`. To ensure data integrity, TiCDC adds two fields, `CommitTs` and `ClusterID`, to the Debezium format to identify the TiDB data changes.
+
+## Message format definition
+
+This section describes the format definition of the DML event output in the Debezium format.
+
+### DML event
+
+TiCDC encodes a DML event in the following format:
+
+```json
+{
+    "payload":{
+        "ts_ms":1707103832957,
+        "transaction":null,
+        "op":"c",
+        "before":null,
+        "after":{
+            "a":4,
+            "b":2
+        },
+        "source":{
+            "version":"2.4.0.Final",
+            "connector":"TiCDC",
+            "name":"default",
+            "ts_ms":1707103832263,
+            "snapshot":"false",
+            "db":"test",
+            "table":"t2",
+            "server_id":0,
+            "gtid":null,
+            "file":"",
+            "pos":0,
+            "row":0,
+            "thread":0,
+            "query":null,
+            "commit_ts":447507027004751877,
+            "cluster_id":"default"
+        }
+    },
+    "schema":{
+        "type":"struct",
+        "optional":false,
+        "name":"default.test.t2.Envelope",
+        "version":1,
+        "fields":[
+            {
+                "type":"struct",
+                "optional":true,
+                "name":"default.test.t2.Value",
+                "field":"before",
+                "fields":[
+                    {
+                        "type":"int32",
+                        "optional":false,
+                        "field":"a"
+                    },
+                    {
+                        "type":"int32",
+                        "optional":true,
+                        "field":"b"
+                    }
+                ]
+            },
+            {
+                "type":"struct",
+                "optional":true,
+                "name":"default.test.t2.Value",
+                "field":"after",
+                "fields":[
+                    {
+                        "type":"int32",
+                        "optional":false,
+                        "field":"a"
+                    },
+                    {
+                        "type":"int32",
+                        "optional":true,
+                        "field":"b"
+                    }
+                ]
+            },
+            {
+                "type":"string",
+                "optional":false,
+                "field":"op"
+            },
+            ...
+        ]
+    }
+}
+```
+
+The key fields of the preceding JSON data are explained as follows:
+
+| Field | Type | Description |
+|:----------|:-------|:-------------------------------------------------------|
+| payload.op | String | The type of the change event. `"c"` indicates an `INSERT` event, `"u"` indicates an `UPDATE` event, and `"d"` indicates a `DELETE` event. |
+| payload.ts_ms | Number | The timestamp (in milliseconds) when TiCDC generates this message. |
+| payload.before | JSON | The data value before the change event of a statement. For `"c"` events, the value of the `before` field is `null`. |
+| payload.after | JSON | The data value after the change event of a statement. For `"d"` events, the value of the `after` field is `null`. |
+| payload.source.commit_ts | Number | The `CommitTs` identifier when TiCDC generates this message. |
+| payload.source.db | String | The name of the database where the event occurs. |
+| payload.source.table | String | The name of the table where the event occurs. |
+| schema.fields | JSON | The type information of each field in the payload, including the schema information of the row data before and after the change. |
+
+### Data type mapping
+
+The data format mapping in the TiCDC Debezium message basically follows the [Debezium data type mapping rules](https://debezium.io/documentation/reference/2.4/connectors/mysql.html#mysql-data-types), which is generally consistent with the native message of the Debezium Connector for MySQL. However, for some data types, the following differences exist between TiCDC Debezium messages and native Debezium Connector messages:
+
+- Currently, TiDB does not support spatial data types, including GEOMETRY, LINESTRING, POLYGON, MULTIPOINT, MULTILINESTRING, MULTIPOLYGON, and GEOMETRYCOLLECTION.
+
+- For string-like data types, including Varchar, String, VarString, TinyBlob, MediumBlob, Blob, and LongBlob, when the column has the BINARY flag, TiCDC encodes the value in Base64 and outputs it as a String type; when the column does not have the BINARY flag, TiCDC outputs it directly as a String type. The native Debezium Connector encodes such values in different ways according to `binary.handling.mode`.
+
+- For the Decimal data types, including `DECIMAL` and `NUMERIC`, TiCDC uses the float64 type to represent them. The native Debezium Connector encodes them in float32 or float64 depending on the precision of the data type.
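As a quick illustration of the message layout described above, the following sketch shows how a downstream program might dispatch on `payload.op`. This is a hypothetical consumer, not part of TiCDC; the sample message is trimmed to the documented fields, and a real consumer would read such messages from the Kafka topic.

```python
import json

# A trimmed TiCDC Debezium message using the fields documented above
# (values are illustrative, not taken from a real changefeed).
RAW = '''
{
  "payload": {
    "ts_ms": 1707103832957,
    "op": "u",
    "before": {"a": 4, "b": 2},
    "after": {"a": 4, "b": 3},
    "source": {"db": "test", "table": "t2", "commit_ts": 447507027004751877}
  }
}
'''

def summarize(payload):
    """Map a Debezium-style payload to (action, before-image, after-image)."""
    op = payload["op"]
    if op == "c":   # INSERT: "before" is null
        return ("insert", None, payload["after"])
    if op == "u":   # UPDATE: both row images are present
        return ("update", payload["before"], payload["after"])
    if op == "d":   # DELETE: "after" is null
        return ("delete", payload["before"], None)
    raise ValueError(f"unexpected op: {op!r}")

payload = json.loads(RAW)["payload"]
print(summarize(payload))
```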
diff --git a/ticdc/ticdc-faq.md b/ticdc/ticdc-faq.md index 41a2454713f4b..28b64d442dd98 100644 --- a/ticdc/ticdc-faq.md +++ b/ticdc/ticdc-faq.md @@ -108,20 +108,20 @@ If you use the `cdc cli changefeed create` command without specifying the `-conf - Replicates all tables except system tables - Only replicates tables that contain [valid indexes](/ticdc/ticdc-overview.md#best-practices) -## Does TiCDC support outputting data changes in the Canal format? +## Does TiCDC support outputting data changes in the Canal protocol? -Yes. To enable Canal output, specify the protocol as `canal` in the `--sink-uri` parameter. For example: +Yes. Note that for the Canal protocol, TiCDC only supports the JSON output format, while the protobuf format is not officially supported yet. To enable Canal output, specify `protocol` as `canal-json` in the `--sink-uri` configuration. For example: {{< copyable "shell-regular" >}} ```shell -cdc cli changefeed create --server=http://127.0.0.1:8300 --sink-uri="kafka://127.0.0.1:9092/cdc-test?kafka-version=2.4.0&protocol=canal" --config changefeed.toml +cdc cli changefeed create --server=http://127.0.0.1:8300 --sink-uri="kafka://127.0.0.1:9092/cdc-test?kafka-version=2.4.0&protocol=canal-json" --config changefeed.toml ``` > **Note:** > > * This feature is introduced in TiCDC 4.0.2. -> * TiCDC currently supports outputting data changes in the Canal format only to MQ sinks such as Kafka. +> * TiCDC currently supports outputting data changes in the Canal-JSON format only to MQ sinks such as Kafka. For more information, refer to [TiCDC changefeed configurations](/ticdc/ticdc-changefeed-config.md). diff --git a/ticdc/ticdc-filter.md b/ticdc/ticdc-filter.md index 6ad1c17ec9250..fbd0273eb9ed1 100644 --- a/ticdc/ticdc-filter.md +++ b/ticdc/ticdc-filter.md @@ -57,31 +57,55 @@ Description of configuration parameters: - `matcher`: the database and table that this event filter rule applies to. The syntax is the same as [table filter](/table-filter.md). 
- `ignore-event`: the event type to be ignored. This parameter accepts an array of strings. You can configure multiple event types. Currently, the following event types are supported: -| Event | Type | Alias | Description | -| --------------- | ---- | -|--------------------------| -| all dml | | |Matches all DML events | -| all ddl | | |Matches all DDL events | -| insert | DML | |Matches `insert` DML event | -| update | DML | |Matches `update` DML event | -| delete | DML | |Matches `delete` DML event | -| create schema | DDL | create database |Matches `create database` event | -| drop schema | DDL | drop database |Matches `drop database` event | -| create table | DDL | |Matches `create table` event | -| drop table | DDL | |Matches `drop table` event | -| rename table | DDL | |Matches `rename table` event | -| truncate table | DDL | |Matches `truncate table` event | -| alter table | DDL | |Matches `alter table` event, including all clauses of `alter table`, `create index` and `drop index` | -| add table partition | DDL | |Matches `add table partition` event | -| drop table partition | DDL | |Matches `drop table partition` event | -| truncate table partition | DDL | |Matches `truncate table partition` event | -| create view | DDL | |Matches `create view`event | -| drop view | DDL | |Matches `drop view` event | - -- `ignore-sql`: the DDL statements to be ignored. This parameter accepts an array of strings, in which you can configure multiple regular expressions. This rule only applies to DDL events. -- `ignore-delete-value-expr`: this parameter accepts a SQL expression. This rule only applies to delete DML events with the specified value. -- `ignore-insert-value-expr`: this parameter accepts a SQL expression. This rule only applies to insert DML events with the specified value. -- `ignore-update-old-value-expr`: this parameter accepts a SQL expression. This rule only applies to update DML events whose old value contains the specified value. 
-- `ignore-update-new-value-expr`: this parameter accepts a SQL expression. This rule only applies to update DML events whose new value contains the specified value.
+    | Event | Type | Alias | Description |
+    | --------------- | ---- | --------------- | -------------------------- |
+    | all dml | | | Matches all DML events |
+    | all ddl | | | Matches all DDL events |
+    | insert | DML | | Matches `insert` DML event |
+    | update | DML | | Matches `update` DML event |
+    | delete | DML | | Matches `delete` DML event |
+    | create schema | DDL | create database | Matches `create database` event |
+    | drop schema | DDL | drop database | Matches `drop database` event |
+    | create table | DDL | | Matches `create table` event |
+    | drop table | DDL | | Matches `drop table` event |
+    | rename table | DDL | | Matches `rename table` event |
+    | truncate table | DDL | | Matches `truncate table` event |
+    | alter table | DDL | | Matches `alter table` event, including all clauses of `alter table`, `create index`, and `drop index` |
+    | add table partition | DDL | | Matches `add table partition` event |
+    | drop table partition | DDL | | Matches `drop table partition` event |
+    | truncate table partition | DDL | | Matches `truncate table partition` event |
+    | create view | DDL | | Matches `create view` event |
+    | drop view | DDL | | Matches `drop view` event |
+    | modify schema charset and collate | DDL | | Matches `modify schema charset and collate` event |
+    | recover table | DDL | | Matches `recover table` event |
+    | rebase auto id | DDL | | Matches `rebase auto id` event |
+    | modify table comment | DDL | | Matches `modify table comment` event |
+    | modify table charset and collate | DDL | | Matches `modify table charset and collate` event |
+    | exchange table partition | DDL | | Matches `exchange table partition` event |
+    | reorganize table partition | DDL | | Matches `reorganize table partition` event |
+    | alter table partitioning | DDL | | Matches `alter table partitioning` event |
+    | remove table partitioning | DDL | | Matches `remove table partitioning` event |
+    | add column | DDL | | Matches `add column` event |
+    | drop column | DDL | | Matches `drop column` event |
+    | modify column | DDL | | Matches `modify column` event |
+    | set default value | DDL | | Matches `set default value` event |
+    | add primary key | DDL | | Matches `add primary key` event |
+    | drop primary key | DDL | | Matches `drop primary key` event |
+    | rename index | DDL | | Matches `rename index` event |
+    | alter index visibility | DDL | | Matches `alter index visibility` event |
+    | alter ttl info | DDL | | Matches `alter ttl info` event |
+    | alter ttl remove | DDL | | Matches DDL events that remove all TTL attributes of a table |
+    | multi schema change | DDL | | Matches DDL events that change multiple attributes of a table within the same DDL statement |
+
+    > **Note:**
+    >
+    > TiDB's DDL statements support changing multiple attributes of a single table at the same time, such as `ALTER TABLE t MODIFY COLUMN a INT, ADD COLUMN b INT, DROP COLUMN c;`. This operation is defined as MultiSchemaChange. If you want to filter out this type of DDL, you need to configure `"multi schema change"` in `ignore-event`.
+
+- `ignore-sql`: the regular expressions of the DDL statements to be filtered out. This parameter accepts an array of strings, in which you can configure multiple regular expressions. This configuration only applies to DDL events.
+- `ignore-delete-value-expr`: this parameter accepts a SQL expression that follows the default SQL mode, used to filter out the `DELETE` type of DML events with a specified value.
+- `ignore-insert-value-expr`: this parameter accepts a SQL expression that follows the default SQL mode, used to filter out the `INSERT` type of DML events with a specified value.
+- `ignore-update-old-value-expr`: this parameter accepts a SQL expression that follows the default SQL mode, used to filter out the `UPDATE` type of DML events with a specified old value.
+- `ignore-update-new-value-expr`: this parameter accepts a SQL expression that follows the default SQL mode, used to filter out the `UPDATE` DML events with a specified new value. > **Note:** > diff --git a/ticdc/ticdc-manage-changefeed.md b/ticdc/ticdc-manage-changefeed.md index a53d99f6b967a..d3dc7ce9bf5b8 100644 --- a/ticdc/ticdc-manage-changefeed.md +++ b/ticdc/ticdc-manage-changefeed.md @@ -81,7 +81,7 @@ cdc cli changefeed query --server=http://10.0.10.25:8300 --changefeed-id=simple- ```shell { "info": { - "sink-uri": "mysql://127.0.0.1:3306/?max-txn-row=20\u0026worker-number=4", + "sink-uri": "mysql://127.0.0.1:3306/?max-txn-row=20\u0026worker-count=4", "opts": {}, "create-time": "2020-08-27T10:33:41.687983832+08:00", "start-ts": 419036036249681921, @@ -196,7 +196,7 @@ TiCDC supports modifying the configuration of the replication task (not dynamica ```shell cdc cli changefeed pause -c test-cf --server=http://10.0.10.25:8300 -cdc cli changefeed update -c test-cf --server=http://10.0.10.25:8300 --sink-uri="mysql://127.0.0.1:3306/?max-txn-row=20&worker-number=8" --config=changefeed.toml +cdc cli changefeed update -c test-cf --server=http://10.0.10.25:8300 --sink-uri="mysql://127.0.0.1:3306/?max-txn-row=20&worker-count=8" --config=changefeed.toml cdc cli changefeed resume -c test-cf --server=http://10.0.10.25:8300 ``` diff --git a/ticdc/ticdc-open-api-v2.md b/ticdc/ticdc-open-api-v2.md index b464167b45be4..a31b226f87093 100644 --- a/ticdc/ticdc-open-api-v2.md +++ b/ticdc/ticdc-open-api-v2.md @@ -158,15 +158,6 @@ This interface is used to submit a replication task to TiCDC. If the request is "enable_old_value": true, "enable_sync_point": true, "filter": { - "do_dbs": [ - "string" - ], - "do_tables": [ - { - "database_name": "string", - "table_name": "string" - } - ], "event_filters": [ { "ignore_delete_value_expr": "string", @@ -184,15 +175,6 @@ This interface is used to submit a replication task to TiCDC. 
If the request is ] } ], - "ignore_dbs": [ - "string" - ], - "ignore_tables": [ - { - "database_name": "string", - "table_name": "string" - } - ], "ignore_txn_start_ts": [ 0 ], @@ -297,10 +279,6 @@ The `filter` parameters are described as follows: | Parameter name | Description | |:-----------------|:---------------------------------------| -| `do_dbs` | `STRING ARRAY` type. The databases to be replicated. (Optional) | -| `do_tables` | The tables to be replicated. (Optional) | -| `ignore_dbs` | `STRING ARRAY` type. The databases to be ignored. (Optional) | -| `ignore_tables` | The tables to be ignored. (Optional) | | `event_filters` | The configuration to filter events. (Optional) | | `ignore_txn_start_ts` | `UINT64 ARRAY` type. Specifying this will ignore transactions that specify `start_ts`, such as `[1, 2]`. (Optional) | | `rules` | `STRING ARRAY` type. The rules for table schema filtering, such as `['foo*.*', 'bar*.*']`. For more information, see [Table Filter](/table-filter.md). (Optional) | @@ -332,7 +310,7 @@ The `sink` parameters are described as follows: | `date_separator` | `STRING` type. Indicates the date separator type of the file directory. Value options are `none`, `year`, `month`, and `day`. `none` is the default value and means that the date is not separated. (Optional) | | `dispatchers` | An configuration array for event dispatching. (Optional) | | `encoder_concurrency` | `INT` type. The number of encoder threads in the MQ sink. The default value is `16`. (Optional) | -| `protocol` | `STRING` type. For MQ sinks, you can specify the protocol format of the message. The following protocols are currently supported: `canal-json`, `open-protocol`, `canal`, `avro`, and `maxwell`. | +| `protocol` | `STRING` type. For MQ sinks, you can specify the protocol format of the message. The following protocols are currently supported: `canal-json`, `open-protocol`, `avro`, and `maxwell`. | | `schema_registry` | `STRING` type. The schema registry address. 
(Optional) | | `terminator` | `STRING` type. The terminator is used to separate two data change events. The default value is null, which means `"\r\n"` is used as the terminator. (Optional) | | `transaction_atomicity` | `STRING` type. The atomicity level of the transaction. (Optional) | @@ -412,15 +390,6 @@ If the request is successful, `200 OK` is returned. If the request fails, an err "enable_old_value": true, "enable_sync_point": true, "filter": { - "do_dbs": [ - "string" - ], - "do_tables": [ - { - "database_name": "string", - "table_name": "string" - } - ], "event_filters": [ { "ignore_delete_value_expr": "string", @@ -438,15 +407,6 @@ If the request is successful, `200 OK` is returned. If the request fails, an err ] } ], - "ignore_dbs": [ - "string" - ], - "ignore_tables": [ - { - "database_name": "string", - "table_name": "string" - } - ], "ignore_txn_start_ts": [ 0 ], @@ -616,15 +576,6 @@ To modify the changefeed configuration, follow the steps of `pause the replicati "enable_old_value": true, "enable_sync_point": true, "filter": { - "do_dbs": [ - "string" - ], - "do_tables": [ - { - "database_name": "string", - "table_name": "string" - } - ], "event_filters": [ { "ignore_delete_value_expr": "string", @@ -642,15 +593,6 @@ To modify the changefeed configuration, follow the steps of `pause the replicati ] } ], - "ignore_dbs": [ - "string" - ], - "ignore_tables": [ - { - "database_name": "string", - "table_name": "string" - } - ], "ignore_txn_start_ts": [ 0 ], diff --git a/ticdc/ticdc-open-api.md b/ticdc/ticdc-open-api.md index 2194e7752ebe9..ea24874ea44a3 100644 --- a/ticdc/ticdc-open-api.md +++ b/ticdc/ticdc-open-api.md @@ -163,7 +163,7 @@ The configuration parameters of sink are as follows: `matcher`: The matching syntax of matcher is the same as the filter rule syntax. -`protocol`: For the sink of MQ type, you can specify the protocol format of the message. 
Currently the following protocols are supported: `canal-json`, `open-protocol`, `canal`, `avro`, and `maxwell`. +`protocol`: For the sink of MQ type, you can specify the protocol format of the message. Currently the following protocols are supported: `canal-json`, `open-protocol`, `avro`, and `maxwell`. ### Example diff --git a/ticdc/ticdc-overview.md b/ticdc/ticdc-overview.md index 094b8f1b35046..c3b3858380000 100644 --- a/ticdc/ticdc-overview.md +++ b/ticdc/ticdc-overview.md @@ -24,7 +24,7 @@ TiCDC has the following key capabilities: - Replicating incremental data between TiDB clusters with second-level RPO and minute-level RTO. - Bidirectional replication between TiDB clusters, allowing the creation of a multi-active TiDB solution using TiCDC. - Replicating incremental data from a TiDB cluster to a MySQL database or other MySQL-compatible databases with low latency. -- Replicating incremental data from a TiDB cluster to a Kafka cluster. The recommended data format includes [Canal-JSON](/ticdc/ticdc-canal-json.md) and [Avro](/ticdc/ticdc-avro-protocol.md). +- Replicating incremental data from a TiDB cluster to a Kafka cluster. The recommended data format includes [Canal-JSON](/ticdc/ticdc-canal-json.md), [Avro](/ticdc/ticdc-avro-protocol.md), and [Debezium](/ticdc/ticdc-debezium.md). - Replicating incremental data from a TiDB cluster to storage services, such as Amazon S3, GCS, Azure Blob Storage, and NFS. - Replicating tables with the ability to filter databases, tables, DMLs, and DDLs. - High availability with no single point of failure, supporting dynamically adding and deleting TiCDC nodes. diff --git a/ticdc/ticdc-simple-protocol.md b/ticdc/ticdc-simple-protocol.md new file mode 100644 index 0000000000000..def6e7f3f78d9 --- /dev/null +++ b/ticdc/ticdc-simple-protocol.md @@ -0,0 +1,714 @@ +--- +title: TiCDC Simple Protocol +summary: Learn how to use the TiCDC Simple protocol and the data format implementation. 
+---
+
+# TiCDC Simple Protocol
+
+The TiCDC Simple protocol is a row-level data change notification protocol that provides data sources for monitoring, caching, full-text indexing, analysis engines, and primary-secondary replication between heterogeneous databases. This document describes how to use the TiCDC Simple protocol and the data format implementation.
+
+## Use the TiCDC Simple protocol
+
+When you use Kafka as the downstream, specify `protocol` as `"simple"` in the changefeed configuration. TiCDC then encodes each row change or DDL event as a message and sends the data change event to the downstream.
+
+The following is a configuration example for using the Simple protocol:
+
+`sink-uri` configuration:
+
+```shell
+--sink-uri="kafka://127.0.0.1:9092/topic-name?kafka-version=2.4.0"
+```
+
+Changefeed configuration:
+
+```toml
+[sink]
+protocol = "simple"
+
+# The following configuration parameters control the sending behavior of bootstrap messages.
+# send-bootstrap-interval-in-sec controls the time interval for sending bootstrap messages, in seconds.
+# The default value is 120 seconds, which means that a bootstrap message is sent every 120 seconds for each table.
+send-bootstrap-interval-in-sec = 120
+
+# send-bootstrap-in-msg-count controls the message interval for sending bootstrap, in message count.
+# The default value is 10000, which means that a bootstrap message is sent every 10000 row changed messages for each table.
+send-bootstrap-in-msg-count = 10000
+# Note: If you want to disable the sending of bootstrap messages, set both send-bootstrap-interval-in-sec and send-bootstrap-in-msg-count to 0.
+
+# send-bootstrap-to-all-partition controls whether to send bootstrap messages to all partitions.
+# The default value is true, which means that bootstrap messages are sent to all partitions of the corresponding table topic.
+# Setting it to false means bootstrap messages are sent to only the first partition of the corresponding table topic.
+send-bootstrap-to-all-partition = true + +[sink.kafka-config.codec-config] +# encoding-format controls the encoding format of the Simple protocol messages. Currently, the Simple protocol message supports "json" and "avro" encoding formats. +# The default value is "json". +encoding-format = "json" +``` + +## Message types + +The TiCDC Simple protocol has the following message types. + +DDL: + +- `CREATE`: the creating table event. +- `RENAME`: the renaming table event. +- `CINDEX`: the creating index event. +- `DINDEX`: the deleting index event. +- `ERASE`: the deleting table event. +- `TRUNCATE`: the truncating table event. +- `ALTER`: the altering table event, including adding columns, dropping columns, modifying column types, and other `ALTER TABLE` statements supported by TiCDC. +- `QUERY`: other DDL events. + +DML: + +- `INSERT`: the inserting event. +- `UPDATE`: the updating event. +- `DELETE`: the deleting event. + +Other: + +- `WATERMARK`: containing a TSO (that is, a 64-bit timestamp) of the upstream TiDB cluster, which marks the table replication progress. All events earlier than the watermark have been sent to the downstream. +- `BOOTSTRAP`: containing the schema information of a table, used to build the table schema for the downstream. + +## Message format + +In the Simple protocol, each message contains only one event. The Simple protocol supports encoding messages in JSON and Avro formats. This document uses JSON format as an example. For Avro format messages, their fields and meanings are the same as those in JSON format messages, but the encoding format is different. For details about the Avro format, see [Simple Protocol Avro Schema](https://github.com/pingcap/tiflow/blob/master/pkg/sink/codec/simple/message.json). 
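The message types above can be routed with a small classifier on the consumer side. The following is a hypothetical consumer-side sketch, not part of TiCDC itself:

```python
# Message types defined by the Simple protocol, as listed above.
DDL_TYPES = {"CREATE", "RENAME", "CINDEX", "DINDEX", "ERASE", "TRUNCATE", "ALTER", "QUERY"}
DML_TYPES = {"INSERT", "UPDATE", "DELETE"}

def classify(message: dict) -> str:
    """Route a decoded Simple protocol message to a handler category."""
    msg_type = message["type"]
    if msg_type in DDL_TYPES:
        return "ddl"          # schema change: update the cached table schema
    if msg_type in DML_TYPES:
        return "dml"          # row change: apply to the downstream system
    if msg_type == "WATERMARK":
        return "watermark"    # progress marker: advance the consumed checkpoint
    if msg_type == "BOOTSTRAP":
        return "bootstrap"    # schema snapshot: (re)build the table schema
    raise ValueError(f"unknown message type: {msg_type!r}")

print(classify({"type": "ALTER"}))       # ddl
print(classify({"type": "WATERMARK"}))   # watermark
```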
+ +### DDL + +TiCDC encodes a DDL event in the following JSON format: + +```json +{ + "version":1, + "type":"ALTER", + "sql":"ALTER TABLE `user` ADD COLUMN `createTime` TIMESTAMP", + "commitTs":447987408682614795, + "buildTs":1708936343598, + "tableSchema":{ + "schema":"simple", + "table":"user", + "tableID":148, + "version":447987408682614791, + "columns":[ + { + "name":"id", + "dataType":{ + "mysqlType":"int", + "charset":"binary", + "collate":"binary", + "length":11 + }, + "nullable":false, + "default":null + }, + { + "name":"name", + "dataType":{ + "mysqlType":"varchar", + "charset":"utf8mb4", + "collate":"utf8mb4_bin", + "length":255 + }, + "nullable":true, + "default":null + }, + { + "name":"age", + "dataType":{ + "mysqlType":"int", + "charset":"binary", + "collate":"binary", + "length":11 + }, + "nullable":true, + "default":null + }, + { + "name":"score", + "dataType":{ + "mysqlType":"float", + "charset":"binary", + "collate":"binary", + "length":12 + }, + "nullable":true, + "default":null + }, + { + "name":"createTime", + "dataType":{ + "mysqlType":"timestamp", + "charset":"binary", + "collate":"binary", + "length":19 + }, + "nullable":true, + "default":null + } + ], + "indexes":[ + { + "name":"primary", + "unique":true, + "primary":true, + "nullable":false, + "columns":[ + "id" + ] + } + ] + }, + "preTableSchema":{ + "schema":"simple", + "table":"user", + "tableID":148, + "version":447984074911121426, + "columns":[ + { + "name":"id", + "dataType":{ + "mysqlType":"int", + "charset":"binary", + "collate":"binary", + "length":11 + }, + "nullable":false, + "default":null + }, + { + "name":"name", + "dataType":{ + "mysqlType":"varchar", + "charset":"utf8mb4", + "collate":"utf8mb4_bin", + "length":255 + }, + "nullable":true, + "default":null + }, + { + "name":"age", + "dataType":{ + "mysqlType":"int", + "charset":"binary", + "collate":"binary", + "length":11 + }, + "nullable":true, + "default":null + }, + { + "name":"score", + "dataType":{ + "mysqlType":"float", 
+                    "charset":"binary",
+                    "collate":"binary",
+                    "length":12
+                },
+                "nullable":true,
+                "default":null
+            }
+        ],
+        "indexes":[
+            {
+                "name":"primary",
+                "unique":true,
+                "primary":true,
+                "nullable":false,
+                "columns":[
+                    "id"
+                ]
+            }
+        ]
+    }
+}
+```
+
+The fields in the preceding JSON data are explained as follows:
+
+| Field | Type | Description |
+| ------------- | ------- | ------------------------------------------------------------- |
+| `version` | Number | The version number of the protocol, which is currently `1`. |
+| `type` | String | The DDL event type, including `CREATE`, `RENAME`, `CINDEX`, `DINDEX`, `ERASE`, `TRUNCATE`, `ALTER`, and `QUERY`. |
+| `sql` | String | The DDL statement. |
+| `commitTs` | Number | The commit timestamp when the DDL statement execution is completed in the upstream. |
+| `buildTs` | Number | The UNIX timestamp when the message is successfully encoded within TiCDC. |
+| `tableSchema` | Object | The current schema information of the table. For more information, see [TableSchema definition](#tableschema-definition). |
+| `preTableSchema` | Object | The schema information of the table before the DDL statement is executed. All DDL events, except the `CREATE` type of DDL event, have this field. |
+
+### DML
+
+#### INSERT
+
+TiCDC encodes an `INSERT` event in the following JSON format:
+
+```json
+{
+    "version":1,
+    "database":"simple",
+    "table":"user",
+    "tableID":148,
+    "type":"INSERT",
+    "commitTs":447984084414103554,
+    "buildTs":1708923662983,
+    "schemaVersion":447984074911121426,
+    "data":{
+        "age":"25",
+        "id":"1",
+        "name":"John Doe",
+        "score":"90.5"
+    }
+}
+```
+
+The fields in the preceding JSON data are explained as follows:
+
+| Field | Type | Description |
+| ------------- | ------- | --------------------------------------------------------- |
+| `version` | Number | The version number of the protocol, which is currently `1`. |
+| `database` | String | The name of the database. |
+| `table` | String | The name of the table. |
+| `tableID` | Number | The ID of the table. |
+| `type` | String | The DML event type, including `INSERT`, `UPDATE`, and `DELETE`. |
+| `commitTs` | Number | The commit timestamp when the DML statement execution is completed in the upstream. |
+| `buildTs` | Number | The UNIX timestamp when the message is successfully encoded within TiCDC. |
+| `schemaVersion` | Number | The schema version number of the table when the DML message is encoded. |
+| `data` | Object | The inserted data, where the field name is the column name and the field value is the column value. |
+
+The `INSERT` event contains the `data` field, and does not contain the `old` field.
+
+#### UPDATE
+
+TiCDC encodes an `UPDATE` event in the following JSON format:
+
+```json
+{
+    "version":1,
+    "database":"simple",
+    "table":"user",
+    "tableID":148,
+    "type":"UPDATE",
+    "commitTs":447984099186180098,
+    "buildTs":1708923719184,
+    "schemaVersion":447984074911121426,
+    "data":{
+        "age":"25",
+        "id":"1",
+        "name":"John Doe",
+        "score":"95"
+    },
+    "old":{
+        "age":"25",
+        "id":"1",
+        "name":"John Doe",
+        "score":"90.5"
+    }
+}
+```
+
+The fields in the preceding JSON data are explained as follows:
+
+| Field | Type | Description |
+| ------------- | ------- | --------------------------------------------------------- |
+| `version` | Number | The version number of the protocol, which is currently `1`. |
+| `database` | String | The name of the database. |
+| `table` | String | The name of the table. |
+| `tableID` | Number | The ID of the table. |
+| `type` | String | The DML event type, including `INSERT`, `UPDATE`, and `DELETE`. |
+| `commitTs` | Number | The commit timestamp when the DML statement execution is completed in the upstream. |
+| `buildTs` | Number | The UNIX timestamp when the message is successfully encoded within TiCDC. |
+| `schemaVersion` | Number | The schema version number of the table when the DML message is encoded. |
+| `data` | Object | The data after updating, where the field name is the column name and the field value is the column value. |
+| `old` | Object | The data before updating, where the field name is the column name and the field value is the column value. |
+
+The `UPDATE` event contains both the `data` and `old` fields, which represent the data after and before updating respectively.
+
+#### DELETE
+
+TiCDC encodes a `DELETE` event in the following JSON format:
+
+```json
+{
+    "version":1,
+    "database":"simple",
+    "table":"user",
+    "tableID":148,
+    "type":"DELETE",
+    "commitTs":447984114259722243,
+    "buildTs":1708923776484,
+    "schemaVersion":447984074911121426,
+    "old":{
+        "age":"25",
+        "id":"1",
+        "name":"John Doe",
+        "score":"95"
+    }
+}
+```
+
+The fields in the preceding JSON data are explained as follows:
+
+| Field | Type | Description |
+| ------------- | ------- | --------------------------------------------------------- |
+| `version` | Number | The version number of the protocol, which is currently `1`. |
+| `database` | String | The name of the database. |
+| `table` | String | The name of the table. |
+| `tableID` | Number | The ID of the table. |
+| `type` | String | The DML event type, including `INSERT`, `UPDATE`, and `DELETE`. |
+| `commitTs` | Number | The commit timestamp when the DML statement execution is completed in the upstream. |
+| `buildTs` | Number | The UNIX timestamp when the message is successfully encoded within TiCDC. |
+| `schemaVersion` | Number | The schema version number of the table when the DML message is encoded. |
+| `old` | Object | The deleted data, where the field name is the column name and the field value is the column value. |
+
+The `DELETE` event contains the `old` field, and does not contain the `data` field.
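Putting the three DML shapes together, a consumer can maintain an in-memory copy of a table. The sketch below is hypothetical consumer code; it assumes, as in the examples above, that `id` is the primary key column and that column values arrive as strings:

```python
def apply_dml(rows: dict, msg: dict) -> None:
    """Apply one Simple protocol DML message to a dict keyed by the "id" column."""
    if msg["type"] == "INSERT":
        rows[msg["data"]["id"]] = msg["data"]          # only "data" is present
    elif msg["type"] == "UPDATE":
        # Remove the old row first in case the primary key itself changed.
        rows.pop(msg["old"]["id"], None)
        rows[msg["data"]["id"]] = msg["data"]
    elif msg["type"] == "DELETE":
        rows.pop(msg["old"]["id"], None)               # only "old" is present
    else:
        raise ValueError(f"not a DML message: {msg['type']!r}")

table = {}
apply_dml(table, {"type": "INSERT", "data": {"id": "1", "name": "John Doe", "score": "90.5"}})
apply_dml(table, {"type": "UPDATE",
                  "data": {"id": "1", "name": "John Doe", "score": "95"},
                  "old": {"id": "1", "name": "John Doe", "score": "90.5"}})
apply_dml(table, {"type": "DELETE", "old": {"id": "1", "name": "John Doe", "score": "95"}})
print(table)  # the row was inserted, updated, then deleted again
```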
+ +### WATERMARK + +TiCDC encodes a `WATERMARK` event in the following JSON format: + +```json +{ + "version":1, + "type":"WATERMARK", + "commitTs":447984124732375041, + "buildTs":1708923816911 +} +``` + +The fields in the preceding JSON data are explained as follows: + +| Field | Type | Description | +| ------------- | ------- | --------------------------------------------------------- | +| `version` | Number | The version number of the protocol, which is currently `1`. | +| `type` | String | The `WATERMARK` event type. | +| `commitTs` | Number | The commit timestamp of the `WATERMARK`. | +| `buildTs` | Number | The UNIX timestamp when the message is successfully encoded within TiCDC. | + +### BOOTSTRAP + +TiCDC encodes a `BOOTSTRAP` event in the following JSON format: + +```json +{ + "version":1, + "type":"BOOTSTRAP", + "commitTs":0, + "buildTs":1708924603278, + "tableSchema":{ + "schema":"simple", + "table":"new_user", + "tableID":148, + "version":447984074911121426, + "columns":[ + { + "name":"id", + "dataType":{ + "mysqlType":"int", + "charset":"binary", + "collate":"binary", + "length":11 + }, + "nullable":false, + "default":null + }, + { + "name":"name", + "dataType":{ + "mysqlType":"varchar", + "charset":"utf8mb4", + "collate":"utf8mb4_bin", + "length":255 + }, + "nullable":true, + "default":null + }, + { + "name":"age", + "dataType":{ + "mysqlType":"int", + "charset":"binary", + "collate":"binary", + "length":11 + }, + "nullable":true, + "default":null + }, + { + "name":"score", + "dataType":{ + "mysqlType":"float", + "charset":"binary", + "collate":"binary", + "length":12 + }, + "nullable":true, + "default":null + } + ], + "indexes":[ + { + "name":"primary", + "unique":true, + "primary":true, + "nullable":false, + "columns":[ + "id" + ] + } + ] + } +} +``` + +The fields in the preceding JSON data are explained as follows: + +| Field | Type | Description | +| ------------- | ------- | --------------------------------------------------------- | +| `version` 
| Number | The version number of the protocol, which is currently `1`. |
+| `type` | String | The `BOOTSTRAP` event type. |
+| `commitTs` | Number | The `commitTs` of a `BOOTSTRAP` event is always `0`. Because the event is generated internally by TiCDC, its `commitTs` is meaningless. |
+| `buildTs` | Number | The UNIX timestamp when the message is successfully encoded within TiCDC. |
+| `tableSchema` | Object | The schema information of the table. For more information, see [TableSchema definition](#tableschema-definition). |
+
+## Message generation and sending rules
+
+### DDL
+
+- Generation time: TiCDC sends a DDL event after all transactions before this DDL event have been sent.
+- Destination: TiCDC sends DDL events to all partitions of the corresponding topic.
+
+### DML
+
+- Generation time: TiCDC sends DML events in the order of the `commitTs` of the transaction.
+- Destination: TiCDC sends DML events to the corresponding partition of the corresponding topic according to the user-configured dispatch rules.
+
+### WATERMARK
+
+- Generation time: TiCDC sends `WATERMARK` events periodically to mark the replication progress of a changefeed. The current interval is 1 second.
+- Destination: TiCDC sends `WATERMARK` events to all partitions of the corresponding topic.
+
+### BOOTSTRAP
+
+- Generation time:
+    - After creating a new changefeed, before the first DML event of a table is sent, TiCDC sends a `BOOTSTRAP` event to the downstream to build the table schema.
+    - Additionally, TiCDC sends `BOOTSTRAP` events periodically to allow newly joined consumers to build the table schema. By default, a `BOOTSTRAP` event is sent every 120 seconds or every 10000 messages. You can adjust the sending interval by configuring the `send-bootstrap-interval-in-sec` and `send-bootstrap-in-msg-count` parameters in the `sink` configuration.
+    - If a table does not receive any new DML messages within 30 minutes, the table is considered inactive. 
TiCDC stops sending `BOOTSTRAP` events for the table until new DML events are received.
+- Destination: By default, TiCDC sends `BOOTSTRAP` events to all partitions of the corresponding topic. You can adjust the sending strategy by configuring the `send-bootstrap-to-all-partition` parameter in the `sink` configuration.
+
+## Message consumption methods
+
+Because the TiCDC Simple protocol does not include the schema information of the table when sending a DML message, the downstream needs to receive the DDL or BOOTSTRAP message and cache the schema information of the table before consuming a DML message. When receiving a DML message, the downstream looks up the corresponding table schema information in the cache using the `table` name and `schemaVersion` fields of the DML message, and then correctly consumes the DML message.
+
+The following describes how the downstream consumes DML messages based on DDL or BOOTSTRAP messages. From the preceding descriptions, the following information is known:
+
+- Each DML message contains a `schemaVersion` field to mark the schema version number of the table corresponding to the DML message.
+- Each DDL message contains the `tableSchema` and `preTableSchema` fields to mark the schema information of the table after and before the DDL event, respectively.
+- Each BOOTSTRAP message contains a `tableSchema` field to mark the schema information of the table corresponding to the BOOTSTRAP message.
+
+The consumption methods are introduced in the following two scenarios.
+
+### Scenario 1: The consumer starts consuming from the beginning
+
+In this scenario, the consumer starts consuming from the creation of a table, so the consumer can receive all DDL and BOOTSTRAP messages of the table. In this case, the consumer can obtain the schema information of the table through the `table` name and `schemaVersion` fields of the DML message. 
The detailed process is as follows:
+
+![TiCDC Simple Protocol consumer scene 1](/media/ticdc/ticdc-simple-consumer-1.png)
+
+### Scenario 2: The consumer starts consuming from the middle
+
+When a new consumer joins the consumer group, it might start consuming from the middle, so it might miss earlier DDL and BOOTSTRAP messages of the table. In this case, the consumer might receive some DML messages before obtaining the schema information of the table. Therefore, the consumer needs to wait until it receives a DDL or BOOTSTRAP message to obtain the schema information of the table. Because TiCDC sends BOOTSTRAP messages periodically, the consumer is guaranteed to obtain the schema information of the table within a certain period of time. The detailed process is as follows:
+
+![TiCDC Simple Protocol consumer scene 2](/media/ticdc/ticdc-simple-consumer-2.png)
+
+## Reference
+
+### TableSchema definition
+
+TableSchema is a JSON object that contains the schema information of the table, including the table name, table ID, table version number, column information, and index information. 
The JSON message format is as follows: + +``` json +{ + "schema":"simple", + "table":"user", + "tableID":148, + "version":447984074911121426, + "columns":[ + { + "name":"id", + "dataType":{ + "mysqlType":"int", + "charset":"binary", + "collate":"binary", + "length":11 + }, + "nullable":false, + "default":null + }, + { + "name":"name", + "dataType":{ + "mysqlType":"varchar", + "charset":"utf8mb4", + "collate":"utf8mb4_bin", + "length":255 + }, + "nullable":true, + "default":null + }, + { + "name":"age", + "dataType":{ + "mysqlType":"int", + "charset":"binary", + "collate":"binary", + "length":11 + }, + "nullable":true, + "default":null + }, + { + "name":"score", + "dataType":{ + "mysqlType":"float", + "charset":"binary", + "collate":"binary", + "length":12 + }, + "nullable":true, + "default":null + } + ], + "indexes":[ + { + "name":"primary", + "unique":true, + "primary":true, + "nullable":false, + "columns":[ + "id" + ] + } + ] +} +``` + +The preceding JSON data is explained as follows: + +| Field | Type | Description | +| ---------- | ------ | ------------------------------------------------------------------- | +| `schema` | String | The name of the database. | +| `table` | String | The name of the table. | +| `tableID` | Number | The ID of the table. | +| `version` | Number | The schema version number of the table. | +| `columns` | Array | The column information, including the column name, data type, whether it can be null, and the default value. | +| `indexes` | Array | The index information, including the index name, whether it is unique, whether it is a primary key, and the index columns. | + +You can uniquely identify the schema information of a table by the table name and the schema version number. + +> **Note:** +> +> Due to the implementation limitations of TiDB, the schema version number of a table does not change when the `RENAME TABLE` DDL operation is executed. 
+ +#### Column definition + +Column is a JSON object that contains the schema information of the column, including the column name, data type, whether it can be null, and the default value. + +```json +{ + "name":"id", + "dataType":{ + "mysqlType":"int", + "charset":"binary", + "collate":"binary", + "length":11 + }, + "nullable":false, + "default":null +} +``` + +The preceding JSON data is explained as follows: + +| Field | Type | Description | +| ---------- | ------ | ------------------------------------------------------------------- | +| `name` | String | The name of the column. | +| `dataType` | Object | The data type information, including the MySQL data type, character set, collation, and field length. | +| `nullable` | Boolean | Whether the column can be null. | +| `default` | String | The default value of the column. | + +#### Index definition + +Index is a JSON object that contains the schema information of the index, including the index name, whether it is unique, whether it is a primary key, and the index column. + +```json +{ + "name":"primary", + "unique":true, + "primary":true, + "nullable":false, + "columns":[ + "id" + ] +} +``` + +The preceding JSON data is explained as follows: + +| Field | Type | Description | +| ---------- | ------ | ------------------------------------------------------------------- | +| `name` | String | The name of the index. | +| `unique` | Boolean | Whether the index is unique. | +| `primary` | Boolean | Whether the index is a primary key. | +| `nullable` | Boolean | Whether the index can be null. | +| `columns` | Array | The column names included in the index. | + +### mysqlType reference table + +The following table describes the value range of the `mysqlType` field in the TiCDC Simple protocol and its type in TiDB (Golang) and Avro (Java). 
When you need to parse DML messages, you can correctly parse the data according to this table and the `mysqlType` field in the DML message, depending on the protocol and language you use. + +**TiDB type (Golang)** represents the type of the corresponding `mysqlType` when it is processed in TiDB and TiCDC (Golang). **Avro type (Java)** represents the type of the corresponding `mysqlType` when it is encoded into Avro format messages. + +| mysqlType | Value range | TiDB type (Golang) | Avro type (Java) | +| --- | --- | --- | --- | +| tinyint | [-128, 127] | int64 | long | +| tinyint unsigned | [0, 255] | uint64 | long | +| smallint | [-32768, 32767] | int64 | long | +| smallint unsigned | [0, 65535] | uint64 | long | +| mediumint | [-8388608, 8388607] | int64 | long | +| mediumint unsigned | [0, 16777215] | uint64 | long | +| int | [-2147483648, 2147483647] | int64 | long | +| int unsigned | [0, 4294967295] | uint64 | long | +| bigint | [-9223372036854775808, 9223372036854775807] | int64 | long | +| bigint unsigned | [0, 9223372036854775807] | uint64 | long | +| bigint unsigned | [9223372036854775808, 18446744073709551615] | uint64 | string | +| float | / | float32 | float | +| double | / | float64 | double | +| decimal | / | string | string | +| varchar | / | []uint8 | string | +| char | / | []uint8 | string | +| varbinary | / | []uint8 | bytes | +| binary | / | []uint8 | bytes | +| tinytext | / | []uint8 | string | +| text | / | []uint8 | string | +| mediumtext | / | []uint8 | string | +| longtext | / | []uint8 | string | +| tinyblob | / | []uint8 | bytes | +| blob | / | []uint8 | bytes | +| mediumblob | / | []uint8 | bytes | +| longblob | / | []uint8 | bytes | +| date | / | string | string | +| datetime | / | string | string | +| timestamp | / | string | string | +| time | / | string | string | +| year | / | int64 | long | +| enum | / | uint64 | long | +| set | / | uint64 | long | +| bit | / | uint64 | long | +| json | / | string | string | +| bool | / | int64 | 
long | + +### Avro schema definition + +The Simple protocol supports outputting messages in Avro format. For details about the Avro format, see [Simple Protocol Avro Schema](https://github.com/pingcap/tiflow/blob/master/pkg/sink/codec/simple/message.json). diff --git a/ticdc/ticdc-sink-to-kafka.md b/ticdc/ticdc-sink-to-kafka.md index 0f6af2f2464d9..3fc45116afdd0 100644 --- a/ticdc/ticdc-sink-to-kafka.md +++ b/ticdc/ticdc-sink-to-kafka.md @@ -59,7 +59,7 @@ The following are descriptions of sink URI parameters and values that can be con | `replication-factor` | The number of Kafka message replicas that can be saved (optional, `1` by default). This value must be greater than or equal to the value of [`min.insync.replicas`](https://kafka.apache.org/33/documentation.html#brokerconfigs_min.insync.replicas) in Kafka. | | `required-acks` | A parameter used in the `Produce` request, which notifies the broker of the number of replica acknowledgements it needs to receive before responding. Value options are `0` (`NoResponse`: no response, only `TCP ACK` is provided), `1` (`WaitForLocal`: responds only after local commits are submitted successfully), and `-1` (`WaitForAll`: responds after all replicated replicas are committed successfully. You can configure the minimum number of replicated replicas using the [`min.insync.replicas`](https://kafka.apache.org/33/documentation.html#brokerconfigs_min.insync.replicas) configuration item of the broker). (Optional, the default value is `-1`). | | `compression` | The compression algorithm used when sending messages (value options are `none`, `lz4`, `gzip`, `snappy`, and `zstd`; `none` by default). Note that the Snappy compressed file must be in the [official Snappy format](https://github.com/google/snappy). Other variants of Snappy compression are not supported.| -| `protocol` | The protocol with which messages are output to Kafka. The value options are `canal-json`, `open-protocol`, `canal`, `avro` and `maxwell`. 
| +| `protocol` | The protocol with which messages are output to Kafka. The value options are `canal-json`, `open-protocol`, `avro` and `maxwell`. | | `auto-create-topic` | Determines whether TiCDC creates the topic automatically when the `topic-name` passed in does not exist in the Kafka cluster (optional, `true` by default). | | `enable-tidb-extension` | Optional. `false` by default. When the output protocol is `canal-json`, if the value is `true`, TiCDC sends [WATERMARK events](/ticdc/ticdc-canal-json.md#watermark-event) and adds the [TiDB extension field](/ticdc/ticdc-canal-json.md#tidb-extension-field) to Kafka messages. From v6.1.0, this parameter is also applicable to the `avro` protocol. If the value is `true`, TiCDC adds [three TiDB extension fields](/ticdc/ticdc-avro-protocol.md#tidb-extension-fields) to the Kafka message. | | `max-batch-size` | New in v4.0.9. If the message protocol supports outputting multiple data changes to one Kafka message, this parameter specifies the maximum number of data changes in one Kafka message. It currently takes effect only when Kafka's `protocol` is `open-protocol` (optional, `16` by default). | diff --git a/ticdc/ticdc-sink-to-pulsar.md b/ticdc/ticdc-sink-to-pulsar.md index d23e2db81dbe8..5f008e369faff 100644 --- a/ticdc/ticdc-sink-to-pulsar.md +++ b/ticdc/ticdc-sink-to-pulsar.md @@ -23,7 +23,7 @@ cdc cli changefeed create \ Create changefeed successfully! 
ID: simple-replication-task -Info: {"upstream_id":7277814241002263370,"namespace":"default","id":"simple-replication-task","sink_uri":"pulsar://127.0.0.1:6650/consumer-test?protocol=canal-json","create_time":"2024-01-25T14:42:32.000904+08:00","start_ts":444203257406423044,"config":{"memory_quota":1073741824,"case_sensitive":false,"force_replicate":false,"ignore_ineligible_table":false,"check_gc_safe_point":true,"enable_sync_point":false,"bdr_mode":false,"sync_point_interval":600000000000,"sync_point_retention":86400000000000,"filter":{"rules":["pulsar_test.*"]},"mounter":{"worker_num":16},"sink":{"protocol":"canal-json","csv":{"delimiter":",","quote":"\"","null":"\\N","include_commit_ts":false,"binary_encoding_method":"base64"},"dispatchers":[{"matcher":["pulsar_test.*"],"partition":"","topic":"test_{schema}_{table}"}],"encoder_concurrency":16,"terminator":"\r\n","date_separator":"day","enable_partition_separator":true,"enable_kafka_sink_v2":false,"only_output_updated_columns":false,"delete_only_output_handle_key_columns":false,"pulsar_config":{"connection-timeout":30,"operation-timeout":30,"batching-max-messages":1000,"batching-max-publish-delay":10,"send-timeout":30},"advance_timeout":150},"consistent":{"level":"none","max_log_size":64,"flush_interval":2000,"use_file_backend":false},"scheduler":{"enable_table_across_nodes":false,"region_threshold":100000,"write_key_threshold":0},"integrity":{"integrity_check_level":"none","corruption_handle_level":"warn"}},"state":"normal","creator_version":"v7.6.0","resolved_ts":444203257406423044,"checkpoint_ts":444203257406423044,"checkpoint_time":"2024-01-25 14:42:31.410"} +Info: 
{"upstream_id":7277814241002263370,"namespace":"default","id":"simple-replication-task","sink_uri":"pulsar://127.0.0.1:6650/consumer-test?protocol=canal-json","create_time":"2024-01-25T14:42:32.000904+08:00","start_ts":444203257406423044,"config":{"memory_quota":1073741824,"case_sensitive":false,"force_replicate":false,"ignore_ineligible_table":false,"check_gc_safe_point":true,"enable_sync_point":false,"bdr_mode":false,"sync_point_interval":600000000000,"sync_point_retention":86400000000000,"filter":{"rules":["pulsar_test.*"]},"mounter":{"worker_num":16},"sink":{"protocol":"canal-json","csv":{"delimiter":",","quote":"\"","null":"\\N","include_commit_ts":false,"binary_encoding_method":"base64"},"dispatchers":[{"matcher":["pulsar_test.*"],"partition":"","topic":"test_{schema}_{table}"}],"encoder_concurrency":16,"terminator":"\r\n","date_separator":"day","enable_partition_separator":true,"only_output_updated_columns":false,"delete_only_output_handle_key_columns":false,"pulsar_config":{"connection-timeout":30,"operation-timeout":30,"batching-max-messages":1000,"batching-max-publish-delay":10,"send-timeout":30},"advance_timeout":150},"consistent":{"level":"none","max_log_size":64,"flush_interval":2000,"use_file_backend":false},"scheduler":{"enable_table_across_nodes":false,"region_threshold":100000,"write_key_threshold":0},"integrity":{"integrity_check_level":"none","corruption_handle_level":"warn"}},"state":"normal","creator_version":"v7.6.0","resolved_ts":444203257406423044,"checkpoint_ts":444203257406423044,"checkpoint_time":"2024-01-25 14:42:31.410"} ``` The meaning of each parameter is as follows: diff --git a/ticdc/troubleshoot-ticdc.md b/ticdc/troubleshoot-ticdc.md index ccd46bd654e6a..569d29da9f794 100644 --- a/ticdc/troubleshoot-ticdc.md +++ b/ticdc/troubleshoot-ticdc.md @@ -137,3 +137,11 @@ If you want to skip this DDL statement that goes wrong, set the start-ts of the cdc cli changefeed remove --server=http://127.0.0.1:8300 --changefeed-id 
simple-replication-task cdc cli changefeed create --server=http://127.0.0.1:8300 --sink-uri="mysql://root:123456@127.0.0.1:3306/" --changefeed-id="simple-replication-task" --sort-engine="unified" --start-ts 415241823337054210 ``` + +## The `Kafka: client has run out of available brokers to talk to: EOF` error is reported when I use TiCDC to replicate messages to Kafka. What should I do? + +This error is typically caused by the connection failure between TiCDC and the Kafka cluster. To troubleshoot, you can check the Kafka logs and network status. One possible reason is that you did not specify the correct `kafka-version` parameter when creating the replication task, causing the Kafka client inside TiCDC to use the wrong Kafka API version when accessing the Kafka server. You can fix this issue by specifying the correct `kafka-version` parameter in the [`--sink-uri`](/ticdc/ticdc-sink-to-kafka.md#configure-sink-uri-for-kafka) configuration. For example: + +```shell +cdc cli changefeed create --server=http://127.0.0.1:8300 --sink-uri "kafka://127.0.0.1:9092/test?topic=test&protocol=open-protocol&kafka-version=2.4.0" +``` diff --git a/tidb-cloud/limited-sql-features.md b/tidb-cloud/limited-sql-features.md index 3ec5f7402f443..e5b521bd3b427 100644 --- a/tidb-cloud/limited-sql-features.md +++ b/tidb-cloud/limited-sql-features.md @@ -60,7 +60,6 @@ TiDB Cloud works with almost all workloads that TiDB supports, but there are som | `SHOW PLUGINS` | Supported | Not supported [^8] | | `SHOW PUMP STATUS` | Not supported [^7] | Not supported [^7] | | `SHUTDOWN` | Not supported [^4] | Not supported [^4] | -| `CREATE TABLE ... 
AUTO_ID_CACHE` | Supported | Not supported [^12] | ## Functions and operators @@ -134,7 +133,7 @@ TiDB Cloud works with almost all workloads that TiDB supports, but there are som | `max_allowed_packet` | No limitation | Read-only [^11] | | `plugin_dir` | No limitation | Not supported [^8] | | `plugin_load` | No limitation | Not supported [^8] | -| `require_secure_transport` | Not supported [^13] | Read-only [^11] | +| `require_secure_transport` | Not supported [^12] | Read-only [^11] | | `skip_name_resolve` | No limitation | Read-only [^11] | | `sql_log_bin` | No limitation | Read-only [^11] | | `tidb_cdc_write_source` | No limitation | Read-only [^11] | @@ -246,6 +245,4 @@ TiDB Cloud works with almost all workloads that TiDB supports, but there are som [^11]: The variable is read-only on TiDB Serverless. -[^12]: Customizing cache size using [`AUTO_ID_CACHE`](/auto-increment.md#cache-size-control) is temporarily unavailable on TiDB Serverless. - -[^13]: Not supported. Enabling `require_secure_transport` for TiDB Dedicated clusters will result in SQL client connection failures. +[^12]: Not supported. Enabling `require_secure_transport` for TiDB Dedicated clusters will result in SQL client connection failures. diff --git a/tidb-cloud/serverless-driver-kysely-example.md b/tidb-cloud/serverless-driver-kysely-example.md index 4278146387652..ded6989ccf6af 100644 --- a/tidb-cloud/serverless-driver-kysely-example.md +++ b/tidb-cloud/serverless-driver-kysely-example.md @@ -82,7 +82,7 @@ To complete this tutorial, you need the following: 2. Set the environment variable `DATABASE_URL` in your local environment. For example, in Linux or macOS, you can run the following command: ```bash - export DATABASE_URL=mysql://[username]:[password]@[host]/[database] + export DATABASE_URL='mysql://[username]:[password]@[host]/[database]' ``` ### Step 3. Use Kysely to query data @@ -275,7 +275,7 @@ mysql://[username]:[password]@[host]/[database] 3. 
Test your code locally: ``` - export DATABASE_URL=mysql://[username]:[password]@[host]/[database] + export DATABASE_URL='mysql://[username]:[password]@[host]/[database]' next dev ``` @@ -286,7 +286,7 @@ mysql://[username]:[password]@[host]/[database] 1. Deploy your code to Vercel with the `DATABASE_URL` environment variable: ``` - vercel -e DATABASE_URL=mysql://[username]:[password]@[host]/[database] --prod + vercel -e DATABASE_URL='mysql://[username]:[password]@[host]/[database]' --prod ``` After the deployment is complete, you will get the URL of your project. diff --git a/tidb-cloud/serverless-driver-prisma-example.md b/tidb-cloud/serverless-driver-prisma-example.md index c266fadf26455..49b3ffd9a7c88 100644 --- a/tidb-cloud/serverless-driver-prisma-example.md +++ b/tidb-cloud/serverless-driver-prisma-example.md @@ -68,7 +68,7 @@ To complete this tutorial, you need the following: 2. In the root directory of your project, create a file named `.env`, define an environment variable named `DATABASE_URL` as follows, and then replace the placeholders `[]` in this variable with the corresponding parameters in the connection string. ```dotenv - DATABASE_URL="mysql://[username]:[password]@[host]:4000/[database]?sslaccept=strict" + DATABASE_URL='mysql://[username]:[password]@[host]:4000/[database]?sslaccept=strict' ``` > **Note:** diff --git a/tidb-cloud/serverless-driver.md b/tidb-cloud/serverless-driver.md index 22c49566f68fe..04904e50075d9 100644 --- a/tidb-cloud/serverless-driver.md +++ b/tidb-cloud/serverless-driver.md @@ -53,7 +53,7 @@ try { await tx.execute('insert into test values (1)') await tx.execute('select * from test') await tx.commit() -}catch (err) { +} catch (err) { await tx.rollback() throw err } diff --git a/tidb-configuration-file.md b/tidb-configuration-file.md index 722c89d0c23b9..96b79531c62ac 100644 --- a/tidb-configuration-file.md +++ b/tidb-configuration-file.md @@ -319,6 +319,12 @@ Configuration items related to log. 
- Default value: `10000` - When the number of query rows (including the intermediate results based on statistics) is larger than this value, it is an `expensive` operation and outputs log with the `[EXPENSIVE_QUERY]` prefix. +### `general-log-file` New in v8.0.0 + ++ The filename of the [general log](/system-variables.md#tidb_general_log). ++ Default value: `""` ++ If you specify a filename, the general log is written to this specified file. If the value is blank, the general log is written to the server log of the TiDB instance. You can specify the name of the server log using [`filename`](#filename). + ### `timeout` New in v7.1.0 - Sets the timeout for log-writing operations in TiDB. In case of a disk failure that prevents logs from being written, this configuration item can trigger the TiDB process to panic instead of hang. @@ -355,6 +361,13 @@ Configuration items related to log files. - Default value: `0` - All the log files are retained by default. If you set it to `7`, seven log files are retained at maximum. +#### `compression` New in v8.0.0 + ++ The compression method for the log. ++ Default value: `""` ++ Value options: `""`, `"gzip"` ++ The default value is `""`, which means no compression. To enable the gzip compression, set this value to `"gzip"`. After compression is enabled, all log files are affected, such as [`slow-query-file`](#slow-query-file) and [`general-log-file`](#general-log-file-new-in-v800). + ## Security Configuration items related to security. @@ -413,26 +426,22 @@ Configuration items related to security. ### `tls-version` +> **Warning:** +> +> `"TLSv1.0"` and `"TLSv1.1"` protocols are deprecated in TiDB v7.6.0, and will be removed in v8.0.0. + - Set the minimum TLS version for MySQL Protocol connections. -- Default value: "", which allows TLSv1.2 or higher. Before TiDB v7.6.0, the default value allows TLSv1.1 or higher. 
-- Optional values: `"TLSv1.0"`, `"TLSv1.1"`, `"TLSv1.2"` and `"TLSv1.3"` +- Default value: "", which allows TLSv1.2 or later versions. Before TiDB v7.6.0, the default value allows TLSv1.1 or later versions. +- Optional values: `"TLSv1.2"` and `"TLSv1.3"`. Before TiDB v8.0.0, `"TLSv1.0"` and `"TLSv1.1"` are also allowed. ### `auth-token-jwks` New in v6.4.0 -> **Warning:** -> -> The `tidb_auth_token` authentication method is used only for the internal operation of TiDB Cloud. **DO NOT** change the value of this configuration. - -- Set the local file path of the JSON Web Key Sets (JWKS) for the `tidb_auth_token` authentication method. +- Set the local file path of the JSON Web Key Sets (JWKS) for the [`tidb_auth_token`](/security-compatibility-with-mysql.md#tidb_auth_token) authentication method. - Default value: `""` ### `auth-token-refresh-interval` New in v6.4.0 -> **Warning:** -> -> The `tidb_auth_token` authentication method is used only for the internal operation of TiDB Cloud. **DO NOT** change the value of this configuration. - -- Set the JWKS refresh interval for the `tidb_auth_token` authentication method. +- Set the JWKS refresh interval for the [`tidb_auth_token`](/security-compatibility-with-mysql.md#tidb_auth_token) authentication method. - Default value: `1h` ### `disconnect-on-expired-password` New in v6.5.0 @@ -479,6 +488,7 @@ Configuration items related to performance. - Default value: `3600000` - Unit: Millisecond - The transaction that holds locks longer than this time can only be committed or rolled back. The commit might not be successful. +- For transactions executed using the [`"bulk"` DML mode](/system-variables.md#tidb_dml_type-new-in-v800), the maximum TTL can exceed the limit of this configuration item. The maximum value is the greater value between this configuration item and 24 hours. ### `stmt-count-limit` @@ -724,12 +734,21 @@ Configuration items related to opentracing.reporter. 
> **Warning:** > -> This configuration might be deprecated in future versions. **DO NOT** change the value of this configuration. +> This configuration parameter might be deprecated in future versions. **DO NOT** change the value of it. + The timeout of a single Coprocessor request. + Default value: `60` + Unit: second +### `enable-replica-selector-v2` New in v8.0.0 + +> **Warning:** +> +> This configuration parameter might be deprecated in future versions. **DO NOT** change the value of it. + ++ Whether to use the new version of the Region replica selector when sending RPC requests to TiKV. ++ Default value: `true` + ## tikv-client.copr-cache New in v4.0.0 This section introduces configuration items related to the Coprocessor Cache feature. @@ -832,6 +851,7 @@ For pessimistic transaction usage, refer to [TiDB Pessimistic Transaction Mode]( + Determines the transaction mode that the auto-commit transaction uses when the pessimistic transaction mode is globally enabled (`tidb_txn_mode='pessimistic'`). By default, even if the pessimistic transaction mode is globally enabled, the auto-commit transaction still uses the optimistic transaction mode. After enabling `pessimistic-auto-commit` (set to `true`), the auto-commit transaction also uses pessimistic mode, which is consistent with the other explicitly committed pessimistic transactions. + For scenarios with conflicts, after enabling this configuration, TiDB includes auto-commit transactions into the global lock-waiting management, which avoids deadlocks and mitigates the latency spike brought by deadlock-causing conflicts. + For scenarios with no conflicts, if there are many auto-commit transactions (the specific number is determined by the real scenarios. For example, the number of auto-commit transactions accounts for more than half of the total number of applications), and a single transaction operates a large data volume, enabling this configuration causes performance regression. 
For example, the auto-commit `INSERT INTO SELECT` statement. ++ When the session-level system variable [`tidb_dml_type`](/system-variables.md#tidb_dml_type-new-in-v800) is set to `"bulk"`, the effect of this configuration in the session is equivalent to setting it to `false`. + Default value: `false` ### constraint-check-in-place-pessimistic New in v6.4.0 @@ -853,7 +873,7 @@ Configuration items related to read isolation. ### `tidb_enable_collect_execution_info` -- This configuration controls whether to record the execution information of each operator in the slow query log. +- This configuration controls whether to record the execution information of each operator in the slow query log and whether to record the [usage statistics of indexes](/information-schema/information-schema-tidb-index-usage.md). - Default value: `true` - Before v6.1.0, this configuration is set by `enable-collect-execution-info`. diff --git a/tidb-distributed-execution-framework.md b/tidb-distributed-execution-framework.md index 5b50c5d46d868..70b68989e923a 100644 --- a/tidb-distributed-execution-framework.md +++ b/tidb-distributed-execution-framework.md @@ -40,7 +40,11 @@ Currently, the DXF supports the distributed execution of the `ADD INDEX` and `IM ## Limitation -- The DXF can only schedule the distributed execution of one `ADD INDEX` task at a time. If a new `ADD INDEX` task is submitted before the current `ADD INDEX` distributed task has finished, the new task is executed through a transaction. +The DXF can only schedule up to 16 tasks (including `ADD INDEX` tasks and `IMPORT INTO` tasks) simultaneously. + +## `ADD INDEX` limitation + +- For each cluster, only one `ADD INDEX` task is allowed for distributed execution at a time. If a new `ADD INDEX` task is submitted before the current `ADD INDEX` distributed task has finished, the new `ADD INDEX` task is executed through a transaction instead of being scheduled by DXF. 
- Adding indexes on columns with the `TIMESTAMP` data type through the DXF is not supported, because it might lead to inconsistency between the index and the data. ## Prerequisites diff --git a/tidb-global-sort.md b/tidb-global-sort.md index abddbf4cdef6b..3094178c46877 100644 --- a/tidb-global-sort.md +++ b/tidb-global-sort.md @@ -8,9 +8,10 @@ summary: Learn the use cases, limitations, usage, and implementation principles # TiDB Global Sort -> **Warning:** +> **Note:** > -> This feature is experimental. It is not recommended that you use it in the production environment. This feature might be changed or removed without prior notice. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub. +> - Currently, the Global Sort process consumes a large amount of computing and memory resources of TiDB nodes. In scenarios such as adding indexes online while user business applications are running, it is recommended to add new TiDB nodes to the cluster and set the [`tidb_service_scope`](/system-variables.md#tidb_service_scope-new-in-v740) variable of these nodes to `"background"`. In this way, the distributed framework schedules tasks to these nodes, isolating the workload from other TiDB nodes to reduce the impact of executing backend tasks such as `ADD INDEX` and `IMPORT INTO` on user business applications. +> - When the Global Sort feature is used, it is recommended to use TiDB nodes with at least 16 cores of CPU and 32 GiB of memory to avoid OOM. > **Note:** > @@ -20,7 +21,7 @@ summary: Learn the use cases, limitations, usage, and implementation principles The TiDB Global Sort feature enhances the stability and efficiency of data import and DDL (Data Definition Language) operations. It serves as a general operator in the [TiDB Distributed eXecution Framework (DXF)](/tidb-distributed-execution-framework.md), providing a global sort service on cloud. -The Global Sort feature currently only supports using Amazon S3 as cloud storage. 
In future releases, it will be extended to support multiple shared storage interfaces, such as POSIX, enabling seamless integration with different storage systems. This flexibility enables efficient and adaptable data sorting for various use cases. +Currently, the Global Sort feature supports using Amazon S3 as cloud storage. ## Use cases diff --git a/tidb-lightning/tidb-lightning-configuration.md b/tidb-lightning/tidb-lightning-configuration.md index 5504cd62718f7..e173df1cc0f0e 100644 --- a/tidb-lightning/tidb-lightning-configuration.md +++ b/tidb-lightning/tidb-lightning-configuration.md @@ -125,16 +125,19 @@ driver = "file" # keep-after-success = false [conflict] -# Starting from v7.3.0, a new version of strategy is introduced to handle conflicting data. The default value is "". -# - "": TiDB Lightning does not detect or handle conflicting data. If the source file contains conflicting primary or unique key records, the subsequent step reports an error. +# Starting from v7.3.0, a new version of strategy is introduced to handle conflicting data. The default value is "". Starting from v8.0.0, TiDB Lightning optimizes the conflict strategy for both physical and logical import modes (experimental). +# - "": in the physical import mode, TiDB Lightning does not detect or handle conflicting data. If the source file contains conflicting primary or unique key records, the subsequent step reports an error. In the logical import mode, TiDB Lightning converts the "" strategy to the "error" strategy for processing. # - "error": when detecting conflicting primary or unique key records in the imported data, TiDB Lightning terminates the import and reports an error. -# - "replace": when encountering conflicting primary or unique key records, TiDB Lightning retains the new data and overwrites the old data. -# - "ignore": when encountering conflicting primary or unique key records, TiDB Lightning retains the old data and ignores the new data. 
-# The new version strategy cannot be used together with tikv-importer.duplicate-resolution (the old version of conflict detection). +# - "replace": when encountering conflicting primary or unique key records, TiDB Lightning retains the latest data and overwrites the old data. +# The conflicting data are recorded in the `lightning_task_info.conflict_error_v2` table (recording conflicting data detected by post-import conflict detection in the physical import mode) and the `conflict_records` table (recording conflicting data detected by preprocess conflict detection in both logical and physical import modes) of the target TiDB cluster. +# You can manually insert the correct records into the target table based on your application requirements. Note that the target TiKV must be v5.2.0 or later. +# - "ignore": when encountering conflicting primary or unique key records, TiDB Lightning retains the old data and ignores the new data. This option can only be used in the logical import mode. strategy = "" -# Controls the upper limit of the conflicting data that can be handled when strategy is "replace" or "ignore". You can set it only when strategy is "replace" or "ignore". The default value is 9223372036854775807, which means that almost all errors are tolerant. +# Controls whether to enable preprocess conflict detection, which checks conflicts in data before importing it to TiDB. In scenarios where the ratio of conflict records is greater than or equal to 1%, it is recommended to enable preprocess conflict detection for better performance in conflict detection. In other scenarios, it is recommended to disable it. The default value is false, indicating that TiDB Lightning only checks conflicts after the import. If you set it to true, TiDB Lightning checks conflicts both before and after the import. This parameter is experimental, and it can be used only in the physical import mode.
+# precheck-conflict-before-import = false +# Controls the maximum number of conflict errors that can be handled when the strategy is "replace" or "ignore". You can set it only when the strategy is "replace" or "ignore". The default value is 9223372036854775807, which means that almost all errors are tolerated. # threshold = 9223372036854775807 -# Controls the maximum number of records in the conflict_records table. The default value is 100. If the strategy is "ignore", the conflict records that are ignored are recorded; if the strategy is "replace", the conflict records that are overwritten are recorded. However, the "replace" strategy cannot record the conflict records in the logical import mode. +# Controls the maximum number of records in the `conflict_records` table. The default value is 100. In the physical import mode, if the strategy is "replace", the conflict records that are overwritten are recorded. In the logical import mode, if the strategy is "ignore", the conflict records that are ignored are recorded; if the strategy is "replace", the conflict records are not recorded. # max-record-rows = 100 [tikv-importer] @@ -150,6 +153,7 @@ strategy = "" # Note that this parameter is only used in scenarios where the target table is empty. # parallel-import = false +# Starting from v8.0.0, the `duplicate-resolution` parameter is deprecated. For more information, see . # Whether to detect and resolve duplicate records (unique key conflict) in the physical import mode. # The following resolution algorithms are supported: # - none: does not detect duplicate records, which has the best performance of the two algorithms. @@ -238,6 +242,21 @@ strategy = "" # This parameter is introduced in v7.6.0. The default value is "16KiB". The value must be greater than or equal to `1B`. Note that if you only specify a number (for example, `16`), the unit is Byte instead of KiB.
# block-size = "16KiB" +# In Logical Import Mode, this parameter controls the size of each SQL statement executed on the downstream TiDB server. +# This parameter is introduced in v8.0.0. +# It specifies the expected size of the VALUES part of each INSERT or REPLACE statement in a single transaction. +# This parameter is not a hard limit. The actual SQL executed might be longer or shorter, depending on the actual content imported. +# The default value is "96KiB", which is optimized for import speed when TiDB Lightning is the only client of the cluster. +# Due to the implementation details of TiDB Lightning, the value is capped at 96 KiB. Setting a larger value will not take effect. +# You can decrease this value to reduce the stress on the cluster due to large transactions. +# logical-import-batch-size = "96KiB" + +# In Logical Import Mode, this parameter controls the maximum number of rows inserted per transaction. +# This parameter is introduced in v8.0.0. The default value is `65536` rows. +# When both `logical-import-batch-size` and `logical-import-batch-rows` are specified, the parameter whose value reaches its threshold first will take effect. +# You can decrease this value to reduce the stress on the cluster due to large transactions. +# logical-import-batch-rows = 65536 + [mydumper] # Block size for file reading. Keep it longer than the longest string of the data source. 
read-block-size = "64KiB" # default value diff --git a/tidb-lightning/tidb-lightning-error-resolution.md b/tidb-lightning/tidb-lightning-error-resolution.md index 0c27bd53f6dc9..56126e88328a8 100644 --- a/tidb-lightning/tidb-lightning-error-resolution.md +++ b/tidb-lightning/tidb-lightning-error-resolution.md @@ -11,7 +11,7 @@ This document introduces TiDB Lightning error types, how to query the errors, an - `lightning.max-error`: the tolerance threshold of type error - `conflict.strategy`, `conflict.threshold`, and `conflict.max-record-rows`: configurations related to conflicting data -- `tikv-importer.duplicate-resolution`: the conflict handling configuration that can only be used in the physical import mode +- `tikv-importer.duplicate-resolution` (deprecated in v8.0.0): the conflict handling configuration that can only be used in the physical import mode - `lightning.task-info-schema-name`: the database where conflicting data is stored when TiDB Lightning detects conflicts For more information, see [TiDB Lightning (Task)](/tidb-lightning/tidb-lightning-configuration.md#tidb-lightning-task). @@ -119,9 +119,9 @@ CREATE TABLE conflict_records ( `type_error_v1` records all [type errors](#type-error) managed by `lightning.max-error`. Each error corresponds to one row. -`conflict_error_v1` records all unique and primary key conflicts managed by `tikv-importer.duplicate-resolution` in the physical import mode. Each pair of conflicts corresponds to two rows. +`conflict_error_v2` records conflicts managed by the `conflict` configuration group in the physical import mode. Each pair of conflicts corresponds to two rows. -`conflict_records` records all unique and primary key conflicts managed by the `conflict` configuration group in logical import mode and physical import mode. Each error corresponds to one row. +`conflict_records` records conflicts managed by the `conflict` configuration group in both the logical import mode and physical import mode. 
Each error corresponds to one row. | Column | Syntax | Type | Conflict | Description | | ------------ | ------ | ---- | -------- | ----------------------------------------------------------------------------------------------------------------------------------- | diff --git a/tidb-lightning/tidb-lightning-logical-import-mode-usage.md b/tidb-lightning/tidb-lightning-logical-import-mode-usage.md index ee2ac84758707..d7f53d5abb6d2 100644 --- a/tidb-lightning/tidb-lightning-logical-import-mode-usage.md +++ b/tidb-lightning/tidb-lightning-logical-import-mode-usage.md @@ -53,8 +53,8 @@ Conflicting data refers to two or more records with the same data in the PK or U | :-- | :-- | :-- | | `"replace"` | Replacing existing data with new data. | `REPLACE INTO ...` | | `"ignore"` | Keeping existing data and ignoring new data. | `INSERT IGNORE INTO ...` | -| `"error"` | Pausing the import and reporting an error. | `INSERT INTO ...` | -| `""` | TiDB Lightning does not detect or handle conflicting data. If data with primary and unique key conflicts exists, the subsequent step reports an error. | None | +| `"error"` | Terminating the import when conflicting data is detected. | `INSERT INTO ...` | +| `""` | Converted to `"error"`, which means terminating the import when conflicting data is detected. | None | When the strategy is `"error"`, errors caused by conflicting data directly terminate the import task. When the strategy is `"replace"` or `"ignore"`, you can control the maximum number of tolerated conflicts by configuring [`conflict.threshold`](/tidb-lightning/tidb-lightning-configuration.md#tidb-lightning-task). The default value is `9223372036854775807`, which means that almost all errors are tolerated.
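The logical import strategies above map directly onto the `[conflict]` section of the TiDB Lightning task configuration. The following TOML fragment is an illustrative sketch; the `threshold` value is an example choice, not a default:

```toml
# Illustrative TiDB Lightning task configuration for the logical import mode.
[tikv-importer]
# "tidb" selects the logical import mode.
backend = "tidb"

[conflict]
# "replace" is executed as REPLACE INTO ...; "ignore" as INSERT IGNORE INTO ...
strategy = "replace"
# Example value: terminate the import after 10000 conflict errors are tolerated.
threshold = 10000
# Cap on the number of rows kept in the conflict_records table.
max-record-rows = 100
```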
diff --git a/tidb-lightning/tidb-lightning-physical-import-mode-usage.md b/tidb-lightning/tidb-lightning-physical-import-mode-usage.md index 2b4fef7fb6b59..55c59c0ae2bc9 100644 --- a/tidb-lightning/tidb-lightning-physical-import-mode-usage.md +++ b/tidb-lightning/tidb-lightning-physical-import-mode-usage.md @@ -30,13 +30,15 @@ check-requirements = true data-source-dir = "/data/my_database" [conflict] -# Starting from v7.3.0, a new version of strategy is introduced to handle conflicting data. The default value is "". +# Starting from v7.3.0, a new version of strategy is introduced to handle conflicting data. The default value is "". Starting from v8.0.0, TiDB Lightning optimizes the conflict strategy for both physical and logical import modes (experimental). # - "": TiDB Lightning does not detect or handle conflicting data. If the source file contains conflicting primary or unique key records, the subsequent step reports an error. # - "error": when detecting conflicting primary or unique key records in the imported data, TiDB Lightning terminates the import and reports an error. -# - "replace": when encountering conflicting primary or unique key records, TiDB Lightning retains the new data and overwrites the old data. -# - "ignore": when encountering conflicting primary or unique key records, TiDB Lightning retains the old data and ignores the new data. -# The new version strategy cannot be used together with tikv-importer.duplicate-resolution (the old version of conflict detection). +# - "replace": when encountering conflicting primary or unique key records, TiDB Lightning retains the latest data and overwrites the old data. +# The conflicting data are recorded in the `lightning_task_info.conflict_error_v2` table (recording conflicting data detected by post-import conflict detection) and the `conflict_records` table (recording conflicting data detected by preprocess conflict detection) of the target TiDB cluster. 
+# You can manually insert the correct records into the target table based on your application requirements. Note that the target TiKV must be v5.2.0 or later. strategy = "" +# Controls whether to enable preprocess conflict detection, which checks conflicts in data before importing it to TiDB. In scenarios where the ratio of conflict records is greater than or equal to 1%, it is recommended to enable preprocess conflict detection for better performance in conflict detection. In other scenarios, it is recommended to disable it. The default value is false, indicating that TiDB Lightning only checks conflicts after the import. If you set it to true, TiDB Lightning checks conflicts both before and after the import. This parameter is experimental. +# precheck-conflict-before-import = false # threshold = 9223372036854775807 # max-record-rows = 100 @@ -44,6 +46,7 @@ strategy = "" # Import mode. "local" means using the physical import mode. backend = "local" +# Starting from v8.0.0, the `duplicate-resolution` parameter is deprecated. For more information, see . # The method to resolve the conflicting data. duplicate-resolution = 'remove' @@ -101,36 +104,37 @@ Conflicting data refers to two or more records with the same primary key or uniq There are two versions for conflict detection: - The new version of conflict detection, controlled by the `conflict` configuration item. -- The old version of conflict detection, controlled by the `tikv-importer.duplicate-resolution` configuration item. +- The old version of conflict detection (deprecated in v8.0.0), controlled by the `tikv-importer.duplicate-resolution` configuration item. ### The new version of conflict detection -The meaning of configuration values are as follows: +The meanings of configuration values are as follows: | Strategy | Default behavior of conflicting data | The corresponding SQL statement | | :-- | :-- | :-- | -| `"replace"` | Replacing existing data with new data.
| `REPLACE INTO ...` | -| `"ignore"` | Keeping existing data and ignoring new data. | `INSERT IGNORE INTO ...` | -| `"error"` | Pausing the import and reporting an error. | `INSERT INTO ...` | -| `""` | TiDB Lightning does not detect or handle conflicting data. If data with primary and unique key conflicts exists, the subsequent step reports an error. | None | +| `"replace"` | Retaining the latest data and overwriting the old data. | `REPLACE INTO ...` | +| `"error"` | Terminating the import and reporting an error. | `INSERT INTO ...` | +| `""` | TiDB Lightning does not detect or handle conflicting data. If data with primary and unique key conflicts exists, the subsequent checksum step reports an error. | None | > **Note:** > > The conflict detection result in the physical import mode might differ from SQL-based import due to the internal implementation and limitations of TiDB Lightning. -When the strategy is `"replace"` or `"ignore"`, conflicting data is treated as [conflict errors](/tidb-lightning/tidb-lightning-error-resolution.md#conflict-errors). If the [`conflict.threshold`](/tidb-lightning/tidb-lightning-configuration.md#tidb-lightning-task) value is greater than `0`, TiDB Lightning tolerates the specified number of conflict errors. +When the strategy is `"error"` and conflicting data is detected, TiDB Lightning reports an error and exits the import. When the strategy is `"replace"`, conflicting data is treated as [conflict errors](/tidb-lightning/tidb-lightning-error-resolution.md#conflict-errors). If the [`conflict.threshold`](/tidb-lightning/tidb-lightning-configuration.md#tidb-lightning-task) value is greater than `0`, TiDB Lightning tolerates the specified number of conflict errors.
The default value is `9223372036854775807`, which means that almost all errors are tolerated. For more information, see [error resolution](/tidb-lightning/tidb-lightning-error-resolution.md). The new version of conflict detection has the following limitations: - Before importing, TiDB Lightning prechecks potential conflicting data by reading all data and encoding it. During the detection process, TiDB Lightning uses `tikv-importer.sorted-kv-dir` to store temporary files. After the detection is complete, TiDB Lightning retains the results for the import phase. This introduces additional overhead for time consumption, disk space usage, and API requests to read the data. - The new version of conflict detection only works in a single node, and does not apply to parallel imports and scenarios where the `disk-quota` parameter is enabled. -The new version (`conflict`) and old version (`tikv-importer.duplicate-resolution`) conflict detection cannot be used at the same time. The new version of conflict detection is enabled when the configuration [`conflict.strategy`](/tidb-lightning/tidb-lightning-configuration.md#tidb-lightning-task) is set. -Compared with the old version of conflict detection, the new version takes less time when the imported data contains a large amount of conflicting data. It is recommended that you use the new version of conflict detection in non-parallel import tasks when the data contains conflicting data and there is sufficient local disk space. +The new version of conflict detection controls whether to enable preprocess conflict detection via the `precheck-conflict-before-import` parameter. In cases where the original data contains a lot of conflicting data, the total time consumed by conflict detection before and after the import is less than that of the old version.
Therefore, it is recommended to enable preprocess conflict detection in scenarios where the ratio of conflict records is greater than or equal to 1% and the local disk space is sufficient. -### The old version of conflict detection +### The old version of conflict detection (deprecated in v8.0.0) -The old version of conflict detection is enabled when `tikv-importer.duplicate-resolution` is not an empty string. In v7.2.0 and earlier versions, TiDB Lightning only supports this conflict detection method. +Starting from v8.0.0, the old version of conflict detection (`tikv-importer.duplicate-resolution`) is deprecated. If `tikv-importer.duplicate-resolution` is `remove` and `conflict.strategy` is not configured, TiDB Lightning automatically enables the new version of conflict detection by setting `conflict.strategy` to `"replace"`. Note that `tikv-importer.duplicate-resolution` and `conflict.strategy` cannot be configured at the same time; configuring both results in an error. + +- For versions between v7.3.0 and v7.6.0, TiDB Lightning enables the old version of conflict detection when `tikv-importer.duplicate-resolution` is not an empty string. +- For v7.2.0 and earlier versions, TiDB Lightning only supports the old version of conflict detection.
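As a sketch of the migration away from the deprecated parameter, a task configuration might change as follows (values illustrative):

```toml
# Before v8.0.0 (old version of conflict detection, now deprecated):
# [tikv-importer]
# duplicate-resolution = 'remove'

# Starting from v8.0.0, use the conflict configuration group instead.
# Configuring both duplicate-resolution and conflict.strategy results in an error.
[conflict]
# "replace" is the strategy that TiDB Lightning assigns automatically when it
# migrates duplicate-resolution = 'remove'.
strategy = "replace"
```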
In the old version of conflict detection, TiDB Lightning offers two strategies: diff --git a/tidb-resource-control.md b/tidb-resource-control.md index 18396aea2a034..63c8dc9cae67d 100644 --- a/tidb-resource-control.md +++ b/tidb-resource-control.md @@ -87,7 +87,7 @@ Request Unit (RU) is a unified abstraction unit in TiDB for system resources, wh 1 KiB write request payload consumes 1 RU - SQL CPU + CPU 3 ms consumes 1 RU diff --git a/tidb-troubleshooting-map.md b/tidb-troubleshooting-map.md index ee614d974b144..01eff0670ca58 100644 --- a/tidb-troubleshooting-map.md +++ b/tidb-troubleshooting-map.md @@ -472,7 +472,7 @@ Check the specific cause for busy by viewing the monitor **Grafana** -> **TiKV** ### 6.2 Data Migration -- 6.2.1 TiDB Data Migration (DM) is a migration tool that supports data migration from MySQL/MariaDB into TiDB. For details, see [DM on GitHub](https://github.com/pingcap/dm/). +- 6.2.1 TiDB Data Migration (DM) is a migration tool that supports data migration from MySQL/MariaDB into TiDB. For details, see [DM overview](/dm/dm-overview.md). - 6.2.2 `Access denied for user 'root'@'172.31.43.27' (using password: YES)` shows when you run `query status` or check the log. diff --git a/tiflash/tiflash-disaggregated-and-s3.md b/tiflash/tiflash-disaggregated-and-s3.md index 556dfd806a525..856d64dbe2a6e 100644 --- a/tiflash/tiflash-disaggregated-and-s3.md +++ b/tiflash/tiflash-disaggregated-and-s3.md @@ -64,7 +64,7 @@ TiFlash disaggregated storage and compute architecture is suitable for cost-effe ``` ```shell - tiup cluster scale-in mycuster -N 'node0,node1...' # Remove all TiFlash nodes + tiup cluster scale-in mycluster -N 'node0,node1...' 
# Remove all TiFlash nodes tiup cluster display mycluster # Wait for all TiFlash nodes to enter the Tombstone state tiup cluster prune mycluster # Remove all TiFlash nodes in the Tombstone state ``` diff --git a/tiflash/tiflash-supported-pushdown-calculations.md b/tiflash/tiflash-supported-pushdown-calculations.md index e24b6967e2b2b..f35b41478c95b 100644 --- a/tiflash/tiflash-supported-pushdown-calculations.md +++ b/tiflash/tiflash-supported-pushdown-calculations.md @@ -35,14 +35,14 @@ TiFlash supports the following push-down expressions: | Expression Type | Operations | | :-------------- | :------------------------------------- | -| [Numeric functions and operators](/functions-and-operators/numeric-functions-and-operators.md) | `+`, `-`, `/`, `*`, `%`, `>=`, `<=`, `=`, `!=`, `<`, `>`, `ROUND()`, `ABS()`, `FLOOR(int)`, `CEIL(int)`, `CEILING(int)`, `SQRT()`, `LOG()`, `LOG2()`, `LOG10()`, `LN()`, `EXP()`, `POW()`, `SIGN()`, `RADIANS()`, `DEGREES()`, `CONV()`, `CRC32()`, `GREATEST(int/real)`, `LEAST(int/real)` | +| [Numeric functions and operators](/functions-and-operators/numeric-functions-and-operators.md) | `+`, `-`, `/`, `*`, `%`, `>=`, `<=`, `=`, `!=`, `<`, `>`, `ROUND()`, `ABS()`, `FLOOR(int)`, `CEIL(int)`, `CEILING(int)`, `SQRT()`, `LOG()`, `LOG2()`, `LOG10()`, `LN()`, `EXP()`, `POW()`, `POWER()`, `SIGN()`, `RADIANS()`, `DEGREES()`, `CONV()`, `CRC32()`, `GREATEST(int/real)`, `LEAST(int/real)` | | [Logical functions](/functions-and-operators/control-flow-functions.md) and [operators](/functions-and-operators/operators.md) | `AND`, `OR`, `NOT`, `CASE WHEN`, `IF()`, `IFNULL()`, `ISNULL()`, `IN`, `LIKE`, `ILIKE`, `COALESCE`, `IS` | | [Bitwise operations](/functions-and-operators/bit-functions-and-operators.md) | `&` (bitand), \| (bitor), `~` (bitneg), `^` (bitxor) | | [String functions](/functions-and-operators/string-functions.md) | `SUBSTR()`, `CHAR_LENGTH()`, `REPLACE()`, `CONCAT()`, `CONCAT_WS()`, `LEFT()`, `RIGHT()`, `ASCII()`, `LENGTH()`, `TRIM()`, 
`LTRIM()`, `RTRIM()`, `POSITION()`, `FORMAT()`, `LOWER()`, `UCASE()`, `UPPER()`, `SUBSTRING_INDEX()`, `LPAD()`, `RPAD()`, `STRCMP()` | -| [Regular expression functions and operators](/functions-and-operators/string-functions.md) | `REGEXP`, `REGEXP_LIKE()`, `REGEXP_INSTR()`, `REGEXP_SUBSTR()`, `REGEXP_REPLACE()` | +| [Regular expression functions and operators](/functions-and-operators/string-functions.md) | `REGEXP`, `REGEXP_LIKE()`, `REGEXP_INSTR()`, `REGEXP_SUBSTR()`, `REGEXP_REPLACE()`, `RLIKE` | | [Date functions](/functions-and-operators/date-and-time-functions.md) | `DATE_FORMAT()`, `TIMESTAMPDIFF()`, `FROM_UNIXTIME()`, `UNIX_TIMESTAMP(int)`, `UNIX_TIMESTAMP(decimal)`, `STR_TO_DATE(date)`, `STR_TO_DATE(datetime)`, `DATEDIFF()`, `YEAR()`, `MONTH()`, `DAY()`, `EXTRACT(datetime)`, `DATE()`, `HOUR()`, `MICROSECOND()`, `MINUTE()`, `SECOND()`, `SYSDATE()`, `DATE_ADD/ADDDATE(datetime, int)`, `DATE_ADD/ADDDATE(string, int/real)`, `DATE_SUB/SUBDATE(datetime, int)`, `DATE_SUB/SUBDATE(string, int/real)`, `QUARTER()`, `DAYNAME()`, `DAYOFMONTH()`, `DAYOFWEEK()`, `DAYOFYEAR()`, `LAST_DAY()`, `MONTHNAME()`, `TO_SECONDS()`, `TO_DAYS()`, `FROM_DAYS()`, `WEEKOFYEAR()` | | [JSON function](/functions-and-operators/json-functions.md) | `JSON_LENGTH()`, `->`, `->>`, `JSON_EXTRACT()`, `JSON_ARRAY()`, `JSON_DEPTH()`, `JSON_VALID()`, `JSON_KEYS()`, `JSON_CONTAINS_PATH()`, `JSON_UNQUOTE()` | -| [Conversion functions](/functions-and-operators/cast-functions-and-operators.md) | `CAST(int AS DOUBLE), CAST(int AS DECIMAL)`, `CAST(int AS STRING)`, `CAST(int AS TIME)`, `CAST(double AS INT)`, `CAST(double AS DECIMAL)`, `CAST(double AS STRING)`, `CAST(double AS TIME)`, `CAST(string AS INT)`, `CAST(string AS DOUBLE), CAST(string AS DECIMAL)`, `CAST(string AS TIME)`, `CAST(decimal AS INT)`, `CAST(decimal AS STRING)`, `CAST(decimal AS TIME)`, `CAST(time AS INT)`, `CAST(time AS DECIMAL)`, `CAST(time AS STRING)`, `CAST(time AS REAL)`, `CAST(json AS JSON)`, `CAST(json AS STRING)`, `CAST(int AS 
JSON)`, `CAST(real AS JSON)`, `CAST(decimal AS JSON)`, `CAST(string AS JSON)`, `CAST(time AS JSON)`, `CAST(duration AS JSON)` | +| [Conversion functions](/functions-and-operators/cast-functions-and-operators.md) | `CAST(int AS DOUBLE), CAST(int AS DECIMAL)`, `CAST(int AS STRING)`, `CAST(int AS TIME)`, `CAST(double AS INT)`, `CAST(double AS DECIMAL)`, `CAST(double AS STRING)`, `CAST(double AS TIME)`, `CAST(string AS INT)`, `CAST(string AS DOUBLE), CAST(string AS DECIMAL)`, `CAST(string AS TIME)`, `CAST(decimal AS INT)`, `CAST(decimal AS STRING)`, `CAST(decimal AS TIME)`, `CAST(decimal AS DOUBLE)`, `CAST(time AS INT)`, `CAST(time AS DECIMAL)`, `CAST(time AS STRING)`, `CAST(time AS REAL)`, `CAST(json AS JSON)`, `CAST(json AS STRING)`, `CAST(int AS JSON)`, `CAST(real AS JSON)`, `CAST(decimal AS JSON)`, `CAST(string AS JSON)`, `CAST(time AS JSON)`, `CAST(duration AS JSON)` | | [Aggregate functions](/functions-and-operators/aggregate-group-by-functions.md) | `MIN()`, `MAX()`, `SUM()`, `COUNT()`, `AVG()`, `APPROX_COUNT_DISTINCT()`, `GROUP_CONCAT()` | | [Miscellaneous functions](/functions-and-operators/miscellaneous-functions.md) | `INET_NTOA()`, `INET_ATON()`, `INET6_NTOA()`, `INET6_ATON()` | diff --git a/tikv-configuration-file.md b/tikv-configuration-file.md index 5477cd5dfcccc..4ddca277f7023 100644 --- a/tikv-configuration-file.md +++ b/tikv-configuration-file.md @@ -177,9 +177,9 @@ This document only describes the parameters that are not included in command-lin ### `grpc-stream-initial-window-size` + The window size of the gRPC stream -+ Default value: `2MB` -+ Unit: KB|MB|GB -+ Minimum value: `"1KB"` ++ Default value: `2MiB` ++ Unit: KiB|MiB|GiB ++ Minimum value: `"1KiB"` ### `grpc-keepalive-time` @@ -220,9 +220,9 @@ This document only describes the parameters that are not included in command-lin ### `snap-io-max-bytes-per-sec` + The maximum allowable disk bandwidth when processing snapshots -+ Default value: `"100MB"` -+ Unit: KB|MB|GB -+ Minimum value: `"1KB"` ++ 
Default value: `"100MiB"` ++ Unit: KiB|MiB|GiB ++ Minimum value: `"1KiB"` ### `enable-request-batch` @@ -283,9 +283,9 @@ Configuration items related to the single thread pool serving read requests. Thi + The stack size of the threads in the unified thread pool + Type: Integer + Unit -+ Default value: `"10MB"` -+ Unit: KB|MB|GB -+ Minimum value: `"2MB"` ++ Default value: `"10MiB"` ++ Unit: KiB|MiB|GiB ++ Minimum value: `"2MiB"` + Maximum value: The number of Kbytes output in the result of the `ulimit -sH` command executed in the system. ### `max-tasks-per-worker` @@ -348,9 +348,9 @@ Configuration items related to storage thread pool. + The stack size of threads in the Storage read thread pool + Type: Integer + Unit -+ Default value: `"10MB"` -+ Unit: KB|MB|GB -+ Minimum value: `"2MB"` ++ Default value: `"10MiB"` ++ Unit: KiB|MiB|GiB ++ Minimum value: `"2MiB"` + Maximum value: The number of Kbytes output in the result of the `ulimit -sH` command executed in the system. ## `readpool.coprocessor` @@ -402,9 +402,9 @@ Configuration items related to the Coprocessor thread pool. + The stack size of the thread in the Coprocessor thread pool + Type: Integer + Unit -+ Default value: `"10MB"` -+ Unit: KB|MB|GB -+ Minimum value: `"2MB"` ++ Default value: `"10MiB"` ++ Unit: KiB|MiB|GiB ++ Minimum value: `"2MiB"` + Maximum value: The number of Kbytes output in the result of the `ulimit -sH` command executed in the system. ## storage @@ -444,8 +444,8 @@ Configuration items related to storage. ### `scheduler-pending-write-threshold` + The maximum size of the write queue. A `Server Is Busy` error is returned for a new write to TiKV when this value is exceeded. -+ Default value: `"100MB"` -+ Unit: MB|GB ++ Default value: `"100MiB"` ++ Unit: MiB|GiB ### `enable-async-apply-prewrite` @@ -456,9 +456,9 @@ Configuration items related to storage. + When TiKV is started, some space is reserved on the disk as disk protection. 
When the remaining disk space is less than the reserved space, TiKV restricts some write operations. The reserved space is divided into two parts: 80% of the reserved space is used as the extra disk space required for operations when the disk space is insufficient, and the other 20% is used to store the temporary file. In the process of reclaiming space, if the storage is exhausted by using too much extra disk space, this temporary file serves as the last protection for restoring services. + The name of the temporary file is `space_placeholder_file`, located in the `storage.data-dir` directory. When TiKV goes offline because its disk space ran out, if you restart TiKV, the temporary file is automatically deleted and TiKV tries to reclaim the space. -+ When the remaining space is insufficient, TiKV does not create the temporary file. The effectiveness of the protection is related to the size of the reserved space. The size of the reserved space is the larger value between 5% of the disk capacity and this configuration value. When the value of this configuration item is `"0MB"`, TiKV disables this disk protection feature. -+ Default value: `"5GB"` -+ Unit: MB|GB ++ When the remaining space is insufficient, TiKV does not create the temporary file. The effectiveness of the protection is related to the size of the reserved space. The size of the reserved space is the larger value between 5% of the disk capacity and this configuration value. When the value of this configuration item is `"0MiB"`, TiKV disables this disk protection feature. ++ Default value: `"5GiB"` ++ Unit: MiB|GiB ### `enable-ttl` @@ -514,7 +514,7 @@ Configuration items related to the sharing of block cache among multiple RocksDB + When `storage.engine="raft-kv"`, the default value is 45% of the size of total system memory. + When `storage.engine="partitioned-raft-kv"`, the default value is 30% of the size of total system memory. 
-+ Unit: KB|MB|GB ++ Unit: KiB|MiB|GiB ## storage.flow-control @@ -538,12 +538,12 @@ Configuration items related to the flow control mechanism in TiKV. This mechanis ### `soft-pending-compaction-bytes-limit` + When the pending compaction bytes in KvDB reach this threshold, the flow control mechanism starts to reject some write requests and reports the `ServerIsBusy` error. When `enable` is set to `true`, this configuration item overrides `rocksdb.(defaultcf|writecf|lockcf).soft-pending-compaction-bytes-limit`. -+ Default value: `"192GB"` ++ Default value: `"192GiB"` ### `hard-pending-compaction-bytes-limit` + When the pending compaction bytes in KvDB reach this threshold, the flow control mechanism rejects all write requests and reports the `ServerIsBusy` error. When `enable` is set to `true`, this configuration item overrides `rocksdb.(defaultcf|writecf|lockcf).hard-pending-compaction-bytes-limit`. -+ Default value: `"1024GB"` ++ Default value: `"1024GiB"` ## storage.io-rate-limit @@ -552,7 +552,7 @@ Configuration items related to the I/O rate limiter. ### `max-bytes-per-sec` + Limits the maximum I/O bytes that a server can write to or read from the disk (determined by the `mode` configuration item below) in one second. When this limit is reached, TiKV prefers throttling background operations over foreground ones. The value of this configuration item should be set to the disk's optimal I/O bandwidth, for example, the maximum I/O bandwidth specified by your cloud disk vendor. When this configuration value is set to zero, disk I/O operations are not limited. -+ Default value: `"0MB"` ++ Default value: `"0MiB"` ### `mode` @@ -604,7 +604,7 @@ Configuration items related to Raftstore. + The storage capacity, which is the maximum size allowed to store data. If `capacity` is left unspecified, the capacity of the current disk prevails. To deploy multiple TiKV instances on the same physical disk, add this parameter to the TiKV configuration. 
For details, see [Key parameters of the hybrid deployment](/hybrid-deployment-topology.md#key-parameters). + Default value: `0` -+ Unit: KB|MB|GB ++ Unit: KiB|MiB|GiB ### `raftdb-path` @@ -668,10 +668,10 @@ Configuration items related to Raftstore. > This configuration item cannot be queried via SQL statements but can be configured in the configuration file. + The soft limit on the size of a single message packet -+ Default value: `"1MB"` ++ Default value: `"1MiB"` + Minimum value: greater than `0` -+ Maximum value: `3GB` -+ Unit: KB|MB|GB ++ Maximum value: `3GiB` ++ Unit: KiB|MiB|GiB ### `raft-max-inflight-msgs` @@ -687,9 +687,9 @@ Configuration items related to Raftstore. ### `raft-entry-max-size` + The hard limit on the maximum size of a single log -+ Default value: `"8MB"` ++ Default value: `"8MiB"` + Minimum value: `0` -+ Unit: MB|GB ++ Unit: MiB|GiB ### `raft-log-compact-sync-interval` New in v5.3 @@ -712,7 +712,7 @@ Configuration items related to Raftstore. ### `raft-log-gc-count-limit` + The hard limit on the allowable number of residual Raft logs -+ Default value: the log number that can be accommodated in the 3/4 Region size (calculated as 1MB for each log) ++ Default value: the log number that can be accommodated in the 3/4 Region size (calculated as 1MiB for each log) + Minimum value: `0` ### `raft-log-gc-size-limit` @@ -845,9 +845,9 @@ Configuration items related to Raftstore. ### `lock-cf-compact-bytes-threshold` + The size out of which TiKV triggers a manual compaction for the Lock Column Family -+ Default value: `"256MB"` ++ Default value: `"256MiB"` + Minimum value: `0` -+ Unit: MB ++ Unit: MiB ### `notify-capacity` @@ -900,9 +900,9 @@ Configuration items related to Raftstore. 
### `snap-apply-batch-size` + The memory cache size required when the imported snapshot file is written into the disk -+ Default value: `"10MB"` ++ Default value: `"10MiB"` + Minimum value: `0` -+ Unit: MB ++ Unit: MiB ### `consistency-check-interval` @@ -990,7 +990,7 @@ Configuration items related to Raftstore. ### `store-io-pool-size` New in v5.3.0 + The allowable number of threads that process Raft I/O tasks, which is the size of the StoreWriter thread pool. When you modify the size of this thread pool, refer to [Performance tuning for TiKV thread pools](/tune-tikv-thread-performance.md#performance-tuning-for-tikv-thread-pools). -+ Default value: `0` ++ Default value: `1` (Before v8.0.0, the default value is `0`) + Minimum value: `0` ### `future-poll-size` @@ -1014,7 +1014,7 @@ Configuration items related to Raftstore. ### `raft-write-size-limit` New in v5.3.0 + Determines the threshold at which Raft data is written into the disk. If the data size is larger than the value of this configuration item, the data is written to the disk. When the value of `store-io-pool-size` is `0`, this configuration item does not take effect. -+ Default value: `1MB` ++ Default value: `1MiB` + Minimum value: `0` ### `report-min-resolved-ts-interval` New in v6.0.0 @@ -1155,9 +1155,9 @@ Configuration items related to RocksDB ### `max-manifest-file-size` + The maximum size of a RocksDB Manifest file -+ Default value: `"128MB"` ++ Default value: `"128MiB"` + Minimum value: `0` -+ Unit: B|KB|MB|GB ++ Unit: B|KiB|MiB|GiB ### `create-if-missing` @@ -1191,14 +1191,14 @@ Configuration items related to RocksDB + The size limit of the archived WAL files. When the value is exceeded, the system deletes these files. + Default value: `0` + Minimum value: `0` -+ Unit: B|KB|MB|GB ++ Unit: B|KiB|MiB|GiB ### `max-total-wal-size` + The maximum RocksDB WAL size in total, which is the size of `*.log` files in the `data-dir`. 
+ Default value: - + When `storage.engine="raft-kv"`, the default value is `"4GB"`. + + When `storage.engine="raft-kv"`, the default value is `"4GiB"`. + When `storage.engine="partitioned-raft-kv"`, the default value is `1`. ### `stats-dump-period` @@ -1211,17 +1211,17 @@ Configuration items related to RocksDB ### `compaction-readahead-size` -+ Enables the readahead feature during RocksDB compaction and specifies the size of readahead data. If you are using mechanical disks, it is recommended to set the value to 2MB at least. ++ Enables the readahead feature during RocksDB compaction and specifies the size of readahead data. If you are using mechanical disks, it is recommended to set the value to at least 2MiB. + Default value: `0` + Minimum value: `0` -+ Unit: B|KB|MB|GB ++ Unit: B|KiB|MiB|GiB ### `writable-file-max-buffer-size` + The maximum buffer size used in WritableFileWrite -+ Default value: `"1MB"` ++ Default value: `"1MiB"` + Minimum value: `0` -+ Unit: B|KB|MB|GB ++ Unit: B|KiB|MiB|GiB ### `use-direct-io-for-flush-and-compaction` @@ -1231,9 +1231,9 @@ Configuration items related to RocksDB ### `rate-bytes-per-sec` + When Titan is disabled, this configuration item limits the I/O rate of RocksDB compaction to reduce the impact of RocksDB compaction on the foreground read and write performance during traffic peaks. When Titan is enabled, this configuration item limits the summed I/O rates of RocksDB compaction and Titan GC. If you find that the I/O or CPU consumption of RocksDB compaction and Titan GC is too large, set this configuration item to an appropriate value according to the disk I/O bandwidth and the actual write traffic.
-+ Default value: `10GB` ++ Default value: `10GiB` + Minimum value: `0` -+ Unit: B|KB|MB|GB ++ Unit: B|KiB|MiB|GiB ### `rate-limiter-refill-period` @@ -1259,23 +1259,23 @@ Configuration items related to RocksDB ### `bytes-per-sync` + The rate at which OS incrementally synchronizes files to disk while these files are being written asynchronously -+ Default value: `"1MB"` ++ Default value: `"1MiB"` + Minimum value: `0` -+ Unit: B|KB|MB|GB ++ Unit: B|KiB|MiB|GiB ### `wal-bytes-per-sync` + The rate at which OS incrementally synchronizes WAL files to disk while the WAL files are being written -+ Default value: `"512KB"` ++ Default value: `"512KiB"` + Minimum value: `0` -+ Unit: B|KB|MB|GB ++ Unit: B|KiB|MiB|GiB ### `info-log-max-size` + The maximum size of Info log -+ Default value: `"1GB"` ++ Default value: `"1GiB"` + Minimum value: `0` -+ Unit: B|KB|MB|GB ++ Unit: B|KiB|MiB|GiB ### `info-log-roll-time` @@ -1335,7 +1335,7 @@ Configuration items related to Titan. > > - To enhance the performance of wide table and JSON data writing and point query, starting from TiDB v7.6.0, the default value changes from `false` to `true`, which means that Titan is enabled by default. > - Existing clusters upgraded to v7.6.0 or later versions retain the original configuration, which means that if Titan is not explicitly enabled, it still uses RocksDB. -> - If the cluster has enabled Titan before upgrading to TiDB v7.6.0 or later versions, Titan will be retained after the upgrade, and the [`min-blob-size`](/tikv-configuration-file.md#min-blob-size) configuration before the upgrade will be retained. If you do not explicitly configure the value before the upgrade, the default value of the previous version `1KB` will be retained to ensure the stability of the cluster configuration after the upgrade. 
+> - If the cluster has enabled Titan before upgrading to TiDB v7.6.0 or later versions, Titan will be retained after the upgrade, and the [`min-blob-size`](/tikv-configuration-file.md#min-blob-size) configuration before the upgrade will be retained. If you do not explicitly configure the value before the upgrade, the default value of the previous version `1KiB` will be retained to ensure the stability of the cluster configuration after the upgrade. + Enables or disables Titan. + Default value: `true` @@ -1363,10 +1363,10 @@ Configuration items related to `rocksdb.defaultcf`, `rocksdb.writecf`, and `rock ### `block-size` + The default size of a RocksDB block -+ Default value for `defaultcf` and `writecf`: `"32KB"` -+ Default value for `lockcf`: `"16KB"` -+ Minimum value: `"1KB"` -+ Unit: KB|MB|GB ++ Default value for `defaultcf` and `writecf`: `"32KiB"` ++ Default value for `lockcf`: `"16KiB"` ++ Minimum value: `"1KiB"` ++ Unit: KiB|MiB|GiB ### `block-cache-size` @@ -1379,7 +1379,7 @@ Configuration items related to `rocksdb.defaultcf`, `rocksdb.writecf`, and `rock + Default value for `writecf`: `Total machine memory * 15%` + Default value for `lockcf`: `Total machine memory * 2%` + Minimum value: `0` -+ Unit: KB|MB|GB ++ Unit: KiB|MiB|GiB ### `disable-block-cache` @@ -1460,12 +1460,12 @@ Configuration items related to `rocksdb.defaultcf`, `rocksdb.writecf`, and `rock ### `write-buffer-size` + Memtable size -+ Default value for `defaultcf` and `writecf`: `"128MB"` ++ Default value for `defaultcf` and `writecf`: `"128MiB"` + Default value for `lockcf`: - + When `storage.engine="raft-kv"`, the default value is `"32MB"`. - + When `storage.engine="partitioned-raft-kv"`, the default value is `"4MB"`. + + When `storage.engine="raft-kv"`, the default value is `"32MiB"`. + + When `storage.engine="partitioned-raft-kv"`, the default value is `"4MiB"`. 
+ Minimum value: `0` -+ Unit: KB|MB|GB ++ Unit: KiB|MiB|GiB ### `max-write-buffer-number` @@ -1482,18 +1482,18 @@ Configuration items related to `rocksdb.defaultcf`, `rocksdb.writecf`, and `rock ### `max-bytes-for-level-base` + The maximum number of bytes at base level (level-1). Generally, it is set to 4 times the size of a memtable. When the level-1 data size reaches the limit value of `max-bytes-for-level-base`, the SST files of level-1 and their overlapping SST files of level-2 will be compacted. -+ Default value for `defaultcf` and `writecf`: `"512MB"` -+ Default value for `lockcf`: `"128MB"` ++ Default value for `defaultcf` and `writecf`: `"512MiB"` ++ Default value for `lockcf`: `"128MiB"` + Minimum value: `0` -+ Unit: KB|MB|GB -+ It is recommended that the value of `max-bytes-for-level-base` is set approximately equal to the data volume in L0 to reduce unnecessary compaction. For example, if the compression method is "no:no:lz4:lz4:lz4:lz4:lz4", the value of `max-bytes-for-level-base` should be `write-buffer-size * 4`, because there is no compression of L0 and L1 and the trigger condition of compaction for L0 is that the number of the SST files reaches 4 (the default value). When L0 and L1 both adopt compaction, you need to analyze RocksDB logs to understand the size of an SST file compressed from a memtable. For example, if the file size is 32 MB, it is recommended to set the value of `max-bytes-for-level-base` to 128 MB (`32 MB * 4`). ++ Unit: KiB|MiB|GiB ++ It is recommended that the value of `max-bytes-for-level-base` is set approximately equal to the data volume in L0 to reduce unnecessary compaction. For example, if the compression method is "no:no:lz4:lz4:lz4:lz4:lz4", the value of `max-bytes-for-level-base` should be `write-buffer-size * 4`, because there is no compression of L0 and L1 and the trigger condition of compaction for L0 is that the number of the SST files reaches 4 (the default value). 
When L0 and L1 both adopt compaction, you need to analyze RocksDB logs to understand the size of an SST file compressed from a memtable. For example, if the file size is 32 MiB, it is recommended to set the value of `max-bytes-for-level-base` to 128 MiB (`32 MiB * 4`). ### `target-file-size-base` + The size of the target file at base level. This value is overridden by `compaction-guard-max-output-file-size` when the `enable-compaction-guard` value is `true`. -+ Default value: `"8MB"` ++ Default value: `"8MiB"` + Minimum value: `0` -+ Unit: KB|MB|GB ++ Unit: KiB|MiB|GiB ### `level0-file-num-compaction-trigger` @@ -1517,9 +1517,9 @@ Configuration items related to `rocksdb.defaultcf`, `rocksdb.writecf`, and `rock ### `max-compaction-bytes` + The maximum number of bytes written into disk per compaction -+ Default value: `"2GB"` ++ Default value: `"2GiB"` + Minimum value: `0` -+ Unit: KB|MB|GB ++ Unit: KiB|MiB|GiB ### `compaction-pri` @@ -1561,14 +1561,14 @@ Configuration items related to `rocksdb.defaultcf`, `rocksdb.writecf`, and `rock ### `soft-pending-compaction-bytes-limit` + The soft limit on the pending compaction bytes. When `storage.flow-control.enable` is set to `true`, `storage.flow-control.soft-pending-compaction-bytes-limit` overrides this configuration item. -+ Default value: `"192GB"` -+ Unit: KB|MB|GB ++ Default value: `"192GiB"` ++ Unit: KiB|MiB|GiB ### `hard-pending-compaction-bytes-limit` + The hard limit on the pending compaction bytes. When `storage.flow-control.enable` is set to `true`, `storage.flow-control.hard-pending-compaction-bytes-limit` overrides this configuration item. -+ Default value: `"256GB"` -+ Unit: KB|MB|GB ++ Default value: `"256GiB"` ++ Unit: KiB|MiB|GiB ### `enable-compaction-guard` @@ -1579,14 +1579,14 @@ Configuration items related to `rocksdb.defaultcf`, `rocksdb.writecf`, and `rock ### `compaction-guard-min-output-file-size` + The minimum SST file size when the compaction guard is enabled. 
This configuration prevents SST files from being too small when the compaction guard is enabled. -+ Default value: `"8MB"` -+ Unit: KB|MB|GB ++ Default value: `"8MiB"` ++ Unit: KiB|MiB|GiB ### `compaction-guard-max-output-file-size` + The maximum SST file size when the compaction guard is enabled. The configuration prevents SST files from being too large when the compaction guard is enabled. This configuration overrides `target-file-size-base` for the same column family. -+ Default value: `"128MB"` -+ Unit: KB|MB|GB ++ Default value: `"128MiB"` ++ Unit: KiB|MiB|GiB ### `format-version` New in v6.2.0 @@ -1627,14 +1627,14 @@ Configuration items related to `rocksdb.defaultcf.titan`. > **Note:** > -> - Starting from TiDB v7.6.0, Titan is enabled by default to enhance the performance of wide table and JSON data writing and point query. The default value of `min-blob-size` changes from `1KB` to `32KB`. This means that values exceeding `32KB` is stored in Titan, while other data continues to be stored in RocksDB. -> - To ensure configuration consistency, for existing clusters upgrading to TiDB v7.6.0 or later versions, if you do not explicitly set `min-blob-size` before the upgrade, TiDB retains the previous default value of `1KB`. -> - A value smaller than `32KB` might affect the performance of range scans. However, if the workload primarily involves heavy writes and point queries, you can consider decreasing the value of `min-blob-size` for better performance. +> - Starting from TiDB v7.6.0, Titan is enabled by default to enhance the performance of wide table and JSON data writing and point queries. The default value of `min-blob-size` changes from `1KiB` to `32KiB`. This means that values exceeding `32KiB` are stored in Titan, while other data continues to be stored in RocksDB.
+> - To ensure configuration consistency, for existing clusters upgrading to TiDB v7.6.0 or later versions, if you do not explicitly set `min-blob-size` before the upgrade, TiDB retains the previous default value of `1KiB`. +> - A value smaller than `32KiB` might affect the performance of range scans. However, if the workload primarily involves heavy writes and point queries, you can consider decreasing the value of `min-blob-size` for better performance. + The smallest value stored in a Blob file. Values smaller than the specified size are stored in the LSM-Tree. -+ Default value: `"32KB"` ++ Default value: `"32KiB"` + Minimum value: `0` -+ Unit: KB|MB|GB ++ Unit: KiB|MiB|GiB ### `blob-file-compression` @@ -1649,31 +1649,36 @@ Configuration items related to `rocksdb.defaultcf.titan`. ### `zstd-dict-size` -+ The zstd dictionary compression size. The default value is `"0KB"`, which means to disable the zstd dictionary compression. In this case, Titan compresses data based on single values, whereas RocksDB compresses data based on blocks (`32KB` by default). When the average size of Titan values is less than `32KB`, Titan's compression ratio is lower than that of RocksDB. Taking JSON as an example, the store size in Titan can be 30% to 50% larger than that of RocksDB. The actual compression ratio depends on whether the value content is suitable for compression and the similarity among different values. You can enable the zstd dictionary compression to increase the compression ratio by configuring `zstd-dict-size` (for example, set it to `16KB`). The actual store size can be lower than that of RocksDB. But the zstd dictionary compression might lead to about 10% performance regression in specific workloads. -+ Default value: `"0KB"` -+ Unit: KB|MB|GB ++ The zstd dictionary compression size. The default value is `"0KiB"`, which means to disable the zstd dictionary compression. 
In this case, Titan compresses data based on single values, whereas RocksDB compresses data based on blocks (`32KiB` by default). When the average size of Titan values is less than `32KiB`, Titan's compression ratio is lower than that of RocksDB. Taking JSON as an example, the store size in Titan can be 30% to 50% larger than that of RocksDB. The actual compression ratio depends on whether the value content is suitable for compression and the similarity among different values. You can enable the zstd dictionary compression to increase the compression ratio by configuring `zstd-dict-size` (for example, set it to `16KiB`). The actual store size can be lower than that of RocksDB. But the zstd dictionary compression might lead to about 10% performance regression in specific workloads. ++ Default value: `"0KiB"` ++ Unit: KiB|MiB|GiB ### `blob-cache-size` + The cache size of a Blob file -+ Default value: `"0GB"` ++ Default value: `"0GiB"` + Minimum value: `0` -+ Recommended value: After database stabilization, it is recommended to set the RocksDB block cache (`storage.block-cache.capacity`) based on monitoring to maintain a block cache hit rate of at least 95%, and set `blob-cache-size` to `(total memory size) * 50% - (size of block cache)`. This is to ensure that the block cache is sufficiently large to cache the entire RocksDB, while maximizing the blob cache size. However, to prevent a significant drop in the block cache hit rate, do not set the blob cache size too large. -+ Unit: KB|MB|GB ++ Recommended value: `0`. Starting from v8.0.0, TiKV introduces the `shared-blob-cache` configuration item and enables it by default, so there is no need to set `blob-cache-size` separately. The configuration of `blob-cache-size` only takes effect when `shared-blob-cache` is set to `false`. ++ Unit: KiB|MiB|GiB + +### `shared-blob-cache` New in v8.0.0 + ++ Controls whether to enable the shared cache for Titan blob files and RocksDB block files. ++ Default value: `true`
When the shared cache is enabled, block files have higher priority. This means that TiKV prioritizes meeting the cache needs of block files and then uses the remaining cache for blob files. ### `min-gc-batch-size` + The minimum total size of Blob files required to perform GC for one time -+ Default value: `"16MB"` ++ Default value: `"16MiB"` + Minimum value: `0` -+ Unit: KB|MB|GB ++ Unit: KiB|MiB|GiB ### `max-gc-batch-size` + The maximum total size of Blob files allowed to perform GC for one time -+ Default value: `"64MB"` ++ Default value: `"64MiB"` + Minimum value: `0` -+ Unit: KB|MB|GB ++ Unit: KiB|MiB|GiB ### `discardable-ratio` @@ -1699,9 +1704,9 @@ Configuration items related to `rocksdb.defaultcf.titan`. ### `merge-small-file-threshold` + When the size of a Blob file is smaller than this value, the Blob file might still be selected for GC. In this situation, `discardable-ratio` is ignored. -+ Default value: `"8MB"` ++ Default value: `"8MiB"` + Minimum value: `0` -+ Unit: KB|MB|GB ++ Unit: KiB|MiB|GiB ### `blob-run-mode` @@ -1742,9 +1747,9 @@ Configuration items related to `raftdb` ### `max-manifest-file-size` + The maximum size of a RocksDB Manifest file -+ Default value: `"20MB"` ++ Default value: `"20MiB"` + Minimum value: `0` -+ Unit: B|KB|MB|GB ++ Unit: B|KiB|MiB|GiB ### `create-if-missing` @@ -1775,29 +1780,29 @@ Configuration items related to `raftdb` + The size limit of the archived WAL files. When the value is exceeded, the system deletes these files. + Default value: `0` + Minimum value: `0` -+ Unit: B|KB|MB|GB ++ Unit: B|KiB|MiB|GiB ### `max-total-wal-size` + The maximum RocksDB WAL size in total -+ Default value: `"4GB"` - + When `storage.engine="raft-kv"`, the default value is `"4GB"`. ++ Default value: + + When `storage.engine="raft-kv"`, the default value is `"4GiB"`. + When `storage.engine="partitioned-raft-kv"`, the default value is `1`. 
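The size values above are written as quoted strings with binary units directly in the TiKV configuration file. As a hedged illustration only — a hypothetical `tikv.toml` excerpt using the defaults documented in this section (verify section and key names against your TiKV version):

```toml
# Hypothetical excerpt of a TiKV configuration file (tikv.toml).
# Values are the documented defaults, shown only to illustrate the
# binary-unit (KiB/MiB/GiB) string format; this is not tuning advice.

[raftdb]
max-manifest-file-size = "20MiB"
max-total-wal-size = "4GiB"

[rocksdb.defaultcf.titan]
min-gc-batch-size = "16MiB"
max-gc-batch-size = "64MiB"
```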
### `compaction-readahead-size` + Controls whether to enable the readahead feature during RocksDB compaction and specify the size of readahead data. -+ If you use mechanical disks, it is recommended to set the value to `2MB` at least. ++ If you use mechanical disks, it is recommended to set the value to at least `2MiB`. + Default value: `0` + Minimum value: `0` -+ Unit: B|KB|MB|GB ++ Unit: B|KiB|MiB|GiB ### `writable-file-max-buffer-size` + The maximum buffer size used in WritableFileWrite -+ Default value: `"1MB"` ++ Default value: `"1MiB"` + Minimum value: `0` -+ Unit: B|KB|MB|GB ++ Unit: B|KiB|MiB|GiB ### `use-direct-io-for-flush-and-compaction` @@ -1817,23 +1822,23 @@ Configuration items related to `raftdb` ### `bytes-per-sync` + The rate at which OS incrementally synchronizes files to disk while these files are being written asynchronously -+ Default value: `"1MB"` ++ Default value: `"1MiB"` + Minimum value: `0` -+ Unit: B|KB|MB|GB ++ Unit: B|KiB|MiB|GiB ### `wal-bytes-per-sync` + The rate at which OS incrementally synchronizes WAL files to disk when the WAL files are being written -+ Default value: `"512KB"` ++ Default value: `"512KiB"` + Minimum value: `0` -+ Unit: B|KB|MB|GB ++ Unit: B|KiB|MiB|GiB ### `info-log-max-size` + The maximum size of Info logs -+ Default value: `"1GB"` ++ Default value: `"1GiB"` + Minimum value: `0` -+ Unit: B|KB|MB|GB ++ Unit: B|KiB|MiB|GiB ### `info-log-roll-time` @@ -1880,24 +1885,24 @@ Configuration items related to Raft Engine. ### `batch-compression-threshold` + Specifies the threshold size of a log batch. A log batch larger than this configuration is compressed. If you set this configuration item to `0`, compression is disabled. -+ Default value: `"8KB"` ++ Default value: `"8KiB"` ### `bytes-per-sync` + Specifies the maximum accumulative size of buffered writes. When this configuration value is exceeded, buffered writes are flushed to the disk. + If you set this configuration item to `0`, incremental sync is disabled.
-+ Default value: `"4MB"` ++ Default value: `"4MiB"` ### `target-file-size` + Specifies the maximum size of log files. When a log file is larger than this value, it is rotated. -+ Default value: `"128MB"` ++ Default value: `"128MiB"` ### `purge-threshold` + Specifies the threshold size of the main log queue. When this configuration value is exceeded, the main log queue is purged. + This configuration can be used to adjust the disk space usage of Raft Engine. -+ Default value: `"10GB"` ++ Default value: `"10GiB"` ### `recovery-mode` @@ -1908,7 +1913,7 @@ Configuration items related to Raft Engine. ### `recovery-read-block-size` + The minimum I/O size for reading log files during recovery. -+ Default value: `"16KB"` ++ Default value: `"16KiB"` + Minimum value: `"512B"` ### `recovery-threads` @@ -2066,7 +2071,7 @@ Configuration items related to TiDB Lightning import and BR restore. + The garbage ratio threshold to trigger GC. + Default value: `1.1` -### `num-threads` New in v6.5.8 and v7.6.0 +### `num-threads` New in v6.5.8, v7.1.4, v7.5.1, and v7.6.0 + The number of GC threads when `enable-compaction-filter` is `false`. + Default value: `1` @@ -2091,7 +2096,7 @@ Configuration items related to BR backup. + The threshold of the backup SST file size. If the size of a backup file in a TiKV Region exceeds this threshold, the file is backed up to several files with the TiKV Region split into multiple Region ranges. Each of the files in the split Regions is the same size as `sst-max-size` (or slightly larger). + For example, when the size of a backup file in the Region of `[a,e)` is larger than `sst-max-size`, the file is backed up to several files with regions `[a,b)`, `[b,c)`, `[c,d)` and `[d,e)`, and the size of `[a,b)`, `[b,c)`, `[c,d)` is the same as that of `sst-max-size` (or slightly larger). -+ Default value: `"144MB"` ++ Default value: `"144MiB"` ### `enable-auto-tune` New in v5.4.0 @@ -2139,12 +2144,13 @@ Configuration items related to log backup. 
### `initial-scan-pending-memory-quota` New in v6.2.0 + The quota of cache used for storing incremental scan data during log backup. -+ Default value: `min(Total machine memory * 10%, 512 MB)` ++ Default value: `min(Total machine memory * 10%, 512 MiB)` ### `initial-scan-rate-limit` New in v6.2.0 + The rate limit on throughput in an incremental data scan during log backup, which means the maximum amount of data that can be read from the disk per second. Note that if you only specify a number (for example, `60`), the unit is Byte instead of KiB. + Default value: `60MiB` ++ Minimum value: `1MiB` ### `max-flush-interval` New in v6.2.0 @@ -2174,17 +2180,17 @@ Configuration items related to TiCDC. ### `old-value-cache-memory-quota` + The upper limit of memory usage by TiCDC old values. -+ Default value: `512MB` ++ Default value: `512MiB` ### `sink-memory-quota` + The upper limit of memory usage by TiCDC data change events. -+ Default value: `512MB` ++ Default value: `512MiB` ### `incremental-scan-speed-limit` + The maximum speed at which historical data is incrementally scanned. -+ Default value: `"128MB"`, which means 128 MB per second. ++ Default value: `"128MiB"`, which means 128 MiB per second. ### `incremental-scan-threads` @@ -2268,14 +2274,14 @@ Suppose that your machine on which TiKV is deployed has limited resources, for e #### `foreground-write-bandwidth` New in v6.0.0 + The soft limit on the bandwidth with which transactions write data. -+ Default value: `0KB` (which means no limit) -+ Recommended setting: Use the default value `0` in most cases unless the `foreground-cpu-time` setting is not enough to limit the write bandwidth. For such an exception, it is recommended to set the value smaller than `50MB` in the instance with 4 or less cores. ++ Default value: `0KiB` (which means no limit) ++ Recommended setting: Use the default value `0` in most cases unless the `foreground-cpu-time` setting is not enough to limit the write bandwidth.
For such an exception, it is recommended to set the value smaller than `50MiB` on an instance with 4 or fewer cores. #### `foreground-read-bandwidth` New in v6.0.0 + The soft limit on the bandwidth with which transactions and the Coprocessor read data. -+ Default value: `0KB` (which means no limit) -+ Recommended setting: Use the default value `0` in most cases unless the `foreground-cpu-time` setting is not enough to limit the read bandwidth. For such an exception, it is recommended to set the value smaller than `20MB` in the instance with 4 or less cores. ++ Default value: `0KiB` (which means no limit) ++ Recommended setting: Use the default value `0` in most cases unless the `foreground-cpu-time` setting is not enough to limit the read bandwidth. For such an exception, it is recommended to set the value smaller than `20MiB` on an instance with 4 or fewer cores. ### Background Quota Limiter @@ -2301,7 +2307,7 @@ Suppose that your machine on which TiKV is deployed has limited resources, for e > This configuration item is returned in the result of `SHOW CONFIG`, but currently setting it does not take any effect. + The soft limit on the bandwidth with which background transactions write data. -+ Default value: `0KB` (which means no limit) ++ Default value: `0KiB` (which means no limit) #### `background-read-bandwidth` New in v6.2.0 @@ -2310,7 +2316,7 @@ Suppose that your machine on which TiKV is deployed has limited resources, for e > This configuration item is returned in the result of `SHOW CONFIG`, but currently setting it does not take any effect. + The soft limit on the bandwidth with which background transactions and the Coprocessor read data. -+ Default value: `0KB` (which means no limit) ++ Default value: `0KiB` (which means no limit) #### `enable-auto-tune` New in v6.2.0 @@ -2374,24 +2380,24 @@ Configuration items related to [Load Base Split](/configure-load-base-split.md). + Controls the traffic threshold at which a Region is identified as a hotspot.
+ Default value: - + `30MiB` per second when [`region-split-size`](#region-split-size) is less than 4 GB. - + `100MiB` per second when [`region-split-size`](#region-split-size) is greater than or equal to 4 GB. + + `30MiB` per second when [`region-split-size`](#region-split-size) is less than 4 GiB. + + `100MiB` per second when [`region-split-size`](#region-split-size) is greater than or equal to 4 GiB. ### `qps-threshold` + Controls the QPS threshold at which a Region is identified as a hotspot. + Default value: - + `3000` when [`region-split-size`](#region-split-size) is less than 4 GB. - + `7000` when [`region-split-size`](#region-split-size) is greater than or equal to 4 GB. + + `3000` when [`region-split-size`](#region-split-size) is less than 4 GiB. + + `7000` when [`region-split-size`](#region-split-size) is greater than or equal to 4 GiB. ### `region-cpu-overload-threshold-ratio` New in v6.2.0 + Controls the CPU usage threshold at which a Region is identified as a hotspot. + Default value: - + `0.25` when [`region-split-size`](#region-split-size) is less than 4 GB. - + `0.75` when [`region-split-size`](#region-split-size) is greater than or equal to 4 GB. + + `0.25` when [`region-split-size`](#region-split-size) is less than 4 GiB. + + `0.75` when [`region-split-size`](#region-split-size) is greater than or equal to 4 GiB. ## memory New in v7.5.0 @@ -2403,4 +2409,4 @@ Configuration items related to [Load Base Split](/configure-load-base-split.md). ### `profiling-sample-per-bytes` New in v7.5.0 + Specifies the amount of data sampled by Heap Profiling each time, rounding up to the nearest power of 2. -+ Default value: `512KB` ++ Default value: `512KiB` diff --git a/tiproxy/tiproxy-grafana.md b/tiproxy/tiproxy-grafana.md index 0d77836e7d694..8339349dfe150 100644 --- a/tiproxy/tiproxy-grafana.md +++ b/tiproxy/tiproxy-grafana.md @@ -47,6 +47,7 @@ TiProxy has four panel groups. 
The metrics on these panels indicate the current - CPS by Instance: commands per second of each TiProxy instance - CPS by Backend: commands per second of each TiDB instance - CPS by CMD: commands per second grouped by SQL command type +- Handshake Duration: average, P95, and P99 duration of the handshake phase between the client and TiProxy ## Balance @@ -59,3 +60,10 @@ TiProxy has four panel groups. The metrics on these panels indicate the current - Get Backend Duration: the average, P95, and P99 duration of TiProxy connecting to a TiDB instance - Ping Backend Duration: the network latency between each TiProxy instance and each TiDB instance. For example, `10.24.31.1:6000 | 10.24.31.2:4000` indicates the network latency between TiProxy instance `10.24.31.1:6000` and TiDB instance `10.24.31.2:4000` - Health Check Cycle: the duration of a cycle of the health check between a TiProxy instance and all TiDB instances. For example, `10.24.31.1:6000` indicates the duration of the latest health check that TiProxy instance `10.24.31.1:6000` executes on all the TiDB instances. If this duration is higher than 3 seconds, TiProxy might not refresh the backend TiDB list in a timely manner + +## Traffic + +- Bytes/Second from Backends: the amount of data, in bytes, sent from each TiDB instance to each TiProxy instance per second. +- Packets/Second from Backends: the number of MySQL packets sent from each TiDB instance to each TiProxy instance per second. +- Bytes/Second to Backends: the amount of data, in bytes, sent from each TiProxy instance to each TiDB instance per second. +- Packets/Second to Backends: the number of MySQL packets sent from each TiProxy instance to each TiDB instance per second. diff --git a/tiproxy/tiproxy-overview.md b/tiproxy/tiproxy-overview.md index c67f44c6798a9..32eda9fb4040f 100644 --- a/tiproxy/tiproxy-overview.md +++ b/tiproxy/tiproxy-overview.md @@ -11,7 +11,7 @@ TiProxy is an optional component.
You can also use a third-party proxy component The following figure shows the architecture of TiProxy: -![TiProxy architecture](/media/tiproxy/tiproxy-architecture.png) +TiProxy architecture ## Main features @@ -23,7 +23,7 @@ TiProxy can migrate connections from one TiDB server to another without breaking As shown in the following figure, the client originally connects to TiDB 1 through TiProxy. After the connection migration, the client actually connects to TiDB 2. When TiDB 1 is about to be offline or the ratio of connections on TiDB 1 to connections on TiDB 2 exceeds the set threshold, the connection migration is triggered. The client is unaware of the connection migration. -![TiProxy connection migration](/media/tiproxy/tiproxy-session-migration.png) +TiProxy connection migration Connection migration usually occurs in the following scenarios: @@ -59,7 +59,7 @@ This section describes how to deploy and change TiProxy using TiUP. For how to d ### Deploy TiProxy -1. Generate a self-signed certificate. +1. Before TiUP v1.15.0, you need to manually generate a self-signed certificate. Generate a self-signed certificate for the TiDB instance and place the certificate on all TiDB instances to ensure that all TiDB instances have the same certificate. For detailed steps, see [Generate self-signed certificates](/generate-self-signed-certificates.md). @@ -67,8 +67,8 @@ This section describes how to deploy and change TiProxy using TiUP. For how to d When using TiProxy, you also need to configure the following items for the TiDB instances: - - Configure the [`security.session-token-signing-cert`](/tidb-configuration-file.md#session-token-signing-cert-new-in-v640) and [`security.session-token-signing-key`](/tidb-configuration-file.md#session-token-signing-key-new-in-v640) of TiDB instances to the path of the certificate. Otherwise, the connection cannot be migrated. 
- - Configure the [`graceful-wait-before-shutdown`](/tidb-configuration-file.md#graceful-wait-before-shutdown-new-in-v50) of TiDB instances to a value greater than the longest transaction duration of the application. Otherwise, the client might disconnect when the TiDB server is offline. For details, see [TiProxy usage limitations](#limitations). + - Before TiUP v1.15.0, configure the [`security.session-token-signing-cert`](/tidb-configuration-file.md#session-token-signing-cert-new-in-v640) and [`security.session-token-signing-key`](/tidb-configuration-file.md#session-token-signing-key-new-in-v640) of TiDB instances to the path of the certificate. Otherwise, the connection cannot be migrated. + - Configure the [`graceful-wait-before-shutdown`](/tidb-configuration-file.md#graceful-wait-before-shutdown-new-in-v50) of TiDB instances to a value greater than the longest transaction duration of the application. Otherwise, the client might disconnect when the TiDB server is offline. You can view the transaction duration through the [Transaction metrics on the TiDB monitoring dashboard](/grafana-tidb-dashboard.md#transaction). For details, see [TiProxy usage limitations](#limitations). A configuration example is as follows: @@ -100,7 +100,7 @@ This section describes how to deploy and change TiProxy using TiUP. For how to d ```yaml component_versions: - tiproxy: "v0.2.0" + tiproxy: "v1.0.0" server_configs: tiproxy: security.server-tls.ca: "/var/ssl/ca.pem" @@ -202,3 +202,8 @@ The following table lists some supported connectors: | Python | PyMySQL | 0.7 | Note that some connectors call the common library to connect to the database, and these connectors are not listed in the table. You can refer to the above table for the required version of the corresponding library. For example, MySQL/Ruby uses libmysqlclient to connect to the database, so it requires that the libmysqlclient used by MySQL/Ruby is version 5.5.7 or later. 
+ +## TiProxy resources + +- [TiProxy Release Notes](https://github.com/pingcap/tiproxy/releases) +- [TiProxy Issues](https://github.com/pingcap/tiproxy/issues): Lists TiProxy GitHub issues diff --git a/tiproxy/tiproxy-performance-test.md b/tiproxy/tiproxy-performance-test.md index 90b3c648cfee3..7cc0992afd228 100644 --- a/tiproxy/tiproxy-performance-test.md +++ b/tiproxy/tiproxy-performance-test.md @@ -9,33 +9,34 @@ This report tests the performance of TiProxy in the OLTP scenario of Sysbench an The results are as follows: -- The QPS upper limit of TiProxy is affected by the type of workload. Under the basic workloads of Sysbench and the same CPU usage, the QPS of TiProxy is about 20% to 40% lower than that of HAProxy. -- The number of TiDB server instances that TiProxy can hold varies according to the type of workload. Under the basic workloads of Sysbench, a TiProxy can hold 4 to 10 TiDB server instances of the same model. -- The performance of TiProxy is more affected by the number of vCPUs, compared to HAProxy. When the returned data is 10,000 rows and the CPU usage is the same, the QPS of TiProxy is about 30% lower than that of HAProxy. +- The QPS upper limit of TiProxy is affected by the type of workload. Under the basic workloads of Sysbench and the same CPU usage, the QPS of TiProxy is about 25% lower than that of HAProxy. +- The number of TiDB server instances that TiProxy can hold varies according to the type of workload. Under the basic workloads of Sysbench, one TiProxy instance can hold 5 to 12 TiDB server instances of the same model. +- The number of rows in the query result set has a significant impact on the QPS of TiProxy, and the impact is the same as that of HAProxy. - The performance of TiProxy increases almost linearly with the number of vCPUs. Therefore, increasing the number of vCPUs can effectively improve the QPS upper limit. +- The number of long connections and the frequency of creating short connections have minimal impact on the QPS of TiProxy.
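The "about 25% lower at the same CPU usage" figure can be sanity-checked by normalizing QPS by proxy CPU usage. The following is an editorial sketch (not part of the original benchmark) using the 100-thread Point Select results reported below: TiProxy at 137688 QPS with 400% CPU, HAProxy at 163069 QPS with 360% CPU.

```shell
# Normalize QPS by proxy CPU usage to compare throughput at equal CPU cost.
# Figures are the 100-thread Point Select results from this report.
awk 'BEGIN {
  tiproxy_qps_per_vcpu = 137688 / 4.0   # QPS per fully used vCPU
  haproxy_qps_per_vcpu = 163069 / 3.6
  printf "TiProxy: %.0f QPS/vCPU\n", tiproxy_qps_per_vcpu
  printf "HAProxy: %.0f QPS/vCPU\n", haproxy_qps_per_vcpu
  printf "TiProxy is %.0f%% lower\n", (1 - tiproxy_qps_per_vcpu / haproxy_qps_per_vcpu) * 100
}'
# Prints: TiProxy is 24% lower, consistent with the ~25% summary above.
```

The same normalization applied to the other workloads gives figures in the same range, which is where the summary number comes from.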
## Test environment ### Hardware configuration -| Service | Machine Type | CPU Architecture | Instance Count | +| Service | Machine type | CPU model | Instance count | | --- | --- | --- | --- | -| TiProxy | 4C8G | AMD64 | 1 | -| HAProxy | 4C8G | AMD64 | 1 | -| PD | 4C8G | AMD64 | 3 | -| TiDB | 8C16G | AMD64 | 8 | -| TiKV | 8C16G | AMD64 | 8 | -| Sysbench | 8C16G | AMD64 | 1 | +| TiProxy | 4C8G | Intel(R) Xeon(R) Silver 4214R CPU @ 2.40GHz | 1 | +| HAProxy | 4C8G | Intel(R) Xeon(R) Silver 4214R CPU @ 2.40GHz | 1 | +| PD | 4C8G | Intel(R) Xeon(R) Silver 4214R CPU @ 2.40GHz | 3 | +| TiDB | 8C16G | Intel(R) Xeon(R) Silver 4214R CPU @ 2.40GHz | 8 | +| TiKV | 8C16G | Intel(R) Xeon(R) Silver 4214R CPU @ 2.40GHz | 8 | +| Sysbench | 8C16G | Intel(R) Xeon(R) Silver 4214R CPU @ 2.40GHz | 1 | ### Software | Service | Software version | | --- | --- | -| TiProxy | v0.2.0 | +| TiProxy | v1.0.0 | | HAProxy | 2.9.0 | -| PD | v7.6.0 | -| TiDB | v7.6.0 | -| TiKV | v7.6.0 | +| PD | v8.0.0 | +| TiDB | v8.0.0 | +| TiKV | v8.0.0 | | Sysbench | 1.0.17 | ### Configuration @@ -102,73 +103,73 @@ sysbench $testname \ TiProxy test results: -| Threads | QPS | Avg latency(ms) | P95 latency (ms) | TiProxy CPU usage | TiDB overall CPU Usage | -| --- | --- | --- | --- | --- | --- | -| 20 | 43935 | 0.45 | 0.63 | 210% | 900% | -| 50 | 87870 | 0.57 | 0.77 | 350% | 1700% | -| 100 | 91611 | 1.09 | 1.79 | 400% | 1800% | +| Threads | QPS | Avg latency (ms) | P95 latency (ms) | TiProxy CPU usage | TiDB overall CPU usage | +|---------|---------|------------------|------------------|-------------------|------------------------| +| 20 | 41273 | 0.48 | 0.64 | 190% | 900% | +| 50 | 100255 | 0.50 | 0.62 | 330% | 1900% | +| 100 | 137688 | 0.73 | 1.01 | 400% | 2600% | HAProxy test results: -| Threads | QPS | Avg latency(ms) | P95 latency (ms) | HAProxy CPU usage | TiDB overall CPU Usage | -| --- | --- | --- | --- | --- | --- | -| 20 | 43629 | 0.46 | 0.63 | 130% | 900% | -| 50 | 102934 | 0.49 | 0.61 | 320% | 2000% 
| -| 100 | 157880 | 0.63 | 0.81 | 400% | 3000% | +| Threads | QPS | Avg latency (ms) | P95 latency (ms) | HAProxy CPU usage | TiDB overall CPU usage | +|---------|--------|------------------|------------------|-------------------|------------------------| +| 20 | 44833 | 0.45 | 0.61 | 140% | 1000% | +| 50 | 103631 | 0.48 | 0.61 | 270% | 2100% | +| 100 | 163069 | 0.61 | 0.77 | 360% | 3100% | ### Read Only TiProxy test results: -| Threads | QPS | Avg latency(ms) | P95 latency (ms) | TiProxy CPU usage | TiDB overall CPU Usage | -| --- | --- | --- | --- | --- | --- | -| 50 | 71816 | 11.14 | 12.98 | 340% | 2500% | -| 100 | 79299 | 20.17 | 23.95 | 400% | 2800% | -| 200 | 83371 | 38.37 | 46.63 | 400% | 2900% | +| Threads | QPS | Avg latency (ms) | P95 latency (ms) | TiProxy CPU usage | TiDB overall CPU usage | +|---------|--------|------------------|------------------|-------------------|------------------------| +| 50 | 72076 | 11.09 | 12.75 | 290% | 2500% | +| 100 | 109704 | 14.58 | 17.63 | 370% | 3800% | +| 200 | 117519 | 27.21 | 32.53 | 400% | 4100% | HAProxy test results: -| Threads | QPS | Avg latency(ms) | P95 latency (ms) | HAProxy CPU usage | TiDB overall CPU Usage | -| --- | --- | --- | --- | --- | --- | -| 50 | 74945 | 10.67 | 12.08 | 250% | 2500% | -| 100 | 118526 | 13.49 | 18.28 | 350% | 4000% | -| 200 | 131102 | 24.39 | 34.33 | 390% | 4300% | +| Threads | QPS | Avg latency (ms) | P95 latency (ms) | HAProxy CPU usage | TiDB overall CPU usage | +|---------|---------|------------------|------------------|-------------------|------------------------| +| 50 | 75760 | 10.56 | 12.08 | 250% | 2600% | +| 100 | 121730 | 13.14 | 15.83 | 350% | 4200% | +| 200 | 131712 | 24.27 | 30.26 | 370% | 4500% | ### Write Only TiProxy test results: -| Threads | QPS | Avg latency(ms) | P95 latency (ms) | TiProxy CPU usage | TiDB overall CPU Usage | -| --- | --- | --- | --- | --- | --- | -| 100 | 67762 | 8.85 | 15.27 | 310% | 3200% | -| 300 | 81113 | 22.18 | 38.25 | 390% | 3900% | -| 
500 | 79260 | 37.83 | 56.84 | 400% | 3800% | +| Threads | QPS | Avg latency (ms) | P95 latency (ms) | TiProxy CPU usage | TiDB overall CPU usage | +|---------|---------|------------------|------------------|-------------------|------------------------| +| 100 | 81957 | 7.32 | 10.27 | 290% | 3900% | +| 300 | 103040 | 17.45 | 31.37 | 330% | 4700% | +| 500 | 104869 | 28.59 | 52.89 | 340% | 4800% | HAProxy test results: -| Threads | QPS | Avg latency(ms) | P95 latency (ms) | HAProxy CPU usage | TiDB overall CPU Usage | -| --- | --- | --- | --- | --- | --- | -| 100 | 74501 | 8.05 | 12.30 | 220% | 3500% | -| 300 | 97942 | 18.36 | 31.94 | 280% | 4300% | -| 500 | 105352 | 28.44 | 49.21 | 300% | 4500% | +| Threads | QPS | Avg latency (ms) | P95 latency (ms) | HAProxy CPU usage | TiDB overall CPU usage | +|---------|---------|------------------|------------------|-------------------|------------------------| +| 100 | 81708 | 7.34 | 10.65 | 240% | 3700% | +| 300 | 106008 | 16.95 | 31.37 | 320% | 4800% | +| 500 | 122369 | 24.45 | 47.47 | 350% | 5300% | ### Read Write TiProxy test results: -| Threads | QPS | Avg latency(ms) | P95 latency (ms) | TiProxy CPU usage | TiDB overall CPU Usage | -| --- | --- | --- | --- | --- | --- | -| 50 | 60170 | 16.62 | 18.95 | 280% | 2700% | -| 100 | 81691 | 24.48 | 31.37 | 340% | 3600% | -| 200 | 88755 | 45.05 | 54.83 | 400% | 4000% | +| Threads | QPS | Avg latency (ms) | P95 latency (ms) | TiProxy CPU usage | TiDB overall CPU usage | +|---------|--------|------------------|------------------|-------------------|------------------------| +| 50 | 58571 | 17.07 | 19.65 | 250% | 2600% | +| 100 | 88432 | 22.60 | 29.19 | 330% | 3900% | +| 200 | 108758 | 36.73 | 51.94 | 380% | 4800% | HAProxy test results: -| Threads | QPS | Avg latency(ms) | P95 latency (ms) | HAProxy CPU usage | TiDB overall CPU Usage | -| --- | --- | --- | --- | --- | --- | -| 50 | 58151 | 17.19 | 20.37 | 240% | 2600% | -| 100 | 94123 | 21.24 | 26.68 | 370% | 4100% | -| 200 | 
107423 | 37.21 | 45.79 | 400% | 4700% | +| Threads | QPS | Avg latency (ms) | P95 latency (ms) | HAProxy CPU usage | TiDB overall CPU usage | +|---------|---------|------------------|------------------|-------------------|------------------------| +| 50 | 61226 | 16.33 | 19.65 | 190% | 2800% | +| 100 | 96569 | 20.70 | 26.68 | 290% | 4100% | +| 200 | 120163 | 31.28 | 49.21 | 340% | 5200% | ## Result set test @@ -202,21 +203,21 @@ sysbench oltp_read_only \ TiProxy test results: -| Range Size | QPS | Avg latency(ms) | P95 latency (ms) | TiProxy CPU usage | TiDB overall CPU Usage | Inbound Network (MiB/s) | Outbound Network (MiB/s) | -| --- | --- | --- | --- | --- | --- | --- | --- | -| 10 | 92100 | 1.09 | 1.34 | 330% | 3700% | 150 | 150 | -| 100 | 57931 | 1.73 | 2.30 | 370% | 2800% | 840 | 840 | -| 1000 | 8249 | 12.12 | 18.95 | 250% | 1300% | 1140 | 1140 | -| 10000 | 826 | 120.77 | 363.18 | 230% | 600% | 1140 | 1140 | +| Range size | QPS | Avg latency (ms) | P95 latency (ms) | TiProxy CPU usage | TiDB overall CPU usage | Inbound network (MiB/s) | Outbound network (MiB/s) | +|------------|---------|------------------|------------------|-------------------|------------------------|-------------------------|--------------------------| +| 10 | 80157 | 1.25 | 1.61 | 340% | 2600% | 140 | 140 | +| 100 | 55936 | 1.79 | 2.43 | 370% | 2800% | 820 | 820 | +| 1000 | 10313 | 9.69 | 13.70 | 310% | 1500% | 1370 | 1370 | +| 10000 | 1064 | 93.88 | 142.39 | 250% | 600% | 1430 | 1430 | HAProxy test results: -| Range Size | QPS | Avg latency(ms) | P95 latency (ms) | HAProxy CPU usage | TiDB overall CPU Usage | Inbound Network (MiB/s) | Outbound Network (MiB/s) | -| --- | --- | --- | --- | --- | --- | --- | --- | -| 10 | 93202 | 1.07 | 1.30 | 330% | 3800% | 145 | 145 | -| 100 | 64348 | 1.55 | 1.86 | 350% | 3100% | 830 | 830 | -| 1000 | 8944 | 11.18 | 14.73 | 240% | 1400% | 1100 | 1100 | -| 10000 | 908 | 109.96 | 139.85 | 180% | 600% | 1130 | 1130 | +| Range size | QPS | Avg latency (ms) | 
P95 latency (ms) | HAProxy CPU usage | TiDB overall CPU usage | Inbound network (MiB/s) | Outbound network (MiB/s) | +|------------|--------|------------------|------------------|-------------------|------------------------|-------------------------|--------------------------| +| 10 | 94376 | 1.06 | 1.30 | 250% | 4000% | 150 | 150 | +| 100 | 70129 | 1.42 | 1.76 | 270% | 3300% | 890 | 890 | +| 1000 | 9501 | 11.18 | 14.73 | 240% | 1500% | 1180 | 1180 | +| 10000 | 955 | 104.61 | 320.17 | 180% | 1200% | 1200 | 1200 | ## Scalability test @@ -241,9 +242,73 @@ sysbench oltp_point_select \ ### Test results -| vCPU | Threads | QPS | Avg latency(ms) | P95 latency (ms) | TiProxy CPU usage | TiDB overall CPU Usage | -| --- | --- | --- | --- | --- | --- | --- | -| 2 | 40 | 58508 | 0.68 | 0.97 | 190% | 1200% | -| 4 | 80 | 104890 | 0.76 | 1.16 | 390% | 2000% | -| 6 | 120 | 155520 | 0.77 | 1.14 | 590% | 2900% | -| 8 | 160 | 202134 | 0.79 | 1.18 | 800% | 3900% | +| vCPU | Threads | QPS | Avg latency (ms) | P95 latency (ms) | TiProxy CPU usage | TiDB overall CPU usage | +|------|---------|---------|------------------|------------------|-------------------|------------------------| +| 2 | 40 | 58508 | 0.68 | 0.97 | 190% | 1200% | +| 4 | 80 | 104890 | 0.76 | 1.16 | 390% | 2000% | +| 6 | 120 | 155520 | 0.77 | 1.14 | 590% | 2900% | +| 8 | 160 | 202134 | 0.79 | 1.18 | 800% | 3900% | + +## Long connection test + +### Test plan + +This test aims to verify that a large number of idle connections have minimal impact on the QPS when the client uses long connections. This test creates 5000, 10000, and 15000 idle long connections, and then executes `sysbench`. 
+ +This test uses the default value for the `conn-buffer-size` configuration: + +```yaml +proxy.conn-buffer-size: 32768 +``` + +Use the following command to perform the test: + +```bash +sysbench oltp_point_select \ + --threads=50 \ + --time=1200 \ + --report-interval=10 \ + --rand-type=uniform \ + --db-driver=mysql \ + --mysql-db=sbtest \ + --mysql-host=$host \ + --mysql-port=$port \ + run --tables=32 --table-size=1000000 +``` + +### Test results + +| Connection count | QPS | Avg latency (ms) | P95 latency (ms) | TiProxy CPU usage | TiProxy memory usage (MB) | TiDB overall CPU usage | +|------------------|-------|------------------|------------------|-------------------|---------------------------|------------------------| +| 5000 | 96620 | 0.52 | 0.64 | 330% | 920 | 1800% | +| 10000 | 96143 | 0.52 | 0.65 | 330% | 1710 | 1800% | +| 15000 | 96048 | 0.52 | 0.65 | 330% | 2570 | 1900% | + +## Short connection test + +### Test plan + +This test aims to verify that frequent creation and destruction of connections have minimal impact on the QPS when the client uses short connections. This test starts another client program to create and disconnect 100, 200, and 300 short connections per second while executing `sysbench`. 
+ +Use the following command to perform the test: + +```bash +sysbench oltp_point_select \ + --threads=50 \ + --time=1200 \ + --report-interval=10 \ + --rand-type=uniform \ + --db-driver=mysql \ + --mysql-db=sbtest \ + --mysql-host=$host \ + --mysql-port=$port \ + run --tables=32 --table-size=1000000 +``` + +### Test results + +| New connections per second | QPS | Avg latency (ms) | P95 latency (ms) | TiProxy CPU usage | TiDB overall CPU usage | +|----------------------------|--------|------------------|------------------|-------------------|------------------------| +| 100 | 95597 | 0.52 | 0.65 | 330% | 1800% | +| 200 | 94692 | 0.53 | 0.67 | 330% | 1800% | +| 300 | 94102 | 0.53 | 0.68 | 330% | 1900% | diff --git a/tiup/tiup-playground.md b/tiup/tiup-playground.md index 4ae07c8131ffc..314c07bde839b 100644 --- a/tiup/tiup-playground.md +++ b/tiup/tiup-playground.md @@ -172,3 +172,16 @@ You can specify a `pid` in the `tiup playground scale-in` command to scale in th ```shell tiup playground scale-in --pid 86526 ``` + +## Deploy PD microservices + +Starting from v8.0.0, PD supports the [microservice mode](/pd-microservices.md) (experimental). You can deploy the `tso` microservice and `scheduling` microservice for your cluster using TiUP Playground as follows: + +```shell +./tiup-playground v8.0.0 --pd.mode ms --pd.api 3 --pd.tso 2 --pd.scheduling 3 +``` + +- `--pd.mode`: setting it to `ms` enables the microservice mode for PD. +- `--pd.api num`: specifies the number of API instances to be deployed for PD microservices. It must be at least `1`. +- `--pd.tso num`: specifies the number of instances to be deployed for the `tso` microservice. +- `--pd.scheduling num`: specifies the number of instances to be deployed for the `scheduling` microservice.
\ No newline at end of file diff --git a/troubleshoot-lock-conflicts.md b/troubleshoot-lock-conflicts.md index 7e6a978d0ca74..fc1399896168c 100644 --- a/troubleshoot-lock-conflicts.md +++ b/troubleshoot-lock-conflicts.md @@ -86,13 +86,14 @@ For example, to filter transactions with a long lock-waiting time using the `whe {{< copyable "sql" >}} ```sql -select trx.* from information_schema.data_lock_waits as l left join information_schema.tidb_trx as trx on l.trx_id = trx.id where l.key = "7480000000000000415F728000000000000001"\G +select trx.* from information_schema.data_lock_waits as l left join information_schema.cluster_tidb_trx as trx on l.trx_id = trx.id where l.key = "7480000000000000415F728000000000000001"\G ``` The following is an example output: ```sql *************************** 1. row *************************** + INSTANCE: 127.0.0.1:10080 ID: 426831815660273668 START_TIME: 2021-08-06 07:16:00.081000 CURRENT_SQL_DIGEST: 06da614b93e62713bd282d4685fc5b88d688337f36e88fe55871726ce0eb80d7 @@ -106,6 +107,7 @@ CURRENT_SQL_DIGEST_TEXT: update `t` set `v` = `v` + ? where `id` = ? ; DB: test ALL_SQL_DIGESTS: ["0fdc781f19da1c6078c9de7eadef8a307889c001e05f107847bee4cfc8f3cdf3","06da614b93e62713bd282d4685fc5b88d688337f36e88fe55871726ce0eb80d7"] *************************** 2. row *************************** + INSTANCE: 127.0.0.1:10080 ID: 426831818019569665 START_TIME: 2021-08-06 07:16:09.081000 CURRENT_SQL_DIGEST: 06da614b93e62713bd282d4685fc5b88d688337f36e88fe55871726ce0eb80d7 diff --git a/tune-tikv-thread-performance.md b/tune-tikv-thread-performance.md index 43a6b4ef98fc1..36dfc3dd5ce5a 100644 --- a/tune-tikv-thread-performance.md +++ b/tune-tikv-thread-performance.md @@ -61,7 +61,7 @@ Starting from TiKV v5.0, all read requests use the unified thread pool for queri * The Raftstore thread pool. - The Raftstore thread pool is the most complex thread pool in TiKV. The default size (configured by `raftstore.store-pool-size`) of this thread pool is `2`. 
For the StoreWriter thread pool, the default size (configured by `raftstore.store-io-pool-size`) is `0`. + The Raftstore thread pool is the most complex thread pool in TiKV. The default size (configured by `raftstore.store-pool-size`) of this thread pool is `2`. For the StoreWriter thread pool, the default size (configured by `raftstore.store-io-pool-size`) is `1`. - When the size of the StoreWriter thread pool is 0, the Raftstore thread writes all write requests into RocksDB using `fsync`. In this case, it is recommended to tune the performance as follows: diff --git a/upgrade-tidb-using-tiup.md b/upgrade-tidb-using-tiup.md index 9b0ce992c17e7..7bdb7fbce7205 100644 --- a/upgrade-tidb-using-tiup.md +++ b/upgrade-tidb-using-tiup.md @@ -26,6 +26,31 @@ This document is targeted for the following upgrade paths: > - If your cluster to be upgraded is v3.1 or an earlier version (v3.0 or v2.1), the direct upgrade to v7.6.0 is not supported. You need to upgrade your cluster first to v4.0 and then to v7.6.0. > - If your cluster to be upgraded is earlier than v6.2, the upgrade might get stuck when you upgrade the cluster to v6.2 or later versions in some scenarios. You can refer to [How to fix the issue](#how-to-fix-the-issue-that-the-upgrade-gets-stuck-when-upgrading-to-v620-or-later-versions). > - TiDB nodes use the value of the [`server-version`](/tidb-configuration-file.md#server-version) configuration item to verify the current TiDB version. Therefore, to avoid unexpected behaviors, before upgrading the TiDB cluster, you need to set the value of `server-version` to empty or the real version of the current TiDB cluster. +> - Setting the [`performance.force-init-stats`](/tidb-configuration-file.md#force-init-stats-new-in-v657-and-v710) configuration item to `ON` prolongs the TiDB startup time, which might cause startup timeouts and upgrade failures. To avoid this issue, it is recommended to set a longer waiting timeout for TiUP.
+> - Scenarios that might be affected: +> - The original cluster version is earlier than v6.5.7 and v7.1.0 (which do not support `performance.force-init-stats` yet), and the target version is v7.2.0 or later. +> - The original cluster version is equal to or later than v6.5.7 and v7.1.0, and the `performance.force-init-stats` configuration item is set to `ON`. +> +> - Check the value of the `performance.force-init-stats` configuration item: +> +> ``` +> SHOW CONFIG WHERE type = 'tidb' AND name = 'performance.force-init-stats'; +> ``` +> +> - You can increase the TiUP waiting timeout by adding the command-line option [`--wait-timeout`](/tiup/tiup-component-cluster.md#--wait-timeout). For example, execute the following command to set the waiting timeout to 1200 seconds (20 minutes): +> +> ```shell +> tiup cluster upgrade <cluster-name> <version> --wait-timeout 1200 [other options] +> ``` +> +> Generally, a 20-minute waiting timeout is sufficient for most scenarios. For a more precise estimate, search for `init stats info time` in the TiDB log to get the statistics loading time during the previous startup as a reference. For example: +> +> ``` +> [domain.go:2271] ["init stats info time"] [lite=true] ["take time"=2.151333ms] +> ``` +> +> If the original cluster is v7.1.0 or earlier, when upgrading to v7.2.0 or later, because of the introduction of [`performance.lite-init-stats`](/tidb-configuration-file.md#lite-init-stats-new-in-v710), the statistics loading time is greatly reduced. In this case, the `init stats info time` before the upgrade is longer than the loading time after the upgrade. +> - If you want to shorten the rolling upgrade duration of TiDB and the potential performance impact of missing initial statistics during the upgrade is acceptable for your cluster, you can set `performance.force-init-stats` to `OFF` before the upgrade by [modifying the configuration of the target instance with TiUP](/maintain-tidb-using-tiup.md#modify-the-configuration).
After the upgrade is completed, you can reassess and revert this setting if necessary. ## Upgrade caveat