diff --git a/TOC-tidb-cloud.md b/TOC-tidb-cloud.md index f2e11684d32fa..c3b69b7a58aac 100644 --- a/TOC-tidb-cloud.md +++ b/TOC-tidb-cloud.md @@ -309,6 +309,9 @@ - [TiDB Serverless Limitations](/tidb-cloud/serverless-limitations.md) - [Limited SQL Features on TiDB Cloud](/tidb-cloud/limited-sql-features.md) - [TiDB Limitations](/tidb-limitations.md) + - TiDB Distributed eXecution Framework (DXF) + - [Introduction](/tidb-distributed-execution-framework.md) + - [TiDB Global Sort](/tidb-global-sort.md) - Benchmarks - [TPC-C Performance Test Report](/tidb-cloud/v7.1.0-performance-benchmarking-with-tpcc.md) - [Sysbench Performance Test Report](/tidb-cloud/v7.1.0-performance-benchmarking-with-sysbench.md) @@ -621,8 +624,6 @@ - [update](/tidb-cloud/ticloud-update.md) - [Table Filter](/table-filter.md) - [Resource Control](/tidb-resource-control.md) - - [TiDB Backend Task Distributed Execution Framework](/tidb-distributed-execution-framework.md) - - [TiDB Global Sort](/tidb-global-sort.md) - [DDL Execution Principles and Best Practices](/ddl-introduction.md) - [Troubleshoot Inconsistency Between Data and Indexes](/troubleshoot-data-inconsistency-errors.md) - [Support](/tidb-cloud/tidb-cloud-support.md) diff --git a/TOC.md b/TOC.md index 1bce896b31c43..671f3117e9fba 100644 --- a/TOC.md +++ b/TOC.md @@ -644,6 +644,9 @@ - [Data Validation](/tiflash/tiflash-data-validation.md) - [Compatibility](/tiflash/tiflash-compatibility.md) - [Pipeline Execution Model](/tiflash/tiflash-pipeline-model.md) + - TiDB Distributed eXecution Framework (DXF) + - [Introduction](/tidb-distributed-execution-framework.md) + - [TiDB Global Sort](/tidb-global-sort.md) - [System Variables](/system-variables.md) - [Server Status Variables](/status-variables.md) - Configuration File Parameters @@ -1000,9 +1003,6 @@ - [Table Filter](/table-filter.md) - [Schedule Replicas by Topology Labels](/schedule-replicas-by-topology-labels.md) - [URI Formats of External Storage Services](/external-storage-uri.md) - - Internal Components - - [TiDB Backend Task Distributed Execution Framework](/tidb-distributed-execution-framework.md) - - [TiDB Global Sort](/tidb-global-sort.md) - FAQs - [FAQ Summary](/faq/faq-overview.md) - [TiDB FAQs](/faq/tidb-faq.md) diff --git a/basic-features.md b/basic-features.md index a074d3554220b..4b405e6a3938f 100644 --- a/basic-features.md +++ b/basic-features.md @@ -250,7 +250,7 @@ You can try out TiDB features on [TiDB Playground](https://play.tidbcloud.com/?u | [Runaway Queries management](/tidb-resource-control.md#manage-queries-that-consume-more-resources-than-expected-runaway-queries) | E | N | N | N | N | N | N | N | N | N | | [Background tasks management](/tidb-resource-control.md#manage-background-tasks) | E | N | N | N | N | N | N | N | N | N | | [TiFlash Disaggregated Storage and Compute Architecture and S3 Support](/tiflash/tiflash-disaggregated-and-s3.md) | Y | E | N | N | N | N | N | N | N | N | -| [Selecting TiDB nodes for the distributed framework tasks](/system-variables.md#tidb_service_scope-new-in-v740) | Y | N | N | N | N | N | N | N | N | N | +| [Selecting TiDB nodes for the Distributed eXecution Framework (DXF) tasks](/system-variables.md#tidb_service_scope-new-in-v740) | Y | N | N | N | N | N | N | N | N | N | [^1]: TiDB incorrectly treats latin1 as a subset of utf8. See [TiDB #18955](https://github.com/pingcap/tidb/issues/18955) for more details. diff --git a/external-storage-uri.md b/external-storage-uri.md index b9ddf28d1c007..dfd7202e17db5 100644 --- a/external-storage-uri.md +++ b/external-storage-uri.md @@ -41,7 +41,7 @@ s3://external/testfolder?access-key=${access-key}&secret-access-key=${secret-acc The following is an example of an Amazon S3 URI for [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md). In this example, you need to specify a specific filename `test.csv`. ```shell -s3://external/test.csv?access-key=${access-key}&secret-access-key=${secret-access-key}" +s3://external/test.csv?access-key=${access-key}&secret-access-key=${secret-access-key} ``` ## GCS URI format diff --git a/releases/release-7.1.0.md b/releases/release-7.1.0.md index 846d74d09fd6b..99a390ac552b4 100644 --- a/releases/release-7.1.0.md +++ b/releases/release-7.1.0.md @@ -128,11 +128,11 @@ Compared with the previous LTS 6.5.0, 7.1.0 not only includes new features, impr For more information, see [documentation](/sql-non-prepared-plan-cache.md). -* Support the DDL distributed parallel execution framework (experimental) [#41495](https://github.com/pingcap/tidb/issues/41495) @[benjamin2037](https://github.com/benjamin2037) +* Support the TiDB Distributed eXecution Framework (DXF) (experimental) [#41495](https://github.com/pingcap/tidb/issues/41495) @[benjamin2037](https://github.com/benjamin2037) - Before TiDB v7.1.0, only one TiDB node can serve as the DDL owner and execute DDL tasks at the same time. Starting from TiDB v7.1.0, in the new distributed parallel execution framework, multiple TiDB nodes can execute the same DDL task in parallel, thus better utilizing the resources of the TiDB cluster and significantly improving the performance of DDL. In addition, you can linearly improve the performance of DDL by adding more TiDB nodes. Note that this feature is currently experimental and only supports `ADD INDEX` operations. + Before TiDB v7.1.0, only one TiDB node can serve as the DDL owner and execute DDL tasks at the same time. Starting from TiDB v7.1.0, in the new DXF, multiple TiDB nodes can execute the same DDL task in parallel, thus better utilizing the resources of the TiDB cluster and significantly improving the performance of DDL. In addition, you can linearly improve the performance of DDL by adding more TiDB nodes. Note that this feature is currently experimental and only supports `ADD INDEX` operations. - To use the distributed framework, set the value of [`tidb_enable_dist_task`](/system-variables.md#tidb_enable_dist_task-new-in-v710) to `ON`: + To use the DXF, set the value of [`tidb_enable_dist_task`](/system-variables.md#tidb_enable_dist_task-new-in-v710) to `ON`: ```sql SET GLOBAL tidb_enable_dist_task = ON; @@ -332,7 +332,7 @@ Compared with the previous LTS 6.5.0, 7.1.0 not only includes new features, impr | [`authentication_ldap_simple_server_host`](/system-variables.md#authentication_ldap_simple_server_host-new-in-v710) | Newly added | Specifies the LDAP server host in LDAP simple authentication. | | [`authentication_ldap_simple_server_port`](/system-variables.md#authentication_ldap_simple_server_port-new-in-v710) | Newly added | Specifies the LDAP server TCP/IP port number in LDAP simple authentication. | | [`authentication_ldap_simple_tls`](/system-variables.md#authentication_ldap_simple_tls-new-in-v710) | Newly added | Specifies whether connections by the plugin to the LDAP server are protected with StartTLS in LDAP simple authentication. | -| [`tidb_enable_dist_task`](/system-variables.md#tidb_enable_dist_task-new-in-v710) | Newly added | Controls whether to enable the distributed execution framework. After enabling distributed execution, DDL, import, and other supported backend tasks will be jointly completed by multiple TiDB nodes in the cluster. This variable was renamed from `tidb_ddl_distribute_reorg`. | +| [`tidb_enable_dist_task`](/system-variables.md#tidb_enable_dist_task-new-in-v710) | Newly added | Controls whether to enable the Distributed eXecution Framework (DXF). After enabling the DXF, DDL, import, and other supported DXF tasks will be jointly completed by multiple TiDB nodes in the cluster. This variable was renamed from `tidb_ddl_distribute_reorg`. | | [`tidb_enable_non_prepared_plan_cache_for_dml`](/system-variables.md#tidb_enable_non_prepared_plan_cache_for_dml-new-in-v710) | Newly added | Controls whether to enable the [Non-prepared plan cache](/sql-non-prepared-plan-cache.md) feature for DML statements. | | [`tidb_enable_row_level_checksum`](/system-variables.md#tidb_enable_row_level_checksum-new-in-v710) | Newly added | Controls whether to enable the TiCDC data integrity validation for single-row data feature.| | [`tidb_opt_fix_control`](/system-variables.md#tidb_opt_fix_control-new-in-v710) | Newly added | This variable provides more fine-grained control over the optimizer and helps to prevent performance regression after upgrading caused by behavior changes in the optimizer. | diff --git a/releases/release-7.1.1.md b/releases/release-7.1.1.md index ad17bb1d0f5bc..c9322534f4ed6 100644 --- a/releases/release-7.1.1.md +++ b/releases/release-7.1.1.md @@ -73,7 +73,7 @@ Quick access: [Quick start](https://docs.pingcap.com/tidb/v7.1/quick-start-with- - Fix the issue that the cluster upgrade fails when there are paused DDL operations before the upgrade [#44225](https://github.com/pingcap/tidb/issues/44225) @[zimulala](https://github.com/zimulala) - Fix the `duplicate entry` error that occurs when restoring a table with `AUTO_ID_CACHE=1` using BR [#44716](https://github.com/pingcap/tidb/issues/44716) @[tiancaiamao](https://github.com/tiancaiamao) - Fix the data index inconsistency issue triggered by multiple switches of DDL owner [#44619](https://github.com/pingcap/tidb/issues/44619) @[tangenta](https://github.com/tangenta) - - Fix the issue that canceling an `ADD INDEX` DDL task in the `none` status might cause memory leak because this task is not removed from the backend task queue [#44205](https://github.com/pingcap/tidb/issues/44205) @[tangenta](https://github.com/tangenta) + - Fix the issue that canceling an `ADD INDEX` DDL task in the `none` status might cause memory leak because this task is not removed from the Distributed eXecution Framework (DXF) task queue [#44205](https://github.com/pingcap/tidb/issues/44205) @[tangenta](https://github.com/tangenta) - Fix the issue that the proxy protocol reports the `Header read timeout` error when processing certain erroneous data [#43205](https://github.com/pingcap/tidb/issues/43205) @[blacktear23](https://github.com/blacktear23) - Fix the issue that PD isolation might block the running DDL [#44267](https://github.com/pingcap/tidb/issues/44267) @[wjhuang2016](https://github.com/wjhuang2016) - Fix the issue that the query result of the `SELECT CAST(n AS CHAR)` statement is incorrect when `n` in the statement is a negative number [#44786](https://github.com/pingcap/tidb/issues/44786) @[xhebox](https://github.com/xhebox) diff --git a/releases/release-7.2.0.md b/releases/release-7.2.0.md index 7dad79e5729a1..6cb8e4000c3b5 100644 --- a/releases/release-7.2.0.md +++ b/releases/release-7.2.0.md @@ -154,7 +154,7 @@ Quick access: [Quick start](https://docs.pingcap.com/tidb/v7.2/quick-start-with- The `IMPORT INTO` statement integrates the [Physical Import Mode](/tidb-lightning/tidb-lightning-physical-import-mode.md) capability of TiDB Lightning. With this statement, you can quickly import data in formats such as CSV, SQL, and PARQUET into an empty table in TiDB. This import method eliminates the need for a separate deployment and management of TiDB Lightning, thereby reducing the complexity of data import and greatly improving import efficiency. - For data files stored in Amazon S3 or GCS, when the [Backend task distributed execution framework](/tidb-distributed-execution-framework.md) is enabled, `IMPORT INTO` also supports splitting a data import job into multiple sub-jobs and scheduling them to multiple TiDB nodes for parallel import, which further enhances import performance. + For data files stored in Amazon S3 or GCS, when the [TiDB Distributed eXecution Framework (DXF)](/tidb-distributed-execution-framework.md) is enabled, `IMPORT INTO` also supports splitting a data import job into multiple sub-jobs and scheduling them to multiple TiDB nodes for parallel import, which further enhances import performance. For more information, see [documentation](/sql-statements/sql-statement-import-into.md). diff --git a/releases/release-7.4.0.md b/releases/release-7.4.0.md index 8c11ec1d0bf6f..c9bc9287b1f1e 100644 --- a/releases/release-7.4.0.md +++ b/releases/release-7.4.0.md @@ -25,7 +25,7 @@ Quick access: [Quick start](https://docs.pingcap.com/tidb/v7.4/quick-start-with- Reliability and Availability Improve the performance and stability of IMPORT INTO and ADD INDEX operations via global sort (experimental) - Before v7.4.0, tasks such as ADD INDEX or IMPORT INTO using the distributed execution framework meant localized and partial sorting, which ultimately led to TiKV doing a lot of extra work to make up for the partial sorting. These jobs also required TiDB nodes to allocate local disk space for sorting, before loading to TiKV.
With the introduction of the Global Sort feature in v7.4.0, data is temporarily stored in external shared storage (S3 in this version) for global sorting before being loaded into TiKV. This eliminates the need for TiKV to consume extra resources and significantly improves the performance and stability of operations like ADD INDEX and IMPORT INTO. + Before v7.4.0, tasks such as ADD INDEX or IMPORT INTO using the TiDB Distributed eXecution Framework (DXF) meant localized and partial sorting, which ultimately led to TiKV doing a lot of extra work to make up for the partial sorting. These jobs also required TiDB nodes to allocate local disk space for sorting, before loading to TiKV.
With the introduction of the Global Sort feature in v7.4.0, data is temporarily stored in external shared storage (S3 in this version) for global sorting before being loaded into TiKV. This eliminates the need for TiKV to consume extra resources and significantly improves the performance and stability of operations like ADD INDEX and IMPORT INTO. Resource control for background tasks (experimental) @@ -68,9 +68,9 @@ Quick access: [Quick start](https://docs.pingcap.com/tidb/v7.4/quick-start-with- ### Scalability -* Support selecting the TiDB nodes to parallelly execute the backend `ADD INDEX` or `IMPORT INTO` tasks of the distributed execution framework (experimental) [#46453](https://github.com/pingcap/tidb/pull/46453) @[ywqzzy](https://github.com/ywqzzy) +* Support selecting the TiDB nodes to parallelly execute the backend `ADD INDEX` or `IMPORT INTO` tasks of the Distributed eXecution Framework (DXF) (experimental) [#46453](https://github.com/pingcap/tidb/pull/46453) @[ywqzzy](https://github.com/ywqzzy) - Executing `ADD INDEX` or `IMPORT INTO` tasks in parallel in a resource-intensive cluster can consume a large amount of TiDB node resources, which can lead to cluster performance degradation. Starting from v7.4.0, you can use the system variable [`tidb_service_scope`](/system-variables.md#tidb_service_scope-new-in-v740) to control the service scope of each TiDB node under the [TiDB Backend Task Distributed Execution Framework](/tidb-distributed-execution-framework.md). You can select several existing TiDB nodes or set the TiDB service scope for new TiDB nodes, and all parallel `ADD INDEX` and `IMPORT INTO` tasks only run on these nodes. This mechanism can avoid performance impact on existing services. + Executing `ADD INDEX` or `IMPORT INTO` tasks in parallel in a resource-intensive cluster can consume a large amount of TiDB node resources, which can lead to cluster performance degradation. Starting from v7.4.0, you can use the system variable [`tidb_service_scope`](/system-variables.md#tidb_service_scope-new-in-v740) to control the service scope of each TiDB node under the [TiDB Distributed eXecution Framework (DXF)](/tidb-distributed-execution-framework.md). You can select several existing TiDB nodes or set the TiDB service scope for new TiDB nodes, and all parallel `ADD INDEX` and `IMPORT INTO` tasks only run on these nodes. This mechanism can avoid performance impact on existing services. For more information, see [documentation](/system-variables.md#tidb_service_scope-new-in-v740). @@ -106,7 +106,7 @@ Quick access: [Quick start](https://docs.pingcap.com/tidb/v7.4/quick-start-with- * Introduce cloud storage-based global sort capability to improve the performance and stability of `ADD INDEX` and `IMPORT INTO` tasks in parallel execution (experimental) [#45719](https://github.com/pingcap/tidb/issues/45719) @[wjhuang2016](https://github.com/wjhuang2016) - Before v7.4.0, when executing tasks like `ADD INDEX` or `IMPORT INTO` in the distributed parallel execution framework, each TiDB node needs to allocate a significant amount of local disk space for sorting encoded index KV pairs and table data KV pairs. However, due to the lack of global sorting capability, there might be overlapping data between different TiDB nodes and within each individual node during the process. As a result, TiKV has to constantly perform compaction operations while importing these KV pairs into its storage engine, which impacts the performance and stability of `ADD INDEX` and `IMPORT INTO`. + Before v7.4.0, when executing tasks like `ADD INDEX` or `IMPORT INTO` in the Distributed eXecution Framework (DXF), each TiDB node needs to allocate a significant amount of local disk space for sorting encoded index KV pairs and table data KV pairs. However, due to the lack of global sorting capability, there might be overlapping data between different TiDB nodes and within each individual node during the process. As a result, TiKV has to constantly perform compaction operations while importing these KV pairs into its storage engine, which impacts the performance and stability of `ADD INDEX` and `IMPORT INTO`. In v7.4.0, TiDB introduces the [Global Sort](/tidb-global-sort.md) feature. Instead of writing the encoded data locally and sorting it there, the data is now written to cloud storage for global sorting. Once sorted, both the indexed data and table data are imported into TiKV in parallel, thereby improving performance and stability. @@ -298,7 +298,7 @@ Quick access: [Quick start](https://docs.pingcap.com/tidb/v7.4/quick-start-with- | [`tidb_opt_objective`](/system-variables.md#tidb_opt_objective-new-in-v740) | Newly added | This variable controls the objective of the optimizer. `moderate` maintains the default behavior in versions prior to TiDB v7.4.0, where the optimizer tries to use more information to generate better execution plans. `determinate` mode tends to be more conservative and makes the execution plan more stable. | | [`tidb_request_source_type`](/system-variables.md#tidb_request_source_type-new-in-v740) | Newly added | Explicitly specifies the task type for the current session, which is identified and controlled by [Resource Control](/tidb-resource-control.md). For example: `SET @@tidb_request_source_type = "background"`. | | [`tidb_schema_version_cache_limit`](/system-variables.md#tidb_schema_version_cache_limit-new-in-v740) | Newly added | This variable limits how many historical schema versions can be cached in a TiDB instance. The default value is `16`, which means that TiDB caches 16 historical schema versions by default. | -| [`tidb_service_scope`](/system-variables.md#tidb_service_scope-new-in-v740) | Newly added | This variable is an instance-level system variable. You can use it to control the service scope of TiDB nodes under the [TiDB distributed execution framework](/tidb-distributed-execution-framework.md). When you set `tidb_service_scope` of a TiDB node to `background`, the TiDB distributed execution framework schedules that TiDB node to execute background tasks, such as [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) and [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md). | +| [`tidb_service_scope`](/system-variables.md#tidb_service_scope-new-in-v740) | Newly added | This variable is an instance-level system variable. You can use it to control the service scope of TiDB nodes under the [TiDB Distributed eXecution Framework (DXF)](/tidb-distributed-execution-framework.md). When you set `tidb_service_scope` of a TiDB node to `background`, the DXF schedules that TiDB node to execute DXF tasks, such as [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) and [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md). | | [`tidb_session_alias`](/system-variables.md#tidb_session_alias-new-in-v740) | Newly added | Controls the value of the `session_alias` column in the logs related to the current session. | | [`tiflash_mem_quota_query_per_node`](/system-variables.md#tiflash_mem_quota_query_per_node-new-in-v740) | Newly added | Limits the maximum memory usage for a query on a TiFlash node. When the memory usage of a query exceeds this limit, TiFlash returns an error and terminates the query. The default value is `0`, which means no limit.| | [`tiflash_query_spill_ratio`](/system-variables.md#tiflash_query_spill_ratio-new-in-v740) | Newly added | Controls the threshold for TiFlash [query-level spilling](/tiflash/tiflash-spill-disk.md#query-level-spilling). The default value is `0.7`. | diff --git a/releases/release-7.5.0.md b/releases/release-7.5.0.md index dcba72dfc1448..f9a618f4f2c3f 100644 --- a/releases/release-7.5.0.md +++ b/releases/release-7.5.0.md @@ -32,7 +32,7 @@ Compared with the previous LTS 7.1.0, 7.5.0 includes new features, improvements, Reliability and Availability Optimize Global Sort (experimental, introduced in v7.4.0) - TiDB v7.2.0 introduced the distributed execution framework. For tasks that take advantage of this framework, v7.4 introduces global sorting to eliminate the unnecessary I/O, CPU, and memory spikes caused by temporarily out-of-order data during data re-organization tasks. The global sorting takes advantage of external shared object storage (Amazon S3 in this first iteration) to store intermediary files during the job, adding flexibility and cost savings. Operations like ADD INDEX and IMPORT INTO will be faster, more resilient, more stable, more flexible, and cost less to run. + TiDB v7.1.0 introduced the Distributed eXecution Framework (DXF). For tasks that take advantage of this framework, v7.4 introduces global sorting to eliminate the unnecessary I/O, CPU, and memory spikes caused by temporarily out-of-order data during data re-organization tasks. The global sorting takes advantage of external shared object storage (Amazon S3 in this first iteration) to store intermediary files during the job, adding flexibility and cost savings. Operations like ADD INDEX and IMPORT INTO will be faster, more resilient, more stable, more flexible, and cost less to run. Resource control for background tasks (experimental, introduced in v7.4.0) @@ -50,7 +50,7 @@ Compared with the previous LTS 7.1.0, 7.5.0 includes new features, improvements, DB Operations and Observability TiDB Lightning's physical import mode integrated into TiDB with IMPORT INTO (GA) - Before v7.2.0, to import data based on the file system, you needed to install TiDB Lightning and used its physical import mode. Now, the same capability is integrated into the IMPORT INTO statement so you can use this statement to quickly import data without installing any additional tool. This statement also supports the distributed execution framework for parallel import, which improves import efficiency during large-scale imports. + Before v7.2.0, to import data based on the file system, you needed to install TiDB Lightning and used its physical import mode. Now, the same capability is integrated into the IMPORT INTO statement so you can use this statement to quickly import data without installing any additional tool. This statement also supports the Distributed eXecution Framework (DXF) for parallel import, which improves import efficiency during large-scale imports. Specify the respective TiDB nodes to execute the ADD INDEX and IMPORT INTO SQL statements (GA) @@ -71,19 +71,19 @@ Compared with the previous LTS 7.1.0, 7.5.0 includes new features, improvements, ### Scalability -* Support designating and isolating TiDB nodes to distributedly execute `ADD INDEX` or `IMPORT INTO` tasks when the distributed execution framework is enabled [#46258](https://github.com/pingcap/tidb/issues/46258) @[ywqzzy](https://github.com/ywqzzy) +* Support designating and isolating TiDB nodes to distributedly execute `ADD INDEX` or `IMPORT INTO` tasks when the Distributed eXecution Framework (DXF) is enabled [#46258](https://github.com/pingcap/tidb/issues/46258) @[ywqzzy](https://github.com/ywqzzy) - Executing `ADD INDEX` or `IMPORT INTO` tasks in parallel in a resource-intensive cluster can consume a large amount of TiDB node resources, which can lead to cluster performance degradation. To avoid performance impact on existing services, v7.4.0 introduces the system variable [`tidb_service_scope`](/system-variables.md#tidb_service_scope-new-in-v740) as an experimental feature to control the service scope of each TiDB node under the [TiDB backend task distributed execution framework](/tidb-distributed-execution-framework.md). You can select several existing TiDB nodes or set the TiDB service scope for new TiDB nodes, and all distributedly executed `ADD INDEX` and `IMPORT INTO` tasks only run on these nodes. In v7.5.0, this feature becomes generally available (GA). + Executing `ADD INDEX` or `IMPORT INTO` tasks in parallel in a resource-intensive cluster can consume a large amount of TiDB node resources, which can lead to cluster performance degradation. To avoid performance impact on existing services, v7.4.0 introduces the system variable [`tidb_service_scope`](/system-variables.md#tidb_service_scope-new-in-v740) as an experimental feature to control the service scope of each TiDB node under the [TiDB Distributed eXecution Framework (DXF)](/tidb-distributed-execution-framework.md). You can select several existing TiDB nodes or set the TiDB service scope for new TiDB nodes, and all distributedly executed `ADD INDEX` and `IMPORT INTO` tasks only run on these nodes. In v7.5.0, this feature becomes generally available (GA). For more information, see [documentation](/system-variables.md#tidb_service_scope-new-in-v740). ### Performance -* The TiDB backend task distributed execution framework becomes generally available (GA), improving the performance and stability of `ADD INDEX` and `IMPORT INTO` tasks in parallel execution [#45719](https://github.com/pingcap/tidb/issues/45719) @[wjhuang2016](https://github.com/wjhuang2016) +* The TiDB Distributed eXecution Framework (DXF) becomes generally available (GA), improving the performance and stability of `ADD INDEX` and `IMPORT INTO` tasks in parallel execution [#45719](https://github.com/pingcap/tidb/issues/45719) @[wjhuang2016](https://github.com/wjhuang2016) - The backend task distributed execution framework introduced in v6.6.0 has become GA. In versions before TiDB v7.1.0, only one TiDB node can execute DDL tasks at the same time. Starting from v7.1.0, multiple TiDB nodes can execute the same DDL task in parallel under the backend task distributed execution framework. Starting from v7.2.0, the backend task distributed execution framework supports multiple TiDB nodes to execute the same `IMPORT INTO` task in parallel, thereby better utilizing the resources of the TiDB cluster and significantly improving the performance of DDL and `IMPORT INTO` tasks. In addition, you can also increase TiDB nodes to linearly improve the performance of these tasks. + The DXF introduced in v7.1.0 has become GA. In versions before TiDB v7.1.0, only one TiDB node can execute DDL tasks at the same time. Starting from v7.1.0, multiple TiDB nodes can execute the same DDL task in parallel under the DXF. Starting from v7.2.0, the DXF supports multiple TiDB nodes to execute the same `IMPORT INTO` task in parallel, thereby better utilizing the resources of the TiDB cluster and significantly improving the performance of DDL and `IMPORT INTO` tasks. In addition, you can also increase TiDB nodes to linearly improve the performance of these tasks. - To use the backend task distributed execution framework, set [`tidb_enable_dist_task`](/system-variables.md#tidb_enable_dist_task-new-in-v710) value to `ON`. + To use the DXF, set [`tidb_enable_dist_task`](/system-variables.md#tidb_enable_dist_task-new-in-v710) value to `ON`. ```sql SET GLOBAL tidb_enable_dist_task = ON; diff --git a/sql-statements/sql-statement-import-into.md b/sql-statements/sql-statement-import-into.md index 3263af1e5d89c..c640a2b7b8351 100644 --- a/sql-statements/sql-statement-import-into.md +++ b/sql-statements/sql-statement-import-into.md @@ -13,10 +13,10 @@ The `IMPORT INTO` statement is used to import data in formats such as `CSV`, `SQ `IMPORT INTO` supports importing data from files stored in Amazon S3, GCS, and the TiDB local storage. -- For data files stored in Amazon S3, GCS, or Azure Blob Storage, `IMPORT INTO` supports running in the [TiDB backend task distributed execution framework](/tidb-distributed-execution-framework.md). +- For data files stored in Amazon S3, GCS, or Azure Blob Storage, `IMPORT INTO` supports running in the [TiDB Distributed eXecution Framework (DXF)](/tidb-distributed-execution-framework.md). - - When this framework is enabled ([tidb_enable_dist_task](/system-variables.md#tidb_enable_dist_task-new-in-v710) is `ON`), `IMPORT INTO` splits a data import job into multiple sub-jobs and distributes these sub-jobs to different TiDB nodes for execution to improve the import efficiency. - - When this framework is disabled, `IMPORT INTO` only supports running on the TiDB node where the current user is connected. + - When this DXF is enabled ([tidb_enable_dist_task](/system-variables.md#tidb_enable_dist_task-new-in-v710) is `ON`), `IMPORT INTO` splits a data import job into multiple sub-jobs and distributes these sub-jobs to different TiDB nodes for execution to improve the import efficiency. + - When this DXF is disabled, `IMPORT INTO` only supports running on the TiDB node where the current user is connected. - For data files stored locally in TiDB, `IMPORT INTO` only supports running on the TiDB node where the current user is connected. Therefore, the data files need to be placed on the TiDB node where the current user is connected. If you access TiDB through a proxy or load balancer, you cannot import data files stored locally in TiDB. @@ -165,7 +165,7 @@ In the following scenarios, there can be significant overlap in KV ranges: - `IMPORT INTO` splits sub-jobs based on the traversal order of data files, usually sorted by file name in lexicographic order. - If the target table has many indexes, or the index column values are scattered in the data file, the index KV generated by the encoding of each sub-job will also overlap. -When [Backend task distributed execution framework](/tidb-distributed-execution-framework.md) is enabled, you can enable [Global Sort](/tidb-global-sort.md) by specifying the `CLOUD_STORAGE_URI` option in the `IMPORT INTO` statement or by specifying the target storage address for encoded KV data using the system variable [`tidb_cloud_storage_uri`](/system-variables.md#tidb_cloud_storage_uri-new-in-v740). Note that currently, only S3 is supported as the Global Sort storage address. When Global Sort is enabled, `IMPORT INTO` writes encoded KV data to the cloud storage, performs Global Sort in the cloud storage, and then parallelly imports the globally sorted index and table data into TiKV. This prevents problems caused by KV overlap and enhances import stability. +When the [TiDB Distributed eXecution Framework (DXF)](/tidb-distributed-execution-framework.md) is enabled, you can enable [Global Sort](/tidb-global-sort.md) by specifying the `CLOUD_STORAGE_URI` option in the `IMPORT INTO` statement or by specifying the target storage address for encoded KV data using the system variable [`tidb_cloud_storage_uri`](/system-variables.md#tidb_cloud_storage_uri-new-in-v740). Note that currently, only S3 is supported as the Global Sort storage address. When Global Sort is enabled, `IMPORT INTO` writes encoded KV data to the cloud storage, performs Global Sort in the cloud storage, and then parallelly imports the globally sorted index and table data into TiKV. This prevents problems caused by KV overlap and enhances import stability. Global Sort consumes a significant amount of memory resources. Before the data import, it is recommended to configure the [`tidb_server_memory_limit_gc_trigger`](/system-variables.md#tidb_server_memory_limit_gc_trigger-new-in-v640) and [`tidb_server_memory_limit`](/system-variables.md#tidb_server_memory_limit-new-in-v640) variables, which avoids golang GC being frequently triggered and thus affecting the import efficiency. diff --git a/system-variables.md b/system-variables.md index c3f74b438dc60..7f28e6d780164 100644 --- a/system-variables.md +++ b/system-variables.md @@ -1570,9 +1570,9 @@ mysql> SELECT job_info FROM mysql.analyze_jobs ORDER BY end_time DESC LIMIT 1; - Persists to cluster: Yes - Applies to hint [SET_VAR](/optimizer-hints.md#set_varvar_namevar_value): No - Default value: `OFF` -- This variable is used to control whether to enable the [TiDB backend task distributed execution framework](/tidb-distributed-execution-framework.md). After the framework is enabled, backend tasks such as DDL and import will be distributedly executed and completed by multiple TiDB nodes in the cluster. -- Starting from TiDB v7.1.0, the framework supports distributedly executing the [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) statement for partitioned tables. -- Starting from TiDB v7.2.0, the framework supports distributedly executing the [`IMPORT INTO`](https://docs.pingcap.com/tidb/v7.2/sql-statement-import-into) statement for import jobs of TiDB Self-Hosted. For TiDB Cloud, the `IMPORT INTO` statement is not applicable. +- This variable is used to control whether to enable the [TiDB Distributed eXecution Framework (DXF)](/tidb-distributed-execution-framework.md). After the framework is enabled, the DXF tasks such as DDL and import will be distributedly executed and completed by multiple TiDB nodes in the cluster. +- Starting from TiDB v7.1.0, the DXF supports distributedly executing the [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) statement for partitioned tables. +- Starting from TiDB v7.2.0, the DXF supports distributedly executing the [`IMPORT INTO`](https://docs.pingcap.com/tidb/v7.2/sql-statement-import-into) statement for import jobs of TiDB Self-Hosted. For TiDB Cloud, the `IMPORT INTO` statement is not applicable. - This variable is renamed from `tidb_ddl_distribute_reorg`. ### tidb_cloud_storage_uri New in v7.4.0 @@ -1588,7 +1588,7 @@ mysql> SELECT job_info FROM mysql.analyze_jobs ORDER BY end_time DESC LIMIT 1; -- This variable is used to specify the Amazon S3 cloud storage URI to enable [Global Sort](/tidb-global-sort.md). After enabling the [distributed execution framework](/tidb-distributed-execution-framework.md), you can use the Global Sort feature by configuring the URI and pointing it to an appropriate cloud storage path with the necessary permissions to access the storage. For more details, see [Amazon S3 URI format](/external-storage-uri.md#amazon-s3-uri-format). +- This variable is used to specify the Amazon S3 cloud storage URI to enable [Global Sort](/tidb-global-sort.md). After enabling the [TiDB Distributed eXecution Framework (DXF)](/tidb-distributed-execution-framework.md), you can use the Global Sort feature by configuring the URI and pointing it to an appropriate cloud storage path with the necessary permissions to access the storage. For more details, see [Amazon S3 URI format](/external-storage-uri.md#amazon-s3-uri-format). - The following statements can use the Global Sort feature. - The [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) statement. - The [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md) statement for import jobs of TiDB Self-Hosted. For TiDB Cloud, the `IMPORT INTO` statement is not applicable. @@ -1596,7 +1596,7 @@ mysql> SELECT job_info FROM mysql.analyze_jobs ORDER BY end_time DESC LIMIT 1; -- This variable is used to specify the cloud storage URI to enable [Global Sort](/tidb-global-sort.md). After enabling the [distributed execution framework](/tidb-distributed-execution-framework.md), you can use the Global Sort feature by configuring the URI and pointing it to an appropriate cloud storage path with the necessary permissions to access the storage. For more details, see [URI Formats of External Storage Services](https://docs.pingcap.com/tidb/stable/external-storage-uri). +- This variable is used to specify the cloud storage URI to enable [Global Sort](/tidb-global-sort.md). After enabling the [TiDB Distributed eXecution Framework (DXF)](/tidb-distributed-execution-framework.md), you can use the Global Sort feature by configuring the URI and pointing it to an appropriate cloud storage path with the necessary permissions to access the storage. For more details, see [URI Formats of External Storage Services](https://docs.pingcap.com/tidb/stable/external-storage-uri). - The following statements can use the Global Sort feature. - The [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) statement. - The [`IMPORT INTO`](https://docs.pingcap.com/tidb/v7.2/sql-statement-import-into) statement for import jobs of TiDB Self-Hosted. For TiDB Cloud, the `IMPORT INTO` statement is not applicable. @@ -4851,12 +4851,12 @@ SHOW WARNINGS; - Type: String - Default value: "" - Optional Value: "", `background` -- This variable is an instance-level system variable. You can use it to control the service scope of TiDB nodes under the [TiDB distributed execution framework](/tidb-distributed-execution-framework.md). When you set `tidb_service_scope` of a TiDB node to `background`, the TiDB distributed execution framework schedules that TiDB node to execute background tasks, such as [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) and [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md). +- This variable is an instance-level system variable. You can use it to control the service scope of TiDB nodes under the [TiDB Distributed eXecution Framework (DXF)](/tidb-distributed-execution-framework.md). When you set `tidb_service_scope` of a TiDB node to `background`, the DXF schedules that TiDB node to execute background tasks, such as [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) and [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md). > **Note:** > -> - If `tidb_service_scope` is not set for any TiDB node in a cluster, the TiDB distributed execution framework schedules all TiDB nodes to execute background tasks. If you are concerned about performance impact on current business, you can set `tidb_service_scope` to `background` for a few of the TiDB nodes. Only those nodes will execute background tasks. -> - For newly scaled nodes, the TiDB distributed execution framework tasks are not executed by default to avoid consuming the resources of the scaled node. If you want this scaled node to execute background tasks, you need to manually set `tidb_service_scop` of this node to `background`. +> - If `tidb_service_scope` is not set for any TiDB node in a cluster, the DXF schedules all TiDB nodes to execute background tasks. If you are concerned about performance impact on current business, you can set `tidb_service_scope` to `background` for a few of the TiDB nodes. Only those nodes will execute background tasks. +> - For newly scaled nodes, the DXF tasks are not executed by default to avoid consuming the resources of the scaled node. If you want this scaled node to execute background tasks, you need to manually set `tidb_service_scop` of this node to `background`. @@ -4868,12 +4868,12 @@ SHOW WARNINGS; - Type: String - Default value: "" - Optional Value: "", `background` -- This variable is an instance-level system variable. You can use it to control the service scope of TiDB nodes under the [TiDB distributed execution framework](/tidb-distributed-execution-framework.md). When you set `tidb_service_scope` of a TiDB node to `background`, the TiDB distributed execution framework schedules that TiDB node to execute background tasks, such as [`ADD INDEX`](/sql-statements/sql-statement-add-index.md). +- This variable is an instance-level system variable. You can use it to control the service scope of TiDB nodes under the [TiDB Distributed eXecution Framework (DXF)](/tidb-distributed-execution-framework.md). When you set `tidb_service_scope` of a TiDB node to `background`, the DXF schedules that TiDB node to execute background tasks, such as [`ADD INDEX`](/sql-statements/sql-statement-add-index.md). > **Note:** > -> - If `tidb_service_scope` is not set for any TiDB node in a cluster, the TiDB distributed execution framework schedules all TiDB nodes to execute background tasks. If you are concerned about performance impact on current business, you can set `tidb_service_scope` to `background` for a few of the TiDB nodes. Only those nodes will execute background tasks. -> - For newly scaled nodes, the TiDB distributed execution framework tasks are not executed by default to avoid consuming the resources of the scaled node. If you want this scaled node to execute background tasks, you need to manually set `tidb_service_scop` of this node to `background`. +> - If `tidb_service_scope` is not set for any TiDB node in a cluster, the DXF schedules all TiDB nodes to execute background tasks. If you are concerned about performance impact on current business, you can set `tidb_service_scope` to `background` for a few of the TiDB nodes. Only those nodes will execute background tasks. +> - For newly scaled nodes, the DXF tasks are not executed by default to avoid consuming the resources of the scaled node. If you want this scaled node to execute background tasks, you need to manually set `tidb_service_scop` of this node to `background`. diff --git a/tidb-distributed-execution-framework.md b/tidb-distributed-execution-framework.md index a2120cc0670be..1ed889e71eec6 100644 --- a/tidb-distributed-execution-framework.md +++ b/tidb-distributed-execution-framework.md @@ -1,9 +1,9 @@ --- -title: TiDB Backend Task Distributed Execution Framework -summary: Learn the use cases, limitations, usage, and implementation principles of the TiDB backend task distributed execution framework. +title: TiDB Distributed eXecution Framework (DXF) +summary: Learn the use cases, limitations, usage, and implementation principles of the TiDB Distributed eXecution Framework (DXF). --- -# TiDB Backend Task Distributed Execution Framework +# TiDB Distributed eXecution Framework (DXF) @@ -13,25 +13,21 @@ summary: Learn the use cases, limitations, usage, and implementation principles -TiDB adopts a computing-storage separation architecture with excellent scalability and elasticity. Starting from v7.1.0, TiDB introduces a backend task distributed execution framework to further leverage the resource advantages of the distributed architecture. The goal of this framework is to implement unified scheduling and distributed execution of all backend tasks, and to provide unified resource management capabilities for both overall and individual backend tasks, which better meets users' expectations for resource usage. +TiDB adopts a computing-storage separation architecture with excellent scalability and elasticity. Starting from v7.1.0, TiDB introduces a Distributed eXecution Framework (DXF) to further leverage the resource advantages of the distributed architecture. The goal of the DXF is to implement unified scheduling and distributed execution of tasks, and to provide unified resource management capabilities for both overall and individual tasks, which better meets users' expectations for resource usage. -This document describes the use cases, limitations, usage, and implementation principles of the TiDB backend task distributed execution framework. +This document describes the use cases, limitations, usage, and implementation principles of the DXF. -> **Note:** -> -> This framework does not support the distributed execution of SQL queries. - -## Use cases and limitations +## Use cases -In a database management system, in addition to the core transactional processing (TP) and analytical processing (AP) workloads, there are other important tasks, such as DDL operations, IMPORT INTO, TTL, Analyze, and Backup/Restore, which are called **backend tasks**. These backend tasks need to process a large amount of data in database objects (tables), so they typically have the following characteristics: +In a database management system, in addition to the core transactional processing (TP) and analytical processing (AP) workloads, there are other important tasks, such as DDL operations, IMPORT INTO, TTL, Analyze, and Backup/Restore. These tasks need to process a large amount of data in database objects (tables), so they typically have the following characteristics: -In a database management system, in addition to the core transactional processing (TP) and analytical processing (AP) workloads, there are other important tasks, such as DDL operations, TTL, Analyze, and Backup/Restore, which are called **backend tasks**. These backend tasks need to process a large amount of data in database objects (tables), so they typically have the following characteristics: +In a database management system, in addition to the core transactional processing (TP) and analytical processing (AP) workloads, there are other important tasks, such as DDL operations, TTL, Analyze, and Backup/Restore. These tasks need to process a large amount of data in database objects (tables), so they typically have the following characteristics: @@ -39,13 +35,26 @@ In a database management system, in addition to the core transactional processin - Might need to be executed periodically, but at a low frequency. - If the resources are not properly controlled, they are prone to affect TP and AP tasks, lowering the database service quality. -Enabling the TiDB backend task distributed execution framework can solve the above problems and has the following three advantages: +Enabling the DXF can solve the above problems and has the following three advantages: - The framework provides unified capabilities for high scalability, high availability, and high performance. -- The framework supports distributed execution of backend tasks, which can flexibly schedule the available computing resources of the entire TiDB cluster, thereby better utilizing the computing resources in a TiDB cluster. -- The framework provides unified resource usage and management capabilities for both overall and individual backend tasks. +- The DXF supports distributed execution of tasks, which can flexibly schedule the available computing resources of the entire TiDB cluster, thereby better utilizing the computing resources in a TiDB cluster. +- The DXF provides unified resource usage and management capabilities for both overall and individual tasks. + + + +Currently, the DXF supports the distributed execution of the `ADD INDEX`. `ADD INDEX` is a DDL statement used to create indexes. For example: + +```sql +ALTER TABLE t1 ADD INDEX idx1(c1); +CREATE INDEX idx1 ON table t1(c1); +``` + + -Currently, for TiDB Self-Hosted, the TiDB backend task distributed execution framework supports the distributed execution of the `ADD INDEX` and `IMPORT INTO` statements. For TiDB Cloud, the `IMPORT INTO` statement is not applicable. + + +Currently, for TiDB Self-Hosted, the DXF supports the distributed execution of the `ADD INDEX` and `IMPORT INTO` statements. - `ADD INDEX` is a DDL statement used to create indexes. For example: @@ -56,9 +65,15 @@ Currently, for TiDB Self-Hosted, the TiDB backend task distributed execution fra - `IMPORT INTO` is used to import data in formats such as `CSV`, `SQL`, and `PARQUET` into an empty table. For more information, see [`IMPORT INTO`](https://docs.pingcap.com/tidb/v7.2/sql-statement-import-into). + + +## Limitation + +The DXF can only schedule the distributed execution of one `ADD INDEX` task at a time. If a new `ADD INDEX` task is submitted before the current `ADD INDEX` distributed task has finished, the new task is executed through a transaction. + ## Prerequisites -Before using the distributed framework, you need to enable the [Fast Online DDL](/system-variables.md#tidb_ddl_enable_fast_reorg-new-in-v630) mode. +Before using the DXF to execute `ADD INDEX` tasks, you need to enable the [Fast Online DDL](/system-variables.md#tidb_ddl_enable_fast_reorg-new-in-v630) mode. @@ -73,7 +88,7 @@ Before using the distributed framework, you need to enable the [Fast Online DDL] > **Note:** > -> Before you upgrade TiDB to v6.5.0 or later, it is recommended that you check whether the [`temp-dir`](/tidb-configuration-file.md#temp-dir-new-in-v630) path of TiDB is correctly mounted to an SSD disk. Make sure that the operating system user that runs TiDB has the read and write permissions for this directory. Otherwise, The DDL operations might experience unpredictable issues. This path is a TiDB configuration item, which takes effect after TiDB is restarted. Therefore, setting this configuration item in advance before upgrading can avoid another restart. +> It is recommended that you prepare at least 100 GiB of free space for the TiDB `temp-dir` directory. @@ -88,7 +103,7 @@ Adjust the following system variables related to Fast Online DDL: ## Usage -1. To enable the distributed framework, set the value of [`tidb_enable_dist_task`](/system-variables.md#tidb_enable_dist_task-new-in-v710) to `ON`: +1. To enable the DXF, set the value of [`tidb_enable_dist_task`](/system-variables.md#tidb_enable_dist_task-new-in-v710) to `ON`: ```sql SET GLOBAL tidb_enable_dist_task = ON; @@ -96,43 +111,40 @@ Adjust the following system variables related to Fast Online DDL: - When background tasks are running, the statements supported by the framework (such as [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) and [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md)) are executed in a distributed manner. All TiDB nodes run background tasks by default. + When the DXF tasks are running, the statements supported by the framework (such as [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) and [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md)) are executed in a distributed manner. All TiDB nodes run DXF tasks by default. - When background tasks are running, the statements supported by the framework (such as [`ADD INDEX`](/sql-statements/sql-statement-add-index.md)) are executed in a distributed manner. All TiDB nodes run background tasks by default. + When the DXF tasks are running, the statements supported by the framework (such as [`ADD INDEX`](/sql-statements/sql-statement-add-index.md)) are executed in a distributed manner. All TiDB nodes run DXF tasks by default. -2. Adjust the following system variables that might affect the distributed execution of DDL tasks according to your needs: +2. In general, for the following system variables that might affect the distributed execution of DDL tasks, it is recommended that you use their default values: * [`tidb_ddl_reorg_worker_cnt`](/system-variables.md#tidb_ddl_reorg_worker_cnt): use the default value `4`. The recommended maximum value is `16`. * [`tidb_ddl_reorg_priority`](/system-variables.md#tidb_ddl_reorg_priority) * [`tidb_ddl_error_count_limit`](/system-variables.md#tidb_ddl_error_count_limit) * [`tidb_ddl_reorg_batch_size`](/system-variables.md#tidb_ddl_reorg_batch_size): use the default value. The recommended maximum value is `1024`. -3. Starting from v7.4.0, you can adjust the number of TiDB nodes that perform background tasks according to actual needs. After deploying a TiDB cluster, you can set the instance-level system variable [`tidb_service_scope`](/system-variables.md#tidb_service_scope-new-in-v740) for each TiDB node in the cluster. When `tidb_service_scope` of a TiDB node is set to `background`, the TiDB node can execute background tasks. When `tidb_service_scope` of a TiDB node is set to the default value "", the TiDB node cannot execute background tasks. If `tidb_service_scope` is not set for any TiDB node in a cluster, the TiDB distributed execution framework schedules all TiDB nodes to execute background tasks by default. +3. Starting from v7.4.0, you can adjust the number of TiDB nodes that perform the DXF tasks according to actual needs. After deploying a TiDB cluster, you can set the instance-level system variable [`tidb_service_scope`](/system-variables.md#tidb_service_scope-new-in-v740) for each TiDB node in the cluster. When `tidb_service_scope` of a TiDB node is set to `background`, the TiDB node can execute the DXF tasks. When `tidb_service_scope` of a TiDB node is set to the default value "", the TiDB node cannot execute the DXF tasks. If `tidb_service_scope` is not set for any TiDB node in a cluster, the DXF schedules all TiDB nodes to execute tasks by default. + > **Note:** > > - During the execution of a distributed task, if some TiDB nodes are offline, the distributed task randomly assigns subtasks to available TiDB nodes instead of dynamically assigning subtasks according to `tidb_service_scope`. > - During the execution of a distributed task, changes to the `tidb_service_scope` configuration will not take effect for the current task, but will take effect from the next task. -> **Tip:** -> -> For distributed execution of `ADD INDEX` statements, you only need to set `tidb_ddl_reorg_worker_cnt`. - ## Implementation principles -The architecture of the TiDB backend task distributed execution framework is as follows: +The architecture of the DXF is as follows: -![Architecture of the TiDB backend task distributed execution framework](/media/dist-task/dist-task-architect.jpg) +![Architecture of the DXF](/media/dist-task/dist-task-architect.jpg) -As shown in the preceding diagram, the execution of backend tasks in the distributed framework is mainly handled by the following modules: +As shown in the preceding diagram, the execution of tasks in the DXF is mainly handled by the following modules: - Dispatcher: generates the distributed execution plan for each task, manages the execution process, converts the task status, and collects and feeds back the runtime task information. -- Scheduler: replicates the execution of distributed tasks among TiDB nodes to improve the efficiency of backend task execution. +- Scheduler: replicates the execution of distributed tasks among TiDB nodes to improve the efficiency of task execution. - Subtask Executor: the actual executor of distributed subtasks. In addition, the Subtask Executor returns the execution status of subtasks to the Scheduler, and the Scheduler updates the execution status of subtasks in a unified manner. - Resource pool: provides the basis for quantifying resource usage and management by pooling computing resources of the above modules. diff --git a/tidb-global-sort.md b/tidb-global-sort.md index 19a00403c7c32..010042381b02e 100644 --- a/tidb-global-sort.md +++ b/tidb-global-sort.md @@ -22,7 +22,7 @@ summary: Learn the use cases, limitations, usage, and implementation principles ## Overview -The TiDB Global Sort feature enhances the stability and efficiency of data import and DDL (Data Definition Language) operations. It serves as a general operator in the [distributed execution framework](/tidb-distributed-execution-framework.md), providing a global sort service on cloud. +The TiDB Global Sort feature enhances the stability and efficiency of data import and DDL (Data Definition Language) operations. It serves as a general operator in the [TiDB Distributed eXecution Framework (DXF)](/tidb-distributed-execution-framework.md), providing a global sort service on cloud. The Global Sort feature currently only supports using Amazon S3 as cloud storage. In future releases, it will be extended to support multiple shared storage interfaces, such as POSIX, enabling seamless integration with different storage systems. This flexibility enables efficient and adaptable data sorting for various use cases. @@ -30,7 +30,7 @@ The Global Sort feature currently only supports using Amazon S3 as cloud storage The Global Sort feature enhances the stability and efficiency of `IMPORT INTO` and `CREATE INDEX`. By globally sorting the data that are processed by the tasks, it improves the stability, controllability, and scalability of writing data to TiKV. This provides an enhanced user experience for data import and DDL tasks, as well as higher-quality services. -The Global Sort feature executes tasks within the unified distributed parallel execution framework, ensuring efficient and parallel sorting of data on a global scale. +The Global Sort feature executes tasks within the unified DXF, ensuring efficient and parallel sorting of data on a global scale. ## Limitations @@ -40,7 +40,7 @@ Currently, the Global Sort feature is not used as a component of the query execu To enable Global Sort, follow these steps: -1. Enable the distributed execution framework by setting the value of [`tidb_enable_dist_task`](/system-variables.md#tidb_enable_dist_task-new-in-v710) to `ON`: +1. Enable the DXF by setting the value of [`tidb_enable_dist_task`](/system-variables.md#tidb_enable_dist_task-new-in-v710) to `ON`: ```sql SET GLOBAL tidb_enable_dist_task = ON; diff --git a/tidb-roadmap.md b/tidb-roadmap.md index 9958f09882730..57bcc30db50ef 100644 --- a/tidb-roadmap.md +++ b/tidb-roadmap.md @@ -28,8 +28,8 @@ In the course of development, this roadmap is subject to change based on user ne