Skip to content

Commit

Permalink
Add links for DXF (pingcap#17522)
Browse files Browse the repository at this point in the history
  • Loading branch information
dveeden authored May 27, 2024
1 parent fb090fb commit 97749e6
Show file tree
Hide file tree
Showing 3 changed files with 20 additions and 14 deletions.
5 changes: 5 additions & 0 deletions sql-statements/sql-statement-add-index.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,10 @@ aliases: ['/docs/dev/sql-statements/sql-statement-add-index/','/docs/dev/referen

The `ALTER TABLE.. ADD INDEX` statement adds an index to an existing table. This operation is online in TiDB, which means that neither reads or writes to the table are blocked by adding an index.

> **Tip:**
>
> The [TiDB Distributed eXecution Framework (DXF)](/tidb-distributed-execution-framework.md) can be used to speed up the operation of this statement.
<CustomContent platform="tidb">

> **Warning:**
Expand Down Expand Up @@ -96,3 +100,4 @@ mysql> EXPLAIN SELECT * FROM t1 WHERE c1 = 3;
* [ADD COLUMN](/sql-statements/sql-statement-add-column.md)
* [CREATE TABLE](/sql-statements/sql-statement-create-table.md)
* [EXPLAIN](/sql-statements/sql-statement-explain.md)
* [TiDB Distributed eXecution Framework (DXF)](/tidb-distributed-execution-framework.md)
3 changes: 2 additions & 1 deletion sql-statements/sql-statement-import-into.md
Original file line number Diff line number Diff line change
Expand Up @@ -154,7 +154,7 @@ The supported options are described as follows:
| `CHECKSUM_TABLE='<string>'` | All file formats | Configures whether to perform a checksum check on the target table after the import to validate the import integrity. The supported values include `"required"` (default), `"optional"`, and `"off"`. `"required"` means performing a checksum check after the import. If the checksum check fails, TiDB will return an error and the import will exit. `"optional"` means performing a checksum check after the import. If an error occurs, TiDB will return a warning and ignore the error. `"off"` means not performing a checksum check after the import. |
| `DETACHED` | All file formats | Controls whether to execute `IMPORT INTO` asynchronously. When this option is enabled, executing `IMPORT INTO` immediately returns the information of the import job (such as the `Job_ID`), and the job is executed asynchronously in the backend. |
| `CLOUD_STORAGE_URI` | All file formats | Specifies the target address where encoded KV data for [Global Sort](/tidb-global-sort.md) is stored. When `CLOUD_STORAGE_URI` is not specified, `IMPORT INTO` determines whether to use Global Sort based on the value of the system variable [`tidb_cloud_storage_uri`](/system-variables.md#tidb_cloud_storage_uri-new-in-v740). If this system variable specifies a target storage address, `IMPORT INTO` uses this address for Global Sort. When `CLOUD_STORAGE_URI` is specified with a non-empty value, `IMPORT INTO` uses that value as the target storage address. When `CLOUD_STORAGE_URI` is specified with an empty value, local sorting is enforced. Currently, the target storage address only supports S3. For details about the URI configuration, see [Amazon S3 URI format](/external-storage-uri.md#amazon-s3-uri-format). When this feature is used, all TiDB nodes must have read and write access for the target S3 bucket, including at least these permissions: `s3:ListBucket`, `s3:GetObject`, `s3:DeleteObject`, `s3:PutObject`, `s3: AbortMultipartUpload`. |
| `DISABLE_PRECHECK` | All file formats and query results of `SELECT` | Setting this option disables pre-checks of non-critical itemes, such as checking whether there are CDC or PITR tasks. |
| `DISABLE_PRECHECK` | All file formats and query results of `SELECT` | Setting this option disables pre-checks of non-critical items, such as checking whether there are CDC or PITR tasks. |

## `IMPORT INTO ... FROM FILE` usage

Expand Down Expand Up @@ -360,3 +360,4 @@ This statement is a TiDB extension to MySQL syntax.

* [`SHOW IMPORT JOB(s)`](/sql-statements/sql-statement-show-import-job.md)
* [`CANCEL IMPORT JOB`](/sql-statements/sql-statement-cancel-import-job.md)
* [TiDB Distributed eXecution Framework (DXF)](/tidb-distributed-execution-framework.md)
26 changes: 13 additions & 13 deletions tidb-distributed-execution-framework.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,13 +9,13 @@ summary: Learn the use cases, limitations, usage, and implementation principles
>
> This feature is not available on [TiDB Serverless](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-serverless) clusters.
TiDB adopts a computing-storage separation architecture with excellent scalability and elasticity. Starting from v7.1.0, TiDB introduces a Distributed eXecution Framework (DXF) to further leverage the resource advantages of the distributed architecture. The goal of the DXF is to implement unified scheduling and distributed execution of tasks, and to provide unified resource management capabilities for both overall and individual tasks, which better meets users' expectations for resource usage.
TiDB adopts a computing-storage separation architecture with excellent scalability and elasticity. Starting from v7.1.0, TiDB introduces a **Distributed eXecution Framework (DXF)** to further leverage the resource advantages of the distributed architecture. The goal of the DXF is to implement unified scheduling and distributed execution of tasks, and to provide unified resource management capabilities for both overall and individual tasks, which better meets users' expectations for resource usage.

This document describes the use cases, limitations, usage, and implementation principles of the DXF.

## Use cases

In a database management system, in addition to the core transactional processing (TP) and analytical processing (AP) workloads, there are other important tasks, such as DDL operations, IMPORT INTO, TTL, Analyze, and Backup/Restore. These tasks need to process a large amount of data in database objects (tables), so they typically have the following characteristics:
In a database management system, in addition to the core transactional processing (TP) and analytical processing (AP) workloads, there are other important tasks, such as DDL operations, [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md), [TTL](/time-to-live.md), [`ANALYZE`](/sql-statements/sql-statement-analyze-table.md), and Backup/Restore. These tasks need to process a large amount of data in database objects (tables), so they typically have the following characteristics:

- Need to process all data in a schema or a database object (table).
- Might need to be executed periodically, but at a low frequency.
Expand All @@ -27,29 +27,29 @@ Enabling the DXF can solve the above problems and has the following three advant
- The DXF supports distributed execution of tasks, which can flexibly schedule the available computing resources of the entire TiDB cluster, thereby better utilizing the computing resources in a TiDB cluster.
- The DXF provides unified resource usage and management capabilities for both overall and individual tasks.

Currently, the DXF supports the distributed execution of the `ADD INDEX` and `IMPORT INTO` statements.
Currently, the DXF supports the distributed execution of the [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) and [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md) statements.

- `ADD INDEX` is a DDL statement used to create indexes. For example:
- [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) is a DDL statement used to create indexes. For example:

```sql
ALTER TABLE t1 ADD INDEX idx1(c1);
CREATE INDEX idx1 ON table t1(c1);
```

- `IMPORT INTO` is used to import data in formats such as `CSV`, `SQL`, and `PARQUET` into an empty table. For more information, see [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md).
- [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md) is used to import data in formats such as CSV, SQL, and Parquet into an empty table.

## Limitation

The DXF can only schedule up to 16 tasks (including `ADD INDEX` tasks and `IMPORT INTO` tasks) simultaneously.
The DXF can only schedule up to 16 tasks (including [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) tasks and [`IMPORT INTO`](/sql-statements/sql-statement-import-into.md) tasks) simultaneously.

## `ADD INDEX` limitation

- For each cluster, only one `ADD INDEX` task is allowed for distributed execution at a time. If a new `ADD INDEX` task is submitted before the current `ADD INDEX` distributed task has finished, the new `ADD INDEX` task is executed through a transaction instead of being scheduled by DXF.
- For each cluster, only one [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) task is allowed for distributed execution at a time. If a new [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) task is submitted before the current [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) distributed task has finished, the new [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) task is executed through a transaction instead of being scheduled by DXF.
- Adding indexes on columns with the `TIMESTAMP` data type through the DXF is not supported, because it might lead to inconsistency between the index and the data.

## Prerequisites

Before using the DXF to execute `ADD INDEX` tasks, you need to enable the [Fast Online DDL](/system-variables.md#tidb_ddl_enable_fast_reorg-new-in-v630) mode.
Before using the DXF to execute [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) tasks, you need to enable the [Fast Online DDL](/system-variables.md#tidb_ddl_enable_fast_reorg-new-in-v630) mode.

<CustomContent platform="tidb">

Expand Down Expand Up @@ -100,14 +100,14 @@ By default, the DXF schedules all TiDB nodes to execute distributed tasks. Start

- For versions from v7.4.0 to v8.0.0, the optional values of [`tidb_service_scope`](/system-variables.md#tidb_service_scope-new-in-v740) are `''` or `background`. If the current cluster has TiDB nodes with `tidb_service_scope = 'background'`, the DXF schedules tasks to these nodes for execution. If the current cluster does not have TiDB nodes with `tidb_service_scope = 'background'`, whether due to faults or normal scaling in, the DXF schedules tasks to nodes with `tidb_service_scope = ''` for execution.

- Starting from v8.1.0, you can set [`tidb_service_scope`](/system-variables.md#tidb_service_scope-new-in-v740) to any valid value. When a distributed task is submitted, the task binds to the `tidb_service_scope` value of the currently connected TiDB node, and the DXF only schedules the task to the TiDB nodes with the same `tidb_service_scope` value for execution. However, for configuration compatibility with earlier versions, if a distributed task is submitted on a node with `tidb_service_scope = ''` and the current cluster has TiDB nodes with `tidb_service_scope = 'background'`, the DXF schedules the task to TiDB nodes with `tidb_service_scope = 'background'` for execution.
- Starting from v8.1.0, you can set [`tidb_service_scope`](/system-variables.md#tidb_service_scope-new-in-v740) to any valid value. When a distributed task is submitted, the task binds to the [`tidb_service_scope`](/system-variables.md#tidb_service_scope-new-in-v740) value of the currently connected TiDB node, and the DXF only schedules the task to the TiDB nodes with the same [`tidb_service_scope`](/system-variables.md#tidb_service_scope-new-in-v740) value for execution. However, for configuration compatibility with earlier versions, if a distributed task is submitted on a node with `tidb_service_scope = ''` and the current cluster has TiDB nodes with `tidb_service_scope = 'background'`, the DXF schedules the task to TiDB nodes with `tidb_service_scope = 'background'` for execution.

If new nodes are added during task execution, the DXF determines whether to schedule tasks to the new nodes for execution based on the preceding rules. If you do not want newly added nodes to execute tasks, it is recommended to set `tidb_service_scope` for these nodes in advance.
Starting from v8.1.0, if new nodes are added during task execution, the DXF determines whether to schedule tasks to the new nodes for execution based on the preceding rules. If you do not want newly added nodes to execute tasks, it is recommended to set a different [`tidb_service_scope`](/system-variables.md#tidb_service_scope-new-in-v740) for those newly added nodes in advance.

> **Note:**
>
> - For versions from v7.4.0 to v8.0.0, in clusters with multiple TiDB nodes, it is strongly recommended to set `tidb_service_scope` to `background` on two or more TiDB nodes. If this variable is set only on a single TiDB node, when that node restarts or fails, tasks will be rescheduled to TiDB nodes with `tidb_service_scope = ''`, which affects applications running on these TiDB nodes.
> - During the execution of a distributed task, changes to the `tidb_service_scope` configuration do not take effect for the current task, but take effect from the next task.
> - For versions from v7.4.0 to v8.0.0, in clusters with multiple TiDB nodes, it is strongly recommended to set [`tidb_service_scope`](/system-variables.md#tidb_service_scope-new-in-v740) to `background` on two or more TiDB nodes. If this variable is set only on a single TiDB node, when that node restarts or fails, tasks will be rescheduled to TiDB nodes with `tidb_service_scope = ''`, which affects applications running on these TiDB nodes.
> - During the execution of a distributed task, changes to the [`tidb_service_scope`](/system-variables.md#tidb_service_scope-new-in-v740) configuration do not take effect for the current task, but take effect from the next task.

## Implementation principles

Expand All @@ -133,4 +133,4 @@ As shown in the preceding diagram, the execution of tasks in the DXF is mainly h

* [Execution Principles and Best Practices of DDL Statements](https://docs.pingcap.com/tidb/stable/ddl-introduction)

</CustomContent>
</CustomContent>

0 comments on commit 97749e6

Please sign in to comment.