Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spill of parallel aggregation in TiDB #15755

Merged
merged 12 commits into from
Mar 20, 2024
14 changes: 3 additions & 11 deletions configure-memory-usage.md
Original file line number Diff line number Diff line change
Expand Up @@ -145,8 +145,8 @@ TiDB supports disk spill for execution operators. When the memory usage of a SQL

- The disk spill behavior is jointly controlled by the following parameters: [`tidb_mem_quota_query`](/system-variables.md#tidb_mem_quota_query), [`tidb_enable_tmp_storage_on_oom`](/system-variables.md#tidb_enable_tmp_storage_on_oom), [`tmp-storage-path`](/tidb-configuration-file.md#tmp-storage-path), and [`tmp-storage-quota`](/tidb-configuration-file.md#tmp-storage-quota).
- When the disk spill is triggered, TiDB outputs a log containing the keywords `memory exceeds quota, spill to disk now` or `memory exceeds quota, set aggregate mode to spill-mode`.
- Disk spill for the Sort, MergeJoin, and HashJoin operator is introduced in v4.0.0; disk spill for the HashAgg operator is introduced in v5.2.0.
- When the SQL executions containing Sort, MergeJoin, or HashJoin cause OOM, TiDB triggers disk spill by default. When SQL executions containing HashAgg cause OOM, TiDB does not trigger disk spill by default. You can configure the system variable `tidb_executor_concurrency = 1` to trigger disk spill for HashAgg.
- Disk spill for the Sort, MergeJoin, and HashJoin operators is introduced in v4.0.0; disk spill for the non-concurrent algorithm of the HashAgg operator is introduced in v5.2.0; disk spill for the concurrent algorithm of the HashAgg operator is introduced in v7.6.0.
qiancai marked this conversation as resolved.
Show resolved Hide resolved
- When the SQL executions containing Sort, MergeJoin, HashJoin, or HashAgg cause OOM, TiDB triggers disk spill by default.

> **Note:**
>
Expand Down Expand Up @@ -178,15 +178,7 @@ The following example uses a memory-consuming SQL statement to demonstrate the d
ERROR 1105 (HY000): Out Of Memory Quota![conn_id=3]
```

4. Configure the system variable `tidb_executor_concurrency` to 1. With this configuration, when out of memory, HashAgg automatically tries to trigger disk spill.

{{< copyable "sql" >}}

```sql
SET tidb_executor_concurrency = 1;
```

5. Execute the same SQL statement. You can find that this time, the statement is successfully executed and no error message is returned. From the following detailed execution plan, you can see that HashAgg has used 600 MB of hard disk space.
4. Execute the same SQL statement. You can find that this time, the statement is successfully executed and no error message is returned. From the following detailed execution plan, you can see that HashAgg has used 600 MB of hard disk space.

{{< copyable "sql" >}}

Expand Down
13 changes: 13 additions & 0 deletions system-variables.md
Original file line number Diff line number Diff line change
Expand Up @@ -1934,6 +1934,19 @@ mysql> SELECT job_info FROM mysql.analyze_jobs ORDER BY end_time DESC LIMIT 1;
- Default value: `OFF`
- This variable controls whether to enable TiDB to collect `PREDICATE COLUMNS`. After enabling the collection, if you disable it, the information of previously collected `PREDICATE COLUMNS` is cleared. For details, see [Collect statistics on some columns](/statistics.md#collect-statistics-on-some-columns).

### tidb_enable_concurrent_hashagg_spill <span class="version-mark">New in v7.6.0 </span>
xzhangxian1008 marked this conversation as resolved.
Show resolved Hide resolved

> **Warning:**
>
> Currently, it is an experimental feature. It is not recommended that you use it in production environments. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub.
xzhangxian1008 marked this conversation as resolved.
Show resolved Hide resolved

- Scope: SESSION | GLOBAL
- Persists to cluster: Yes
- Applies to hint [SET_VAR](/optimizer-hints.md#set_varvar_namevar_value): No
- Type: Boolean
- Default value: `ON`
- This variable controls whether TiDB supports disk spill for the concurrent HashAgg algorithm. When it is `ON`, disk spill can be triggered for the concurrent HashAgg algorithm. This variable will be deprecated when this feature is generally available in a future release.

### tidb_enable_enhanced_security

- Scope: NONE
Expand Down
Loading