From 5f32e54673437971a3a5adc84e104eeb4d99d8d6 Mon Sep 17 00:00:00 2001 From: Aolin Date: Fri, 24 Mar 2023 10:18:43 +0800 Subject: [PATCH] add docs for derive topn from window (#12946) --- TOC-tidb-cloud.md | 1 + TOC.md | 1 + blocklist-control-plan.md | 1 + derive-topn-from-window.md | 204 +++++++++++++++++++++++++++++++++++++ system-variables.md | 8 ++ 5 files changed, 215 insertions(+) create mode 100644 derive-topn-from-window.md diff --git a/TOC-tidb-cloud.md b/TOC-tidb-cloud.md index 2a8ced65ebfbb..51a4e711a5689 100644 --- a/TOC-tidb-cloud.md +++ b/TOC-tidb-cloud.md @@ -165,6 +165,7 @@ - [Partition Pruning](/partition-pruning.md) - [TopN and Limit Push Down](/topn-limit-push-down.md) - [Join Reorder](/join-reorder.md) + - [Derive TopN or Limit from Window Functions](/derive-topn-from-window.md) - Physical Optimization - [Overview](/sql-physical-optimization.md) - [Index Selection](/choose-index.md) diff --git a/TOC.md b/TOC.md index 7d5b816bcfee2..c0005c6d22eb1 100644 --- a/TOC.md +++ b/TOC.md @@ -245,6 +245,7 @@ - [Partition Pruning](/partition-pruning.md) - [TopN and Limit Push Down](/topn-limit-push-down.md) - [Join Reorder](/join-reorder.md) + - [Derive TopN or Limit from Window Functions](/derive-topn-from-window.md) - Physical Optimization - [Overview](/sql-physical-optimization.md) - [Index Selection](/choose-index.md) diff --git a/blocklist-control-plan.md b/blocklist-control-plan.md index e2e5fc5f7a617..ea3b69ae8ae30 100644 --- a/blocklist-control-plan.md +++ b/blocklist-control-plan.md @@ -26,6 +26,7 @@ The blocklist of optimization rules is one way to tune optimization rules, mainl | Aggregation pushdown | aggregation_push_down | Tries to push aggregations down to their children. | | TopN pushdown | topn_push_down | Tries to push the TopN operator to the place closer to the data source. | | Join reorder | join_reorder | Decides the order of multi-table joins. | +| Derive TopN or Limit from window functions | derive_topn_from_window | Derives the TopN or Limit operator from window functions. | ### Disable optimization rules diff --git a/derive-topn-from-window.md b/derive-topn-from-window.md new file mode 100644 index 0000000000000..d4bc663e10d9f --- /dev/null +++ b/derive-topn-from-window.md @@ -0,0 +1,204 @@ +--- +title: Derive TopN or Limit from Window Functions +summary: Introduce the optimization rule of deriving TopN or Limit from window functions and how to enable this rule. +--- + +# Derive TopN or Limit from Window Functions + +[Window Functions](/functions-and-operators/window-functions.md) are a common type of SQL function. When you use a window function for row numbering, such as `ROW_NUMBER()` or `RANK()`, it is common to filter the results after the window function is evaluated. For example: + +```sql +SELECT * FROM (SELECT ROW_NUMBER() OVER (ORDER BY a) AS rownumber FROM t) dt WHERE rownumber <= 3 +``` + +In a typical SQL execution process, TiDB first sorts all data in the table `t`, then calculates the `ROW_NUMBER()` result for each row, and finally filters with `rownumber <= 3`. + +Starting from v7.0.0, TiDB supports deriving the TopN or Limit operator from window functions. With this optimization rule, TiDB can rewrite the original SQL into an equivalent form as follows: + +```sql +WITH t_topN AS (SELECT a FROM t1 ORDER BY a LIMIT 3) SELECT * FROM (SELECT ROW_NUMBER() OVER (ORDER BY a) AS rownumber FROM t_topN) dt WHERE rownumber <= 3 +``` + +After rewriting, TiDB can derive a TopN operator from the window function and the subsequent filter condition. Compared with the Sort operator in the original SQL (`ORDER BY`), the TopN operator has a much higher execution efficiency. In addition, both TiKV and TiFlash support pushing down the TopN operator, which further improves the performance of the rewritten SQL. + +Deriving TopN or Limit from window functions is disabled by default. To enable this feature, you can set the session variable [tidb_opt_derive_topn](/system-variables.md#tidb_opt_derive_topn-new-in-v700) to `ON`. + +After enabling this feature, you can disable it by performing one of the following operations: + +* Set the session variable [tidb_opt_derive_topn](/system-variables.md#tidb_opt_derive_topn-new-in-v700) to `OFF`. +* Follow the steps described in [The blocklist of optimization rules and expression pushdown](/blocklist-control-plan.md). + +## Limitations + +* Only the `ROW_NUMBER()` window function is supported for SQL rewriting. +* TiDB can only rewrite SQL when filtering on the `ROW_NUMBER()` results and the filter condition is `<` or `<=`. + +## Usage examples + +The following examples demonstrate how to use the optimization rule. + +### Window functions without PARTITION BY + +#### Example 1: window functions without ORDER BY + +```sql +CREATE TABLE t(id int, value int); +SET tidb_opt_derive_topn=on; +EXPLAIN SELECT * FROM (SELECT ROW_NUMBER() OVER () AS rownumber FROM t) dt WHERE rownumber <= 3; +``` + +The result is as follows: + +``` ++----------------------------------+---------+-----------+---------------+-----------------------------------------------------------------------+ +| id | estRows | task | access object | operator info | ++----------------------------------+---------+-----------+---------------+-----------------------------------------------------------------------+ +| Projection_9 | 2.40 | root | | Column#5 | +| └─Selection_10 | 2.40 | root | | le(Column#5, 3) | +| └─Window_11 | 3.00 | root | | row_number()->Column#5 over(rows between current row and current row) | +| └─Limit_15 | 3.00 | root | | offset:0, count:3 | +| └─TableReader_26 | 3.00 | root | | data:Limit_25 | +| └─Limit_25 | 3.00 | cop[tikv] | | offset:0, count:3 | +| └─TableFullScan_24 | 3.00 | cop[tikv] | table:t | keep order:false, stats:pseudo | ++----------------------------------+---------+-----------+---------------+-----------------------------------------------------------------------+ +``` + +In this query, the optimizer derives the Limit operator from the window function and pushes it down to TiKV. + +#### Example 2: window functions with ORDER BY + +```sql +CREATE TABLE t(id int, value int); +SET tidb_opt_derive_topn=on; +EXPLAIN SELECT * FROM (SELECT ROW_NUMBER() OVER (ORDER BY value) AS rownumber FROM t) dt WHERE rownumber <= 3; +``` + +The result is as follows: + +``` ++----------------------------------+----------+-----------+---------------+---------------------------------------------------------------------------------------------+ +| id | estRows | task | access object | operator info | ++----------------------------------+----------+-----------+---------------+---------------------------------------------------------------------------------------------+ +| Projection_10 | 2.40 | root | | Column#5 | +| └─Selection_11 | 2.40 | root | | le(Column#5, 3) | +| └─Window_12 | 3.00 | root | | row_number()->Column#5 over(order by test.t.value rows between current row and current row) | +| └─TopN_13 | 3.00 | root | | test.t.value, offset:0, count:3 | +| └─TableReader_25 | 3.00 | root | | data:TopN_24 | +| └─TopN_24 | 3.00 | cop[tikv] | | test.t.value, offset:0, count:3 | +| └─TableFullScan_23 | 10000.00 | cop[tikv] | table:t | keep order:false, stats:pseudo | ++----------------------------------+----------+-----------+---------------+---------------------------------------------------------------------------------------------+ +``` + +In this query, the optimizer derives the TopN operator from the window function and pushes it down to TiKV. + +### Window functions with PARTITION BY + +> **Note:** +> +> For a window function containing `PARTITION BY`, the optimization rule only takes effect when the partition column is a prefix of the primary key and the primary key is a clustered index. + +#### Example 3: window functions without ORDER BY + +```sql +CREATE TABLE t(id1 int, id2 int, value1 int, value2 int, primary key(id1,id2) clustered); +SET tidb_opt_derive_topn=on; +EXPLAIN SELECT * FROM (SELECT ROW_NUMBER() OVER (PARTITION BY id1) AS rownumber FROM t) dt WHERE rownumber <= 3; +``` + +The result is as follows: + +``` ++------------------------------------+---------+-----------+---------------+-----------------------------------------------------------------------------------------------+ +| id | estRows | task | access object | operator info | ++------------------------------------+---------+-----------+---------------+-----------------------------------------------------------------------------------------------+ +| Projection_10 | 2.40 | root | | Column#6 | +| └─Selection_11 | 2.40 | root | | le(Column#6, 3) | +| └─Shuffle_26 | 3.00 | root | | execution info: concurrency:2, data sources:[TableReader_24] | +| └─Window_12 | 3.00 | root | | row_number()->Column#6 over(partition by test.t.id1 rows between current row and current row) | +| └─Sort_25 | 3.00 | root | | test.t.id1 | +| └─TableReader_24 | 3.00 | root | | data:Limit_23 | +| └─Limit_23 | 3.00 | cop[tikv] | | partition by test.t.id1, offset:0, count:3 | +| └─TableFullScan_22 | 3.00 | cop[tikv] | table:t | keep order:false, stats:pseudo | ++------------------------------------+---------+-----------+---------------+-----------------------------------------------------------------------------------------------+ +``` + +In this query, the optimizer derives the Limit operator from the window function and pushes it down to TiKV. Note that this Limit is actually a partition Limit, which means that the Limit will be applied to each group of data with the same `id1` value. + +#### Example 4: window functions with ORDER BY + +```sql +CREATE TABLE t(id1 int, id2 int, value1 int, value2 int, primary key(id1,id2) clustered); +SET tidb_opt_derive_topn=on; +EXPLAIN SELECT * FROM (SELECT ROW_NUMBER() OVER (PARTITION BY id1 ORDER BY value1) AS rownumber FROM t) dt WHERE rownumber <= 3; +``` + +The result is as follows: + +``` ++------------------------------------+----------+-----------+---------------+----------------------------------------------------------------------------------------------------------------------+ +| id | estRows | task | access object | operator info | ++------------------------------------+----------+-----------+---------------+----------------------------------------------------------------------------------------------------------------------+ +| Projection_10 | 2.40 | root | | Column#6 | +| └─Selection_11 | 2.40 | root | | le(Column#6, 3) | +| └─Shuffle_23 | 3.00 | root | | execution info: concurrency:3, data sources:[TableReader_21] | +| └─Window_12 | 3.00 | root | | row_number()->Column#6 over(partition by test.t.id1 order by test.t.value1 rows between current row and current row) | +| └─Sort_22 | 3.00 | root | | test.t.id1, test.t.value1 | +| └─TableReader_21 | 3.00 | root | | data:TopN_19 | +| └─TopN_19 | 3.00 | cop[tikv] | | partition by test.t.id1 order by test.t.value1, offset:0, count:3 | +| └─TableFullScan_18 | 10000.00 | cop[tikv] | table:t | keep order:false, stats:pseudo | ++------------------------------------+----------+-----------+---------------+----------------------------------------------------------------------------------------------------------------------+ +``` + +In this query, the optimizer derives the TopN operator from the window function and pushes it down to TiKV. Note that this TopN is actually a partition TopN, which means that the TopN will be applied to each group of data with the same `id1` value. + +#### Example 5: PARTITION BY column is not a prefix of the primary key + +```sql +CREATE TABLE t(id1 int, id2 int, value1 int, value2 int, primary key(id1,id2) clustered); +SET tidb_opt_derive_topn=on; +EXPLAIN SELECT * FROM (SELECT ROW_NUMBER() OVER (PARTITION BY value1) AS rownumber FROM t) dt WHERE rownumber <= 3; +``` + +The result is as follows: + +``` ++----------------------------------+----------+-----------+---------------+--------------------------------------------------------------------------------------------------+ +| id | estRows | task | access object | operator info | ++----------------------------------+----------+-----------+---------------+--------------------------------------------------------------------------------------------------+ +| Projection_9 | 8000.00 | root | | Column#6 | +| └─Selection_10 | 8000.00 | root | | le(Column#6, 3) | +| └─Shuffle_15 | 10000.00 | root | | execution info: concurrency:5, data sources:[TableReader_13] | +| └─Window_11 | 10000.00 | root | | row_number()->Column#6 over(partition by test.t.value1 rows between current row and current row) | +| └─Sort_14 | 10000.00 | root | | test.t.value1 | +| └─TableReader_13 | 10000.00 | root | | data:TableFullScan_12 | +| └─TableFullScan_12 | 10000.00 | cop[tikv] | table:t | keep order:false, stats:pseudo | ++----------------------------------+----------+-----------+---------------+--------------------------------------------------------------------------------------------------+ +``` + +In this query, the SQL is not rewritten because the `PARTITION BY` column is not a prefix of the primary key. + +#### Example 6: PARTITION BY column is a prefix of the primary key but not a clustered index + +```sql +CREATE TABLE t(id1 int, id2 int, value1 int, value2 int, primary key(id1,id2) nonclustered); +SET tidb_opt_derive_topn=on; +EXPLAIN SELECT * FROM (SELECT ROW_NUMBER() OVER (PARTITION BY id1) AS rownumber FROM t use index()) dt WHERE rownumber <= 3; +``` + +The result is as follows: + +``` ++----------------------------------+----------+-----------+---------------+-----------------------------------------------------------------------------------------------+ +| id | estRows | task | access object | operator info | ++----------------------------------+----------+-----------+---------------+-----------------------------------------------------------------------------------------------+ +| Projection_9 | 8000.00 | root | | Column#7 | +| └─Selection_10 | 8000.00 | root | | le(Column#7, 3) | +| └─Shuffle_15 | 10000.00 | root | | execution info: concurrency:5, data sources:[TableReader_13] | +| └─Window_11 | 10000.00 | root | | row_number()->Column#7 over(partition by test.t.id1 rows between current row and current row) | +| └─Sort_14 | 10000.00 | root | | test.t.id1 | +| └─TableReader_13 | 10000.00 | root | | data:TableFullScan_12 | +| └─TableFullScan_12 | 10000.00 | cop[tikv] | table:t | keep order:false, stats:pseudo | ++----------------------------------+----------+-----------+---------------+-----------------------------------------------------------------------------------------------+ +``` + +In this query, although the `PARTITION BY` column is a prefix of the primary key, the SQL is not rewritten because the primary key is not a clustered index. diff --git a/system-variables.md b/system-variables.md index 574cf355c5997..1016e93374f0e 100644 --- a/system-variables.md +++ b/system-variables.md @@ -3040,6 +3040,14 @@ As shown in this diagram, when [`tidb_enable_paging`](#tidb_enable_paging-new-in - Default value: `3.0` - Indicates the CPU cost for TiDB to process one row. This variable is internally used in the [Cost Model](/cost-model.md), and it is **NOT** recommended to modify its value. +### `tidb_opt_derive_topn` New in v7.0.0 + +- Scope: SESSION | GLOBAL +- Persists to cluster: Yes +- Type: Boolean +- Default value: `OFF` +- Controls whether to enable the optimization rule of [Deriving TopN or Limit from window functions](/derive-topn-from-window.md). + ### tidb_opt_desc_factor - Scope: SESSION | GLOBAL