From 13fa8169b9d1b4c2edca874771e7b86428823146 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dani=C3=ABl=20van=20Eeden?=
Date: Fri, 17 May 2024 08:55:13 +0200
Subject: [PATCH] Extend CAST and multi-valued-index docs (#17472)

---
 choose-index.md                               |  8 +--
 .../cast-functions-and-operators.md           | 61 ++++++++++++++++++-
 sql-statements/sql-statement-create-index.md  |  2 +-
 3 files changed, 64 insertions(+), 7 deletions(-)

diff --git a/choose-index.md b/choose-index.md
index 3f8a4860eda67..19bf0f2f49d53 100644
--- a/choose-index.md
+++ b/choose-index.md
@@ -123,15 +123,15 @@ According to these factors and the cost model, the optimizer selects an index wi
 
 1. The estimated number of rows is not accurate?
 
-    This is usually due to stale or inaccurate statistics. You can re-execute the `analyze table` statement or modify the parameters of the `analyze table` statement.
+    This is usually due to stale or inaccurate statistics. You can re-execute the `ANALYZE TABLE` statement or modify the parameters of the `ANALYZE TABLE` statement.
 
 2. Statistics are accurate, and reading from TiFlash is faster, but why does the optimizer choose to read from TiKV?
 
-    At present, the cost model of distinguishing TiFlash from TiKV is still rough. You can decrease the value of `tidb_opt_seek_factor` parameter, then the optimizer prefers to choose TiFlash.
+    At present, the cost model of distinguishing TiFlash from TiKV is still rough. You can decrease the value of [`tidb_opt_seek_factor`](/system-variables.md#tidb_opt_seek_factor) parameter, then the optimizer prefers to choose TiFlash.
 
 3. The statistics are accurate. Index A needs to retrieve rows from tables, but it actually executes faster than Index B that does not retrieve rows from tables. Why does the optimizer choose Index B?
 
-    In this case, the cost estimation may be too large for retrieving rows from tables. You can decrease the value of `tidb_opt_network_factor` parameter to reduce the cost of retrieving rows from tables.
+    In this case, the cost estimation may be too large for retrieving rows from tables. You can decrease the value of [`tidb_opt_network_factor`](/system-variables.md#tidb_opt_network_factor) parameter to reduce the cost of retrieving rows from tables.
 
 ## Control index selection
 
@@ -143,7 +143,7 @@ The index selection can be controlled by a single query through [Optimizer Hints
 
 ## Use multi-valued indexes
 
-[Multi-valued indexes](/sql-statements/sql-statement-create-index.md#multi-valued-indexes) are different from normal indexes. TiDB currently only uses [IndexMerge](/explain-index-merge.md) to access multi-valued indexes. Therefore, to use multi-valued indexes for data access, make sure that the value of the system variable `tidb_enable_index_merge` is set to `ON`.
+[Multi-valued indexes](/sql-statements/sql-statement-create-index.md#multi-valued-indexes) are different from normal indexes. TiDB currently only uses [IndexMerge](/explain-index-merge.md) to access multi-valued indexes. Therefore, to use multi-valued indexes for data access, make sure that the value of the system variable [`tidb_enable_index_merge`](/system-variables.md#tidb_enable_index_merge-new-in-v40) is set to `ON`.
 
 For the limitations of multi-valued indexes, refer to [`CREATE INDEX`](/sql-statements/sql-statement-create-index.md#limitations).
 
diff --git a/functions-and-operators/cast-functions-and-operators.md b/functions-and-operators/cast-functions-and-operators.md
index 9ac5cab6eb9d0..4643760f7abd9 100644
--- a/functions-and-operators/cast-functions-and-operators.md
+++ b/functions-and-operators/cast-functions-and-operators.md
@@ -24,11 +24,31 @@ The [`BINARY`](https://dev.mysql.com/doc/refman/8.0/en/cast-functions.html#opera
 
 ## CAST
 
-The [`CAST()`](https://dev.mysql.com/doc/refman/8.0/en/cast-functions.html#function_cast) function is used to cast an expression to a specific type.
+The [`CAST(<expression> AS <type> [ARRAY])`](https://dev.mysql.com/doc/refman/8.0/en/cast-functions.html#function_cast) function is used to cast an expression to a specific type. This function is also used to create [Multi-valued indexes](/sql-statements/sql-statement-create-index.md#multi-valued-indexes).
 
-Example:
+The following types are supported:
+
+| Type                 | Description      | Whether it can be used with multi-valued indexes           |
+|----------------------|------------------|------------------------------------------------------------|
+| `BINARY(n)`          | Binary string    | No |
+| `CHAR(n)`            | Character string | Yes, but only if a length is specified |
+| `DATE`               | Date             | Yes |
+| `DATETIME(fsp)`      | Date/time, where `fsp` is optional | Yes |
+| `DECIMAL(n, m)`      | Decimal number, where `n` and `m` are optional and are `10` and `0` if not specified | No |
+| `DOUBLE`             | Double precision floating-point number | No |
+| `FLOAT(n)`           | Floating-point number, where `n` is optional and should be between `0` and `53` | No |
+| `JSON`               | JSON             | No |
+| `REAL`               | Floating-point number | Yes |
+| `SIGNED [INTEGER]`   | Signed integer   | Yes |
+| `TIME(fsp)`          | Time             | Yes |
+| `UNSIGNED [INTEGER]` | Unsigned integer | Yes |
+| `YEAR`               | Year             | No |
+
+Examples:
+
+The following statement converts a binary string from a HEX literal to a `CHAR`.
 
 ```sql
 SELECT CAST(0x54694442 AS CHAR);
@@ -43,6 +63,43 @@ SELECT CAST(0x54694442 AS CHAR);
 1 row in set (0.0002 sec)
 ```
 
+The following statement casts the values of the `a` attribute extracted from the JSON column to an unsigned array. Note that casting to an array is only supported as part of an index definition for multi-valued indexes.
+
+```sql
+CREATE TABLE t (
+    id INT PRIMARY KEY,
+    j JSON,
+    INDEX idx_a ((CAST(j->'$.a' AS UNSIGNED ARRAY)))
+);
+INSERT INTO t VALUES (1, JSON_OBJECT('a',JSON_ARRAY(1,2,3)));
+INSERT INTO t VALUES (2, JSON_OBJECT('a',JSON_ARRAY(4,5,6)));
+INSERT INTO t VALUES (3, JSON_OBJECT('a',JSON_ARRAY(7,8,9)));
+ANALYZE TABLE t;
+```
+
+```sql
+ EXPLAIN SELECT * FROM t WHERE 1 MEMBER OF(j->'$.a')\G
+*************************** 1. row ***************************
+           id: IndexMerge_10
+      estRows: 2.00
+         task: root
+access object: 
+operator info: type: union
+*************************** 2. row ***************************
+           id: ├─IndexRangeScan_8(Build)
+      estRows: 2.00
+         task: cop[tikv]
+access object: table:t, index:idx_a(cast(json_extract(`j`, _utf8mb4'$.a') as unsigned array))
+operator info: range:[1,1], keep order:false, stats:partial[j:unInitialized]
+*************************** 3. row ***************************
+           id: └─TableRowIDScan_9(Probe)
+      estRows: 2.00
+         task: cop[tikv]
+access object: table:t
+operator info: keep order:false, stats:partial[j:unInitialized]
+3 rows in set (0.00 sec)
+```
+
 ## CONVERT
 
 The [`CONVERT()`](https://dev.mysql.com/doc/refman/8.0/en/cast-functions.html#function_convert) function is used to convert between [character sets](/character-set-and-collation.md).
diff --git a/sql-statements/sql-statement-create-index.md b/sql-statements/sql-statement-create-index.md
index 91ca862705fdb..57852a4a81ce2 100644
--- a/sql-statements/sql-statement-create-index.md
+++ b/sql-statements/sql-statement-create-index.md
@@ -232,7 +232,7 @@ Multi-valued indexes are a kind of secondary index defined on an array column. I
 
 ### Create multi-valued indexes
 
-You can create multi-valued indexes by using the [`CAST(... AS ... ARRAY)`](/functions-and-operators/cast-functions-and-operators.md) expression in the index definition, as creating an expression index.
+You can create multi-valued indexes by using the [`CAST(... AS ... ARRAY)`](/functions-and-operators/cast-functions-and-operators.md#cast) function in the index definition, as creating an expression index.
 
 ```sql
 mysql> CREATE TABLE customers (