Extend CAST and multi-valued-index docs #17472

Merged: 7 commits, merged on May 17, 2024
8 changes: 4 additions & 4 deletions choose-index.md
@@ -123,15 +123,15 @@ According to these factors and the cost model, the optimizer selects an index wi

1. Why is the estimated number of rows inaccurate?

This is usually due to stale or inaccurate statistics. You can re-execute the `analyze table` statement or modify the parameters of the `analyze table` statement.
This is usually due to stale or inaccurate statistics. You can re-execute the `ANALYZE TABLE` statement or adjust its parameters.

2. The statistics are accurate and reading from TiFlash is faster. Why does the optimizer choose to read from TiKV?

At present, the cost model of distinguishing TiFlash from TiKV is still rough. You can decrease the value of `tidb_opt_seek_factor` parameter, then the optimizer prefers to choose TiFlash.
At present, the cost model that distinguishes TiFlash from TiKV is still rough. You can decrease the value of the [`tidb_opt_seek_factor`](/system-variables.md#tidb_opt_seek_factor) parameter so that the optimizer prefers TiFlash.

3. The statistics are accurate. Index A needs to retrieve rows from tables, but it actually executes faster than Index B, which does not retrieve rows from tables. Why does the optimizer choose Index B?

In this case, the cost estimation may be too large for retrieving rows from tables. You can decrease the value of `tidb_opt_network_factor` parameter to reduce the cost of retrieving rows from tables.
In this case, the estimated cost of retrieving rows from tables may be too high. You can decrease the value of the [`tidb_opt_network_factor`](/system-variables.md#tidb_opt_network_factor) parameter to reduce this cost.
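The three remedies above can be tried in a single session, as in the following sketch. The table `t` and column `a` are hypothetical, and the factor values are illustrative rather than recommended settings. Compare the `EXPLAIN` output before and after each change.

```sql
-- Hypothetical table `t` with a column `a`; replace with your own schema.

-- 1. Refresh stale statistics.
ANALYZE TABLE t;

-- 1. (alternative) Re-collect statistics with a different parameter,
--    for example more histogram buckets.
ANALYZE TABLE t WITH 512 BUCKETS;

-- 2. Lower the seek cost factor for the current session only (illustrative value).
SET SESSION tidb_opt_seek_factor = 10;

-- 3. Lower the network cost factor for the current session only (illustrative value).
SET SESSION tidb_opt_network_factor = 0.5;

-- Check which plan the optimizer chooses now.
EXPLAIN SELECT * FROM t WHERE a = 1;
```

Using `SET SESSION` keeps each change local to the current connection, so the effect on a plan can be evaluated without affecting other workloads.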

## Control index selection

@@ -143,7 +143,7 @@ The index selection can be controlled by a single query through [Optimizer Hints

## Use multi-valued indexes

[Multi-valued indexes](/sql-statements/sql-statement-create-index.md#multi-valued-indexes) are different from normal indexes. TiDB currently only uses [IndexMerge](/explain-index-merge.md) to access multi-valued indexes. Therefore, to use multi-valued indexes for data access, make sure that the value of the system variable `tidb_enable_index_merge` is set to `ON`.
[Multi-valued indexes](/sql-statements/sql-statement-create-index.md#multi-valued-indexes) are different from normal indexes. TiDB currently only uses [IndexMerge](/explain-index-merge.md) to access multi-valued indexes. Therefore, to use multi-valued indexes for data access, make sure that the value of the system variable [`tidb_enable_index_merge`](/system-variables.md#tidb_enable_index_merge-new-in-v40) is set to `ON`.
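A minimal sketch of checking and enabling the variable, and then asking the optimizer to use a multi-valued index through IndexMerge with the `USE_INDEX_MERGE` hint. The table `t`, its JSON column `j`, and the index name `idx_a` are hypothetical.

```sql
-- Check whether IndexMerge is enabled.
SELECT @@tidb_enable_index_merge;

-- Enable it for the current session if it is OFF.
SET SESSION tidb_enable_index_merge = ON;

-- Hypothetical table `t` with a multi-valued index `idx_a`
-- defined on CAST(j->'$.a' AS UNSIGNED ARRAY).
-- The hint asks the optimizer to access `t` through `idx_a` with IndexMerge.
EXPLAIN SELECT /*+ USE_INDEX_MERGE(t, idx_a) */ * FROM t WHERE 1 MEMBER OF(j->'$.a');
```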

For the limitations of multi-valued indexes, refer to [`CREATE INDEX`](/sql-statements/sql-statement-create-index.md#limitations).

61 changes: 59 additions & 2 deletions functions-and-operators/cast-functions-and-operators.md
@@ -24,11 +24,31 @@ The [`BINARY`](https://dev.mysql.com/doc/refman/8.0/en/cast-functions.html#opera

## CAST

The [`CAST()`](https://dev.mysql.com/doc/refman/8.0/en/cast-functions.html#function_cast) function is used to cast an expression to a specific type.
The [`CAST(<expression> AS <type> [ARRAY])`](https://dev.mysql.com/doc/refman/8.0/en/cast-functions.html#function_cast) function is used to cast an expression to a specific type.

This function is also used to create [multi-valued indexes](/sql-statements/sql-statement-create-index.md#multi-valued-indexes).

Example:
The following types are supported:

| Type | Description | Whether it can be used with multi-valued indexes |
|----------------------|------------------|------------------------------------------------------------|
| `BINARY(n)` | Binary string | No |
| `CHAR(n)` | Character string | Yes, but only if a length is specified |
| `DATE` | Date | Yes |
| `DATETIME(fsp)` | Date/time, where `fsp` is optional | Yes |
| `DECIMAL(n, m)` | Decimal number, where `n` and `m` are optional and are `10` and `0` if not specified | No |
| `DOUBLE` | Double precision floating-point number | No |
| `FLOAT(n)` | Floating-point number, where `n` is optional and should be between `0` and `53` | No |
| `JSON` | JSON | No |
| `REAL` | Floating-point number | Yes |
| `SIGNED [INTEGER]` | Signed integer | Yes |
| `TIME(fsp)`          | Time, where `fsp` is optional | Yes |
| `UNSIGNED [INTEGER]` | Unsigned integer | Yes |
| `YEAR` | Year | No |

Examples:

The following statement converts a binary string from a HEX literal to a `CHAR`.

```sql
SELECT CAST(0x54694442 AS CHAR);
@@ -43,6 +63,43 @@ SELECT CAST(0x54694442 AS CHAR);
1 row in set (0.0002 sec)
```
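The following sketch casts string literals to several of the types listed in the table above. It is illustrative only: the literal values are arbitrary and the results are not shown.

```sql
SELECT
    CAST('2024-05-17' AS DATE),                        -- DATE
    CAST('2024-05-17 10:30:00.123456' AS DATETIME(3)), -- DATETIME with fsp = 3
    CAST('10:30:00' AS TIME),                          -- TIME without fsp
    CAST('3.14159' AS DECIMAL(5, 2)),                  -- DECIMAL(n, m)
    CAST('-42' AS SIGNED),                             -- SIGNED [INTEGER]
    CAST('42' AS UNSIGNED),                            -- UNSIGNED [INTEGER]
    CAST('{"a": [1, 2, 3]}' AS JSON);                  -- JSON
```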

The following statement casts the values of the `a` attribute extracted from the JSON column to an unsigned array. Note that casting to an array is only supported as part of an index definition for multi-valued indexes.

```sql
CREATE TABLE t (
    id INT PRIMARY KEY,
    j JSON,
    INDEX idx_a ((CAST(j->'$.a' AS UNSIGNED ARRAY)))
);
INSERT INTO t VALUES (1, JSON_OBJECT('a',JSON_ARRAY(1,2,3)));
INSERT INTO t VALUES (2, JSON_OBJECT('a',JSON_ARRAY(4,5,6)));
INSERT INTO t VALUES (3, JSON_OBJECT('a',JSON_ARRAY(7,8,9)));
ANALYZE TABLE t;
```

```sql
EXPLAIN SELECT * FROM t WHERE 1 MEMBER OF(j->'$.a')\G
*************************** 1. row ***************************
id: IndexMerge_10
estRows: 2.00
task: root
access object:
operator info: type: union
*************************** 2. row ***************************
id: ├─IndexRangeScan_8(Build)
estRows: 2.00
task: cop[tikv]
access object: table:t, index:idx_a(cast(json_extract(`j`, _utf8mb4'$.a') as unsigned array))
operator info: range:[1,1], keep order:false, stats:partial[j:unInitialized]
*************************** 3. row ***************************
id: └─TableRowIDScan_9(Probe)
estRows: 2.00
task: cop[tikv]
access object: table:t
operator info: keep order:false, stats:partial[j:unInitialized]
3 rows in set (0.00 sec)
```
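Besides `MEMBER OF`, the `JSON_CONTAINS()` and `JSON_OVERLAPS()` functions can also be served by a multi-valued index. The following sketch reuses table `t` from the example above. Whether `idx_a` is actually chosen should be confirmed in the `EXPLAIN` output. The last statement shows a string array, which, according to the table above, requires `CHAR` with an explicit length. The table `t2` and its `$.tags` path are hypothetical, and any additional restrictions listed under the `CREATE INDEX` limitations still apply.

```sql
-- Rows whose `a` array contains both 1 and 2.
EXPLAIN SELECT * FROM t WHERE JSON_CONTAINS(j->'$.a', '[1, 2]');

-- Rows whose `a` array shares at least one element with [1, 4].
EXPLAIN SELECT * FROM t WHERE JSON_OVERLAPS(j->'$.a', '[1, 4]');

-- Hypothetical table: a multi-valued index over a string array.
-- CHAR must be given a length, as in CHAR(20).
CREATE TABLE t2 (
    id INT PRIMARY KEY,
    j JSON,
    INDEX idx_tags ((CAST(j->'$.tags' AS CHAR(20) ARRAY)))
);
```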

## CONVERT

The [`CONVERT()`](https://dev.mysql.com/doc/refman/8.0/en/cast-functions.html#function_convert) function is used to convert between [character sets](/character-set-and-collation.md).
2 changes: 1 addition & 1 deletion sql-statements/sql-statement-create-index.md
@@ -232,7 +232,7 @@ Multi-valued indexes are a kind of secondary index defined on an array column. I

### Create multi-valued indexes

You can create multi-valued indexes by using the [`CAST(... AS ... ARRAY)`](/functions-and-operators/cast-functions-and-operators.md) expression in the index definition, as creating an expression index.
You can create multi-valued indexes by using the [`CAST(... AS ... ARRAY)`](/functions-and-operators/cast-functions-and-operators.md#cast) function in the index definition, in the same way as you create an expression index.

```sql
mysql> CREATE TABLE customers (