Add new Index advisor user doc #19867
base: master
@qw4990 Could you please invite a tech reviewer? Thanks!
# Recommend Index command

The SQL syntax for the new SQL command `Recommend Index` is shown below. The command can be used for a single SQL query (`For` option) or for a workload (`Run` option). The command also supports setting options for subsequent runs of the command.
The SQL command `RECOMMEND INDEX` is introduced for index advisor tasks. The subcommand `RUN` explores historical workloads and saves the recommendations in system tables. With the `FOR` option, the command targets a particular SQL statement even if it was not executed in the past. The command also accepts extra options for advanced control.

```sql
Recommend Index [Run | For <SQL> | <Options>]
```
Recommend Index Run [ For <SQL> ] [<Options>]

```sql
mysql> CREATE TABLE t(a int, b int, c int);
mysql> RECOMMEND INDEX RUN for "select a, b from t where a=1 and b=1";
```
Does this command accept `\G` as a delimiter? Column-style printing improves readability for single-row output.
+----------+-------+------------+---------------+------------+----------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+---------------------------------+
| database | table | index_name | index_columns | index_size | reason | top_impacted_query | create_index_statement |
- If "index_size" is not an exact value, it would be better to rename it to "est_index_size".
- It's recommended to append a meaningful suffix to the index name so that it is clear the index comes from the index advisor. This also avoids name conflicts. For example, "idx_..._<trailing 8 chars in plan digest>".
index_size is the maximum number of columns an index can have. Maybe we can rename it to reflect that, say "max_index_columns".
+----------+-------+------------+---------------+------------+----------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+---------------------------------+
| database | table | index_name | index_columns | index_size | reason | top_impacted_query | create_index_statement |
+----------+-------+------------+---------------+------------+----------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+---------------------------------+
| test | t | idx_a_b | a,b | 19872 | Column [a b] appear in Equal or Range Predicate clause(s) in query: select `a` , `b` from `test` . `t` where `a` = ? and `b` = ? | [{"Query":"SELECT `a`,`b` FROM `test`.`t` WHERE `a` = 1 AND `b` = 1","Improvement":0.999994}] | CREATE INDEX idx_a_b ON t(a,b); |
We should explain how "Improvement" is calculated later in this section.
```sql
Recommend Index Set <option> = <value>;
Recommend Index Show;
```
Can we make this "Recommend Index Show Options"? It would be better to reserve "Recommend Index Show" for future use.
Agreed, since it is only used to show options.

Here are some current limitations of the index recommendation feature, which we plan to address in the future:
1. It does not support prepared statements, meaning `RECOMMEND INDEX RUN` cannot recommend indexes for queries executed through the `Prepare` and `Execute` protocol.
2. It does not provide recommendations for deleting indexes. We need to merge the removing index logic (see below) to the `RECOOEND` comamnd in the future.
2. It does not provide recommendations for deleting indexes. We need to merge the removing index logic (see below) to the `RECOMMEND` command in the future.
Typo
3. A UI for the Index Advisor will be available in the future.

# Removing Unused Indexes
TiDB provides two system views/tables to help users identify inactive indexes in their workload. Users can either mark such indexes as invisible as a transitional state before dropping them or drop them right away.
For v8.0 or higher, TiDB provides two system views/tables to help users identify inactive indexes in their workload. Users can drop these indexes to save storage and reduce the overhead they cause. For online systems, it is highly recommended to make the target indexes invisible and observe the impact for one business cycle before dropping them completely.
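
To illustrate the invisible-then-drop workflow described here, a minimal sketch follows. The table `t` and index `idx_a_b` are hypothetical names borrowed from the earlier `RECOMMEND INDEX` example, and the statements assume TiDB's MySQL-compatible invisible index syntax.

```sql
-- Hypothetical names: table `t`, index `idx_a_b`.

-- Step 1: hide the index from the optimizer without dropping it.
ALTER TABLE t ALTER INDEX idx_a_b INVISIBLE;

-- Step 2: observe the workload for one business cycle, then choose one path.

-- Path A (queries regressed): make the index visible again; no rebuild is needed.
ALTER TABLE t ALTER INDEX idx_a_b VISIBLE;

-- Path B (no regression): drop the index to reclaim storage.
DROP INDEX idx_a_b ON t;
```

Making the index invisible first keeps the rollback cheap: restoring visibility is a metadata change, whereas recreating a dropped index requires a full rebuild.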
## View sys.schema_unused_indexes

The `sys.schema_unused_indexes` view identifies indexes that have not been used since the startup of all TiDB instances. The view is defined based on system tables that have schema, table and column information. The view provides the full specification for the index including index, table and schema names. Users can query this view and decide on making indexes invisible or deleting them.

> **Warning:**
>
> This view shows indexes that have been unused since the last startup of all TiDB instances, so make sure the instances have been running long enough to cover the full workload. Otherwise, the view might list false candidates because certain workloads have not yet run. The query `select START_TIME,UPTIME from INFORMATION_SCHEMA.CLUSTER_INFO where TYPE='tidb';` shows the start time and uptime of all TiDB instances.

## View sys.schema_unused_indexes

The `sys.schema_unused_indexes` view identifies indexes that have not been used since the startup of all TiDB instances. The view is defined based on system tables that have schema, table and column information. The view provides the full specification for the index including index, table and schema names. Users can query this view and decide on making indexes invisible or deleting them.
The [`sys.schema_unused_indexes`](/sys-schema/sys-schema-unused-indexes.md) view identifies indexes that have not been used since the startup of all TiDB instances. The view is defined based on system tables that have schema, table and column information. The view provides the full specification for the index including index, table and schema names. Users can query this view and decide on making indexes invisible or deleting them.
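
As a rough sketch of how this view might be queried in practice, the statements below first check how long each TiDB instance has been running (the uptime query from the warning in this thread) and then list the unused-index candidates. `SELECT *` is used deliberately so that no column names of the view are assumed.

```sql
-- Check instance start time and uptime first: a recently restarted
-- instance means the view may report indexes that are actually in use.
SELECT START_TIME, UPTIME
FROM INFORMATION_SCHEMA.CLUSTER_INFO
WHERE TYPE = 'tidb';

-- List indexes that no TiDB instance has used since startup.
SELECT * FROM sys.schema_unused_indexes;
```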

## View information_schema.tidb_index_usage

This table provides metrics like access patterns, last access time, and rows accessed. Below, we show SQL query recommendations on how to identify unused or inefficient indexes based on this table.
[`information_schema.tidb_index_usage`](/information-schema/information-schema-tidb-indexes.md) provides metrics including selectivity buckets, last access time, and rows accessed. The example below shows queries that identify unused or inefficient indexes based on this table.
WHERE last_access_time IS NULL
OR last_access_time < NOW() - INTERVAL 30 DAY;

-- Find indexes with low efficiency
-- Find indexes whose every recorded access scanned more than 50% of the table's rows.
```sql
WHERE last_access_time IS NOT NULL AND percentage_access_0 + percentage_access_0_1 + percentage_access_1_10 + percentage_access_10_20 + percentage_access_20_50 = 0;
```
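
Because the diff hunk shows only the `WHERE` clauses, here is a sketch of what the complete queries might look like. The `last_access_time` and `percentage_access_*` columns come from the hunk itself; the identifying columns `table_schema`, `table_name`, and `index_name` are assumptions and should be checked against the actual definition of the table.

```sql
-- Assumed identifying columns: table_schema, table_name, index_name.

-- Indexes that were never accessed, or not accessed in the last 30 days.
SELECT table_schema, table_name, index_name, last_access_time
FROM information_schema.tidb_index_usage
WHERE last_access_time IS NULL
   OR last_access_time < NOW() - INTERVAL 30 DAY;

-- Indexes that were accessed, but never in a bucket covering
-- less than 50% of the table's rows.
SELECT table_schema, table_name, index_name, last_access_time
FROM information_schema.tidb_index_usage
WHERE last_access_time IS NOT NULL
  AND percentage_access_0 + percentage_access_0_1 + percentage_access_1_10
    + percentage_access_10_20 + percentage_access_20_50 = 0;
```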

Users should be aware that the data in `tidb_index_usage` may be delayed by up to 5 minutes, and the usage data is reset whenever a TiDB node restarts. Additionally, index usage is only recorded if the table has valid statistics.
> **Note:**
>
> Users should be aware that the data in `tidb_index_usage` may be delayed by up to 5 minutes, and the usage data is reset whenever a TiDB node restarts. Additionally, index usage is only recorded if the table has valid statistics.