Add new Index advisor user doc #19867

Open
wants to merge 4 commits into
base: master

qw4990
Contributor

@qw4990 qw4990 commented Jan 2, 2025

First-time contributors' checklist

What is changed, added or deleted? (Required)

Which TiDB version(s) do your changes apply to? (Required)

Tips for choosing the affected version(s):

By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.

For details, see tips for choosing the affected versions.

  • master (the latest development version)
  • v9.0 (TiDB 9.0 versions)
  • v8.5 (TiDB 8.5 versions)
  • v8.4 (TiDB 8.4 versions)
  • v8.3 (TiDB 8.3 versions)
  • v8.1 (TiDB 8.1 versions)
  • v7.5 (TiDB 7.5 versions)
  • v7.1 (TiDB 7.1 versions)
  • v6.5 (TiDB 6.5 versions)
  • v6.1 (TiDB 6.1 versions)
  • v5.4 (TiDB 5.4 versions)

What is the related PR or file link(s)?

  • This PR is translated from:
  • Other reference link(s):

Do your changes match any of the following descriptions?

  • Delete files
  • Change aliases
  • Need modification after applied to another branch
  • Might cause conflicts after applied to another branch

@ti-chi-bot added the missing-translation-status and size/L labels Jan 2, 2025
@Oreoxmt self-assigned this Jan 2, 2025
@Oreoxmt self-requested a review January 2, 2025 08:16
@Oreoxmt added the translation/doing and v9.0 labels Jan 2, 2025
@ti-chi-bot removed the missing-translation-status label Jan 2, 2025
@Oreoxmt added the do-not-merge/hold label Jan 2, 2025
Collaborator

Oreoxmt commented Jan 2, 2025

@qw4990 Could you please invite a tech reviewer? Thanks!

@Oreoxmt added the needs-cherry-pick-release-8.5 label Jan 2, 2025

ti-chi-bot bot commented Jan 2, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from oreoxmt, ensuring that each of them provides their approval before proceeding. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


ti-chi-bot bot commented Jan 5, 2025

@qw4990: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| pull-verify | e9126e0 | link | true | `/test pull-verify` |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


# Recommend Index command

The SQL syntax for the new SQL command `Recommend Index` is shown below. The command can be used for a single SQL query (`For` option) or for a workload (`Run` option). The command also supports setting options for subsequent runs of the command.
Contributor

Suggested change
The SQL syntax for the new SQL command `Recommend Index` is shown below. The command can be used for a single SQL query (`For` option) or for a workload (`Run` option). The command also supports setting options for subsequent runs of the command.
The `RECOMMEND INDEX` SQL command is introduced for index advisor tasks. The `RUN` subcommand explores historical workloads and saves the recommendations in system tables. With the `FOR` option, the command targets a particular SQL statement even if it has not been executed before. The command also accepts extra options for advanced control.
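
As a rough illustration of the two modes described in this suggestion, the statements below sketch how the command might be invoked. The table `t` and the query text come from the example quoted elsewhere in this PR; the exact grammar is still under review in this thread, so treat these forms as assumptions rather than final syntax.

```sql
-- Sketch only: the grammar is still being discussed in this PR.
-- Example table, the same one used in the PR's own example.
CREATE TABLE t (a INT, b INT, c INT);

-- Workload mode: explore historical queries and recommend indexes.
RECOMMEND INDEX RUN;

-- Single-query mode: recommend indexes for one statement,
-- even if that statement has never been executed.
RECOMMEND INDEX RUN FOR "select a, b from t where a=1 and b=1";
```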

The SQL syntax for the new SQL command `Recommend Index` is shown below. The command can be used for a single SQL query (`For` option) or for a workload (`Run` option). The command also supports setting options for subsequent runs of the command.

```sql
Recommend Index [Run | For <SQL> | <Options>]
Contributor

Suggested change
Recommend Index [Run | For <SQL> | <Options>]
Recommend Index Run [ For <SQL> ] [<Options>]


```sql
mysql> CREATE TABLE t(a int, b int, c int);
mysql> RECOMMEND INDEX RUN for "select a, b from t where a=1 and b=1";
Contributor

Does this command accept `\G` as a delimiter? Column-style printing improves readability for a single-row result.

mysql> CREATE TABLE t(a int, b int, c int);
mysql> RECOMMEND INDEX RUN for "select a, b from t where a=1 and b=1";
+----------+-------+------------+---------------+------------+----------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+---------------------------------+
| database | table | index_name | index_columns | index_size | reason | top_impacted_query | create_index_statement |
Contributor

  • If "index_size" is not an exact value, it would be better to rename it to "est_index_size".
  • It's recommended to append a meaningful string as a suffix to the index name, so that readers can tell the index comes from the index advisor. This also avoids name conflicts. For example: "idx_..._<trailing 8 chars in plan digest>".

Contributor

index_size is the maximum number of columns an index can have. Maybe we can rename it to reflect that, for example "max_index_columns".

+----------+-------+------------+---------------+------------+----------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+---------------------------------+
| database | table | index_name | index_columns | index_size | reason | top_impacted_query | create_index_statement |
+----------+-------+------------+---------------+------------+----------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+---------------------------------+
| test | t | idx_a_b | a,b | 19872 | Column [a b] appear in Equal or Range Predicate clause(s) in query: select `a` , `b` from `test` . `t` where `a` = ? and `b` = ? | [{"Query":"SELECT `a`,`b` FROM `test`.`t` WHERE `a` = 1 AND `b` = 1","Improvement":0.999994}] | CREATE INDEX idx_a_b ON t(a,b); |
Contributor

We should explain how "Improvement" is determined later in this section.
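
Until that explanation is written, one hedged way for a reader to sanity-check a recommendation is to compare the execution plan before and after applying the `create_index_statement` from the output above. The sketch assumes the example table `t` in the `test` database. How `Improvement` itself is computed is not stated in this PR, so the comparison below is only an informal check, not the advisor's own method.

```sql
-- Plan before the index exists: with no index on t, expect a table scan.
EXPLAIN SELECT a, b FROM t WHERE a = 1 AND b = 1;

-- Apply the statement shown in the create_index_statement column.
CREATE INDEX idx_a_b ON t(a,b);

-- Plan after: check whether the optimizer now chooses idx_a_b.
EXPLAIN SELECT a, b FROM t WHERE a = 1 AND b = 1;
```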


```sql
Recommend Index Set <option> = <value>;
Recommend Index Show;
Contributor

Can we make this "Recommend Index Show Options"? It would be better to reserve "Recommend Index Show" for future use.

Contributor

Agreed, since it is only used to show options.


Here are some current limitations of the index recommendation feature, which we plan to address in the future:
1. It does not support prepared statements, meaning `RECOMMEND INDEX RUN` cannot recommend indexes for queries executed through the `Prepare` and `Execute` protocol.
2. It does not provide recommendations for deleting indexes. We need to merge the removing index logic (see below) to the `RECOOEND` comamnd in the future.
Contributor

Suggested change
2. It does not provide recommendations for deleting indexes. We need to merge the removing index logic (see below) to the `RECOOEND` comamnd in the future.
2. It does not provide recommendations for deleting indexes. We need to merge the removing index logic (see below) to the `RECOMMEND` command in the future.

Typo

3. A UI for the Index Advisor will be available in the future.

# Removing Unused Indexes
TiDB provides two system views/tables to help users identify inactive indexes in their workload. Users can either mark such indexes as invisible as a transitional state before dropping them or drop them right away.
Contributor

Suggested change
TiDB provides two system views/tables to help users identify inactive indexes in their workload. Users can either mark such indexes as invisible as a transitional state before dropping them or drop them right away.
In v8.0 and later, TiDB provides two system views/tables to help users identify inactive indexes in their workload. Dropping these indexes saves storage and removes the overhead of maintaining them. For online systems, it is highly recommended to make the target indexes invisible and observe the impact for one business cycle before dropping them completely.

## View sys.schema_unused_indexes

The `sys.schema_unused_indexes` view identifies indexes that have not been used since the startup of all TiDB instances. The view is defined based on system tables that have schema, table and column information. The view provides the full specification for the index including index, table and schema names. Users can query this view and decide on making indexes invisible or deleting them.

Contributor

@songrijie songrijie Jan 7, 2025

Suggested change
> **Warning:**
>
> Because this view shows the indexes that have been unused since the last startup of all TiDB instances, make sure the TiDB instances have been running long enough. Otherwise, the view might list false candidates if certain workloads have not run yet. The statement `select START_TIME,UPTIME from INFORMATION_SCHEMA.CLUSTER_INFO where TYPE='tidb';` shows how long each TiDB instance has been running.


## View sys.schema_unused_indexes

The `sys.schema_unused_indexes` view identifies indexes that have not been used since the startup of all TiDB instances. The view is defined based on system tables that have schema, table and column information. The view provides the full specification for the index including index, table and schema names. Users can query this view and decide on making indexes invisible or deleting them.
Contributor

Suggested change
The `sys.schema_unused_indexes` view identifies indexes that have not been used since the startup of all TiDB instances. The view is defined based on system tables that have schema, table and column information. The view provides the full specification for the index including index, table and schema names. Users can query this view and decide on making indexes invisible or deleting them.
The [`sys.schema_unused_indexes`](/sys-schema/sys-schema-unused-indexes.md) view identifies indexes that have not been used since the startup of all TiDB instances. The view is defined based on system tables that have schema, table and column information. The view provides the full specification for the index including index, table and schema names. Users can query this view and decide on making indexes invisible or deleting them.


## View information_schema.tidb_index_usage

This table provides metrics like access patterns, last access time, and rows accessed. Below, we show SQL query recommendations on how to identify unused or inefficient indexes based on this table.
Contributor

Suggested change
This table provides metrics like access patterns, last access time, and rows accessed. Below, we show SQL query recommendations on how to identify unused or inefficient indexes based on this table.
[`information_schema.tidb_index_usage`](/information-schema/information-schema-tidb-indexes.md) provides metrics including selectivity buckets, last access time, and rows accessed. The following example shows queries that identify unused or inefficient indexes based on this table.

WHERE last_access_time IS NULL
OR last_access_time < NOW() - INTERVAL 30 DAY;

-- Find indexes with low efficiency
Contributor

Suggested change
-- Find indexes with low efficiency
-- Find the indexes that are always scanned with over 50% of total records.

WHERE last_access_time IS NOT NULL AND percentage_access_0 + percentage_access_0_1 + percentage_access_1_10 + percentage_access_10_20 + percentage_access_20_50 = 0;
```

Users should be aware that the data in `tidb_index_usage` may be delayed by up to 5 minutes, and the usage data is reset whenever a TiDB node restarts. Additionally, index usage is only recorded if the table has valid statistics.
Contributor

Suggested change
Users should be aware that the data in `tidb_index_usage` may be delayed by up to 5 minutes, and the usage data is reset whenever a TiDB node restarts. Additionally, index usage is only recorded if the table has valid statistics.
> **Note:**
>
> Users should be aware that the data in `tidb_index_usage` may be delayed by up to 5 minutes, and the usage data is reset whenever a TiDB node restarts. Additionally, index usage is only recorded if the table has valid statistics.
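
The hunks quoted in this thread show only the `WHERE` clauses of the example queries. A fuller sketch of the workflow discussed in this section might look like the following. Column names follow the documented schemas of `sys.schema_unused_indexes` and `information_schema.tidb_index_usage`; the table `t`, the index `idx_a_b`, and the 30-day threshold are illustrative assumptions, not values prescribed by the PR.

```sql
-- 1. Check how long each TiDB instance has been running,
--    because usage data is reset when a node restarts.
SELECT START_TIME, UPTIME
FROM INFORMATION_SCHEMA.CLUSTER_INFO
WHERE TYPE = 'tidb';

-- 2. List indexes that have not been used since all instances started.
SELECT object_schema, object_name, index_name
FROM sys.schema_unused_indexes;

-- 3. Find indexes never accessed, or not accessed in the last 30 days.
SELECT table_schema, table_name, index_name, last_access_time
FROM information_schema.tidb_index_usage
WHERE last_access_time IS NULL
   OR last_access_time < NOW() - INTERVAL 30 DAY;

-- 4. Find indexes whose accesses always scan over 50% of total records.
SELECT table_schema, table_name, index_name, query_total, rows_access_total
FROM information_schema.tidb_index_usage
WHERE last_access_time IS NOT NULL
  AND percentage_access_0 + percentage_access_0_1 + percentage_access_1_10
      + percentage_access_10_20 + percentage_access_20_50 = 0;

-- 5. Hide a candidate index first (hypothetical table and index names)
--    and observe the workload for one business cycle.
ALTER TABLE t ALTER INDEX idx_a_b INVISIBLE;

-- 6a. If queries regress, make the index visible again.
ALTER TABLE t ALTER INDEX idx_a_b VISIBLE;

-- 6b. Otherwise, drop it for good.
DROP INDEX idx_a_b ON t;
```

Steps 6a and 6b are alternatives: run only one of them, depending on what the observation period shows.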
