Skip to content

Commit 69a1890

Browse files
authored
cloud: Add doc about learning vector index build progress (#18426)
1 parent fd3764c commit 69a1890

File tree

2 files changed

+34
-0
lines changed

2 files changed

+34
-0
lines changed

tidb-cloud/vector-search-improve-performance.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,10 @@ TiDB Vector Search allows you to perform ANN queries that search for results sim
1111

1212
The [vector search index](/tidb-cloud/vector-search-index.md) dramatically improves the performance of vector search queries, usually by 10x or more, with a trade-off of only a small decrease of recall rate.
1313

14+
## Ensure vector indexes are fully built
15+
16+
Vector indexes are built asynchronously. Until all vector data is indexed, vector search performance is suboptimal. To check the index build progress, see [View index build progress](/tidb-cloud/vector-search-index.md#view-index-build-progress).
17+
1418
## Reduce vector dimensions or shorten embeddings
1519

1620
The computational complexity of vector search indexing and queries increases significantly as the size of vectors grows, necessitating more floating point comparisons.

tidb-cloud/vector-search-index.md

Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -133,6 +133,36 @@ LIMIT 5;
133133

134134
See [Table Partitioning](/partitioned-table.md) for more information.
135135

136+
## View index build progress
137+
138+
Unlike other indexes, vector indexes are built asynchronously. Therefore, vector indexes might not be immediately available after bulk data insertion. This does not affect data correctness or consistency, and you can perform vector searches at any time and get complete results. However, performance will be suboptimal until vector indexes are fully built.
139+
140+
To view the index build progress, you can query the `INFORMATION_SCHEMA.TIFLASH_INDEXES` table as follows:
141+
142+
```sql
143+
SELECT * FROM INFORMATION_SCHEMA.TIFLASH_INDEXES;
144+
+---------------+------------+----------------+----------+--------------------+-------------+-----------+------------+---------------------+-------------------------+--------------------+------------------------+------------------+
145+
| TIDB_DATABASE | TIDB_TABLE | TIDB_PARTITION | TABLE_ID | BELONGING_TABLE_ID | COLUMN_NAME | COLUMN_ID | INDEX_KIND | ROWS_STABLE_INDEXED | ROWS_STABLE_NOT_INDEXED | ROWS_DELTA_INDEXED | ROWS_DELTA_NOT_INDEXED | TIFLASH_INSTANCE |
146+
+---------------+------------+----------------+----------+--------------------+-------------+-----------+------------+---------------------+-------------------------+--------------------+------------------------+------------------+
147+
| test | sample | NULL | 106 | -1 | vec | 2 | HNSW | 0 | 13000 | 0 | 2000 | store-6ba728d2 |
148+
| test | sample | NULL | 106 | -1 | vec | 2 | HNSW | 10500 | 0 | 0 | 4500 | store-7000164f |
149+
+---------------+------------+----------------+----------+--------------------+-------------+-----------+------------+---------------------+-------------------------+--------------------+------------------------+------------------+
150+
```
151+
152+
- The `ROWS_STABLE_INDEXED` and `ROWS_STABLE_NOT_INDEXED` columns show the index build progress. When `ROWS_STABLE_NOT_INDEXED` becomes 0, the index build is complete.
153+
154+
As a reference, indexing a 500 MiB vector dataset might take up to 20 minutes. The indexer can run in parallel for multiple tables. Currently, adjusting the indexer priority or speed is not supported.
155+
156+
- The `ROWS_DELTA_NOT_INDEXED` column shows the number of rows in the Delta layer. The Delta layer stores _recently_ inserted or updated rows and is periodically merged into the Stable layer according to the write workload. This merge process is called Compaction.
157+
158+
The Delta layer is always not indexed. To achieve optimal performance, you can force the merge of the Delta layer into the Stable layer so that all data can be indexed:
159+
160+
```sql
161+
ALTER TABLE <TABLE_NAME> COMPACT;
162+
```
163+
164+
For more information, see [`ALTER TABLE ... COMPACT`](/sql-statements/sql-statement-alter-table-compact.md).
165+
136166
## Check whether the vector index is used
137167

138168
Use the [`EXPLAIN`](/sql-statements/sql-statement-explain.md) or [`EXPLAIN ANALYZE`](/sql-statements/sql-statement-explain-analyze.md) statement to check whether this query is using the vector index. When `annIndex:` is presented in the `operator info` column for the `TableFullScan` executor, it means this table scan is utilizing the vector index.

0 commit comments

Comments
 (0)