Skip to content

Commit

Permalink
cloud: Add doc about learning vector index build progress (#18426)
Browse files Browse the repository at this point in the history
  • Loading branch information
breezewish authored Aug 5, 2024
1 parent fd3764c commit 69a1890
Show file tree
Hide file tree
Showing 2 changed files with 34 additions and 0 deletions.
4 changes: 4 additions & 0 deletions tidb-cloud/vector-search-improve-performance.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,10 @@ TiDB Vector Search allows you to perform ANN queries that search for results sim

The [vector search index](/tidb-cloud/vector-search-index.md) dramatically improves the performance of vector search queries, usually by 10x or more, with a trade-off of only a small decrease of recall rate.

## Ensure vector indexes are fully built

Vector indexes are built asynchronously. Until all vector data is indexed, vector search performance is suboptimal. To check the index build progress, see [View index build progress](/tidb-cloud/vector-search-index.md#view-index-build-progress).

## Reduce vector dimensions or shorten embeddings

The computational complexity of vector search indexing and queries increases significantly as the size of vectors grows, necessitating more floating point comparisons.
Expand Down
30 changes: 30 additions & 0 deletions tidb-cloud/vector-search-index.md
Original file line number Diff line number Diff line change
Expand Up @@ -133,6 +133,36 @@ LIMIT 5;

See [Table Partitioning](/partitioned-table.md) for more information.

## View index build progress

Unlike other indexes, vector indexes are built asynchronously. Therefore, vector indexes might not be immediately available after bulk data insertion. This does not affect data correctness or consistency, and you can perform vector searches at any time and get complete results. However, performance will be suboptimal until vector indexes are fully built.

To view the index build progress, you can query the `INFORMATION_SCHEMA.TIFLASH_INDEXES` table as follows:

```sql
SELECT * FROM INFORMATION_SCHEMA.TIFLASH_INDEXES;
+---------------+------------+----------------+----------+--------------------+-------------+-----------+------------+---------------------+-------------------------+--------------------+------------------------+------------------+
| TIDB_DATABASE | TIDB_TABLE | TIDB_PARTITION | TABLE_ID | BELONGING_TABLE_ID | COLUMN_NAME | COLUMN_ID | INDEX_KIND | ROWS_STABLE_INDEXED | ROWS_STABLE_NOT_INDEXED | ROWS_DELTA_INDEXED | ROWS_DELTA_NOT_INDEXED | TIFLASH_INSTANCE |
+---------------+------------+----------------+----------+--------------------+-------------+-----------+------------+---------------------+-------------------------+--------------------+------------------------+------------------+
| test | sample | NULL | 106 | -1 | vec | 2 | HNSW | 0 | 13000 | 0 | 2000 | store-6ba728d2 |
| test | sample | NULL | 106 | -1 | vec | 2 | HNSW | 10500 | 0 | 0 | 4500 | store-7000164f |
+---------------+------------+----------------+----------+--------------------+-------------+-----------+------------+---------------------+-------------------------+--------------------+------------------------+------------------+
```

- The `ROWS_STABLE_INDEXED` and `ROWS_STABLE_NOT_INDEXED` columns show the index build progress. When `ROWS_STABLE_NOT_INDEXED` becomes 0, the index build is complete.

As a reference, indexing a 500 MiB vector dataset might take up to 20 minutes. The indexer can run in parallel for multiple tables. Currently, adjusting the indexer priority or speed is not supported.

- The `ROWS_DELTA_NOT_INDEXED` column shows the number of rows in the Delta layer. The Delta layer stores _recently_ inserted or updated rows and is periodically merged into the Stable layer according to the write workload. This merge process is called Compaction.

The Delta layer is always not indexed. To achieve optimal performance, you can force the merge of the Delta layer into the Stable layer so that all data can be indexed:

```sql
ALTER TABLE <TABLE_NAME> COMPACT;
```

For more information, see [`ALTER TABLE ... COMPACT`](/sql-statements/sql-statement-alter-table-compact.md).

## Check whether the vector index is used

Use the [`EXPLAIN`](/sql-statements/sql-statement-explain.md) or [`EXPLAIN ANALYZE`](/sql-statements/sql-statement-explain-analyze.md) statement to check whether this query is using the vector index. When `annIndex:` is presented in the `operator info` column for the `TableFullScan` executor, it means this table scan is utilizing the vector index.
Expand Down

0 comments on commit 69a1890

Please sign in to comment.