From ac62eef2beb53febe701eeb1315f7f342e33c87d Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Mon, 2 Sep 2024 17:51:17 +0800 Subject: [PATCH 01/16] reuse vector search docs --- {tidb-cloud => vector-search}/vector-search-changelogs.md | 0 {tidb-cloud => vector-search}/vector-search-data-types.md | 0 .../vector-search-functions-and-operators.md | 0 .../vector-search-get-started-using-python.md | 0 .../vector-search-get-started-using-sql.md | 0 .../vector-search-improve-performance.md | 0 .../vector-search-integrate-with-django-orm.md | 0 .../vector-search-integrate-with-jinaai-embedding.md | 0 .../vector-search-integrate-with-langchain.md | 0 .../vector-search-integrate-with-llamaindex.md | 0 .../vector-search-integrate-with-peewee.md | 0 .../vector-search-integrate-with-sqlalchemy.md | 0 .../vector-search-integration-overview.md | 0 {tidb-cloud => vector-search}/vector-search-limitations.md | 0 {tidb-cloud => vector-search}/vector-search-overview.md | 0 15 files changed, 0 insertions(+), 0 deletions(-) rename {tidb-cloud => vector-search}/vector-search-changelogs.md (100%) rename {tidb-cloud => vector-search}/vector-search-data-types.md (100%) rename {tidb-cloud => vector-search}/vector-search-functions-and-operators.md (100%) rename {tidb-cloud => vector-search}/vector-search-get-started-using-python.md (100%) rename {tidb-cloud => vector-search}/vector-search-get-started-using-sql.md (100%) rename {tidb-cloud => vector-search}/vector-search-improve-performance.md (100%) rename {tidb-cloud => vector-search}/vector-search-integrate-with-django-orm.md (100%) rename {tidb-cloud => vector-search}/vector-search-integrate-with-jinaai-embedding.md (100%) rename {tidb-cloud => vector-search}/vector-search-integrate-with-langchain.md (100%) rename {tidb-cloud => vector-search}/vector-search-integrate-with-llamaindex.md (100%) rename {tidb-cloud => vector-search}/vector-search-integrate-with-peewee.md (100%) rename {tidb-cloud => 
vector-search}/vector-search-integrate-with-sqlalchemy.md (100%) rename {tidb-cloud => vector-search}/vector-search-integration-overview.md (100%) rename {tidb-cloud => vector-search}/vector-search-limitations.md (100%) rename {tidb-cloud => vector-search}/vector-search-overview.md (100%) diff --git a/tidb-cloud/vector-search-changelogs.md b/vector-search/vector-search-changelogs.md similarity index 100% rename from tidb-cloud/vector-search-changelogs.md rename to vector-search/vector-search-changelogs.md diff --git a/tidb-cloud/vector-search-data-types.md b/vector-search/vector-search-data-types.md similarity index 100% rename from tidb-cloud/vector-search-data-types.md rename to vector-search/vector-search-data-types.md diff --git a/tidb-cloud/vector-search-functions-and-operators.md b/vector-search/vector-search-functions-and-operators.md similarity index 100% rename from tidb-cloud/vector-search-functions-and-operators.md rename to vector-search/vector-search-functions-and-operators.md diff --git a/tidb-cloud/vector-search-get-started-using-python.md b/vector-search/vector-search-get-started-using-python.md similarity index 100% rename from tidb-cloud/vector-search-get-started-using-python.md rename to vector-search/vector-search-get-started-using-python.md diff --git a/tidb-cloud/vector-search-get-started-using-sql.md b/vector-search/vector-search-get-started-using-sql.md similarity index 100% rename from tidb-cloud/vector-search-get-started-using-sql.md rename to vector-search/vector-search-get-started-using-sql.md diff --git a/tidb-cloud/vector-search-improve-performance.md b/vector-search/vector-search-improve-performance.md similarity index 100% rename from tidb-cloud/vector-search-improve-performance.md rename to vector-search/vector-search-improve-performance.md diff --git a/tidb-cloud/vector-search-integrate-with-django-orm.md b/vector-search/vector-search-integrate-with-django-orm.md similarity index 100% rename from 
tidb-cloud/vector-search-integrate-with-django-orm.md rename to vector-search/vector-search-integrate-with-django-orm.md diff --git a/tidb-cloud/vector-search-integrate-with-jinaai-embedding.md b/vector-search/vector-search-integrate-with-jinaai-embedding.md similarity index 100% rename from tidb-cloud/vector-search-integrate-with-jinaai-embedding.md rename to vector-search/vector-search-integrate-with-jinaai-embedding.md diff --git a/tidb-cloud/vector-search-integrate-with-langchain.md b/vector-search/vector-search-integrate-with-langchain.md similarity index 100% rename from tidb-cloud/vector-search-integrate-with-langchain.md rename to vector-search/vector-search-integrate-with-langchain.md diff --git a/tidb-cloud/vector-search-integrate-with-llamaindex.md b/vector-search/vector-search-integrate-with-llamaindex.md similarity index 100% rename from tidb-cloud/vector-search-integrate-with-llamaindex.md rename to vector-search/vector-search-integrate-with-llamaindex.md diff --git a/tidb-cloud/vector-search-integrate-with-peewee.md b/vector-search/vector-search-integrate-with-peewee.md similarity index 100% rename from tidb-cloud/vector-search-integrate-with-peewee.md rename to vector-search/vector-search-integrate-with-peewee.md diff --git a/tidb-cloud/vector-search-integrate-with-sqlalchemy.md b/vector-search/vector-search-integrate-with-sqlalchemy.md similarity index 100% rename from tidb-cloud/vector-search-integrate-with-sqlalchemy.md rename to vector-search/vector-search-integrate-with-sqlalchemy.md diff --git a/tidb-cloud/vector-search-integration-overview.md b/vector-search/vector-search-integration-overview.md similarity index 100% rename from tidb-cloud/vector-search-integration-overview.md rename to vector-search/vector-search-integration-overview.md diff --git a/tidb-cloud/vector-search-limitations.md b/vector-search/vector-search-limitations.md similarity index 100% rename from tidb-cloud/vector-search-limitations.md rename to 
vector-search/vector-search-limitations.md diff --git a/tidb-cloud/vector-search-overview.md b/vector-search/vector-search-overview.md similarity index 100% rename from tidb-cloud/vector-search-overview.md rename to vector-search/vector-search-overview.md From 47e1f705f404a1b6c87038d4825103f0d1a0c457 Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Tue, 3 Sep 2024 10:28:47 +0800 Subject: [PATCH 02/16] move vector search files --- {vector-search => tidb-cloud}/vector-search-changelogs.md | 0 .../vector-search-data-types.md => vector-search-data-types.md | 0 ...s-and-operators.md => vector-search-functions-and-operators.md | 0 ...d-using-python.md => vector-search-get-started-using-python.md | 0 ...started-using-sql.md => vector-search-get-started-using-sql.md | 0 ...improve-performance.md => vector-search-improve-performance.md | 0 ...th-django-orm.md => vector-search-integrate-with-django-orm.md | 0 ...bedding.md => vector-search-integrate-with-jinaai-embedding.md | 0 ...with-langchain.md => vector-search-integrate-with-langchain.md | 0 ...th-llamaindex.md => vector-search-integrate-with-llamaindex.md | 0 ...grate-with-peewee.md => vector-search-integrate-with-peewee.md | 0 ...th-sqlalchemy.md => vector-search-integrate-with-sqlalchemy.md | 0 ...tegration-overview.md => vector-search-integration-overview.md | 0 .../vector-search-limitations.md => vector-search-limitations.md | 0 .../vector-search-overview.md => vector-search-overview.md | 0 15 files changed, 0 insertions(+), 0 deletions(-) rename {vector-search => tidb-cloud}/vector-search-changelogs.md (100%) rename vector-search/vector-search-data-types.md => vector-search-data-types.md (100%) rename vector-search/vector-search-functions-and-operators.md => vector-search-functions-and-operators.md (100%) rename vector-search/vector-search-get-started-using-python.md => vector-search-get-started-using-python.md (100%) rename vector-search/vector-search-get-started-using-sql.md => 
vector-search-get-started-using-sql.md (100%) rename vector-search/vector-search-improve-performance.md => vector-search-improve-performance.md (100%) rename vector-search/vector-search-integrate-with-django-orm.md => vector-search-integrate-with-django-orm.md (100%) rename vector-search/vector-search-integrate-with-jinaai-embedding.md => vector-search-integrate-with-jinaai-embedding.md (100%) rename vector-search/vector-search-integrate-with-langchain.md => vector-search-integrate-with-langchain.md (100%) rename vector-search/vector-search-integrate-with-llamaindex.md => vector-search-integrate-with-llamaindex.md (100%) rename vector-search/vector-search-integrate-with-peewee.md => vector-search-integrate-with-peewee.md (100%) rename vector-search/vector-search-integrate-with-sqlalchemy.md => vector-search-integrate-with-sqlalchemy.md (100%) rename vector-search/vector-search-integration-overview.md => vector-search-integration-overview.md (100%) rename vector-search/vector-search-limitations.md => vector-search-limitations.md (100%) rename vector-search/vector-search-overview.md => vector-search-overview.md (100%) diff --git a/vector-search/vector-search-changelogs.md b/tidb-cloud/vector-search-changelogs.md similarity index 100% rename from vector-search/vector-search-changelogs.md rename to tidb-cloud/vector-search-changelogs.md diff --git a/vector-search/vector-search-data-types.md b/vector-search-data-types.md similarity index 100% rename from vector-search/vector-search-data-types.md rename to vector-search-data-types.md diff --git a/vector-search/vector-search-functions-and-operators.md b/vector-search-functions-and-operators.md similarity index 100% rename from vector-search/vector-search-functions-and-operators.md rename to vector-search-functions-and-operators.md diff --git a/vector-search/vector-search-get-started-using-python.md b/vector-search-get-started-using-python.md similarity index 100% rename from 
vector-search/vector-search-get-started-using-python.md rename to vector-search-get-started-using-python.md diff --git a/vector-search/vector-search-get-started-using-sql.md b/vector-search-get-started-using-sql.md similarity index 100% rename from vector-search/vector-search-get-started-using-sql.md rename to vector-search-get-started-using-sql.md diff --git a/vector-search/vector-search-improve-performance.md b/vector-search-improve-performance.md similarity index 100% rename from vector-search/vector-search-improve-performance.md rename to vector-search-improve-performance.md diff --git a/vector-search/vector-search-integrate-with-django-orm.md b/vector-search-integrate-with-django-orm.md similarity index 100% rename from vector-search/vector-search-integrate-with-django-orm.md rename to vector-search-integrate-with-django-orm.md diff --git a/vector-search/vector-search-integrate-with-jinaai-embedding.md b/vector-search-integrate-with-jinaai-embedding.md similarity index 100% rename from vector-search/vector-search-integrate-with-jinaai-embedding.md rename to vector-search-integrate-with-jinaai-embedding.md diff --git a/vector-search/vector-search-integrate-with-langchain.md b/vector-search-integrate-with-langchain.md similarity index 100% rename from vector-search/vector-search-integrate-with-langchain.md rename to vector-search-integrate-with-langchain.md diff --git a/vector-search/vector-search-integrate-with-llamaindex.md b/vector-search-integrate-with-llamaindex.md similarity index 100% rename from vector-search/vector-search-integrate-with-llamaindex.md rename to vector-search-integrate-with-llamaindex.md diff --git a/vector-search/vector-search-integrate-with-peewee.md b/vector-search-integrate-with-peewee.md similarity index 100% rename from vector-search/vector-search-integrate-with-peewee.md rename to vector-search-integrate-with-peewee.md diff --git a/vector-search/vector-search-integrate-with-sqlalchemy.md b/vector-search-integrate-with-sqlalchemy.md 
similarity index 100% rename from vector-search/vector-search-integrate-with-sqlalchemy.md rename to vector-search-integrate-with-sqlalchemy.md diff --git a/vector-search/vector-search-integration-overview.md b/vector-search-integration-overview.md similarity index 100% rename from vector-search/vector-search-integration-overview.md rename to vector-search-integration-overview.md diff --git a/vector-search/vector-search-limitations.md b/vector-search-limitations.md similarity index 100% rename from vector-search/vector-search-limitations.md rename to vector-search-limitations.md diff --git a/vector-search/vector-search-overview.md b/vector-search-overview.md similarity index 100% rename from vector-search/vector-search-overview.md rename to vector-search-overview.md From 209809bdf6066e0440577a46594d036cd5b4f8cf Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Tue, 3 Sep 2024 10:35:14 +0800 Subject: [PATCH 03/16] change the links to vector search docs --- TOC-tidb-cloud.md | 30 +++++++++---------- tidb-cloud/data-service-manage-endpoint.md | 2 +- tidb-cloud/tidb-cloud-release-notes.md | 12 ++++---- tidb-cloud/vector-search-index.md | 14 ++++----- vector-search-data-types.md | 12 ++++---- vector-search-functions-and-operators.md | 12 ++++---- vector-search-get-started-using-python.md | 8 ++--- vector-search-get-started-using-sql.md | 10 +++---- vector-search-integrate-with-django-orm.md | 2 +- ...-search-integrate-with-jinaai-embedding.md | 2 +- vector-search-integrate-with-langchain.md | 4 +-- vector-search-integrate-with-llamaindex.md | 4 +-- vector-search-integrate-with-peewee.md | 4 +-- vector-search-integrate-with-sqlalchemy.md | 4 +-- vector-search-integration-overview.md | 6 ++-- vector-search-limitations.md | 2 +- vector-search-overview.md | 8 ++--- 17 files changed, 68 insertions(+), 68 deletions(-) diff --git a/TOC-tidb-cloud.md b/TOC-tidb-cloud.md index a09a6fcbd2168..7464ec1f3f7c0 100644 --- a/TOC-tidb-cloud.md +++ b/TOC-tidb-cloud.md @@ -10,7 +10,7 @@ - 
[Roadmap](/tidb-cloud/tidb-cloud-roadmap.md) - Get Started - [Try Out TiDB Cloud](/tidb-cloud/tidb-cloud-quickstart.md) - - [Try Out TiDB + AI](/tidb-cloud/vector-search-get-started-using-python.md) + - [Try Out TiDB + AI](/vector-search-get-started-using-python.md) - [Try Out HTAP](/tidb-cloud/tidb-cloud-htap-quickstart.md) - [Try Out TiDB Cloud CLI](/tidb-cloud/get-started-with-cli.md) - [Perform a PoC](/tidb-cloud/tidb-cloud-poc.md) @@ -241,27 +241,27 @@ - Explore Data - [Chat2Query (Beta) in SQL Editor](/tidb-cloud/explore-data-with-chat2query.md) - Vector Search (Beta) - - [Overview](/tidb-cloud/vector-search-overview.md) + - [Overview](/vector-search-overview.md) - Get Started - - [Get Started with SQL](/tidb-cloud/vector-search-get-started-using-sql.md) - - [Get Started with Python](/tidb-cloud/vector-search-get-started-using-python.md) + - [Get Started with SQL](/vector-search-get-started-using-sql.md) + - [Get Started with Python](/vector-search-get-started-using-python.md) - Integrations - - [Overview](/tidb-cloud/vector-search-integration-overview.md) + - [Overview](/vector-search-integration-overview.md) - AI Frameworks - - [LlamaIndex](/tidb-cloud/vector-search-integrate-with-llamaindex.md) - - [Langchain](/tidb-cloud/vector-search-integrate-with-langchain.md) + - [LlamaIndex](/vector-search-integrate-with-llamaindex.md) + - [Langchain](/vector-search-integrate-with-langchain.md) - Embedding Models/Services - - [Jina AI](/tidb-cloud/vector-search-integrate-with-jinaai-embedding.md) + - [Jina AI](/vector-search-integrate-with-jinaai-embedding.md) - ORM Libraries - - [SQLAlchemy](/tidb-cloud/vector-search-integrate-with-sqlalchemy.md) - - [peewee](/tidb-cloud/vector-search-integrate-with-peewee.md) - - [Django ORM](/tidb-cloud/vector-search-integrate-with-django-orm.md) + - [SQLAlchemy](/vector-search-integrate-with-sqlalchemy.md) + - [peewee](/vector-search-integrate-with-peewee.md) + - [Django ORM](/vector-search-integrate-with-django-orm.md) - 
Reference - - [Vector Data Types](/tidb-cloud/vector-search-data-types.md) - - [Vector Functions and Operators](/tidb-cloud/vector-search-functions-and-operators.md) + - [Vector Data Types](/vector-search-data-types.md) + - [Vector Functions and Operators](/vector-search-functions-and-operators.md) - [Vector Index](/tidb-cloud/vector-search-index.md) - - [Improve Performance](/tidb-cloud/vector-search-improve-performance.md) - - [Limitations](/tidb-cloud/vector-search-limitations.md) + - [Improve Performance](/vector-search-improve-performance.md) + - [Limitations](/vector-search-limitations.md) - [Changelogs](/tidb-cloud/vector-search-changelogs.md) - Data Service (Beta) - [Overview](/tidb-cloud/data-service-overview.md) diff --git a/tidb-cloud/data-service-manage-endpoint.md b/tidb-cloud/data-service-manage-endpoint.md index b0e154c8e38ea..0814155c75fc6 100644 --- a/tidb-cloud/data-service-manage-endpoint.md +++ b/tidb-cloud/data-service-manage-endpoint.md @@ -44,7 +44,7 @@ In TiDB Cloud Data Service, you can generate one or multiple endpoints automatic For each operation you select, TiDB Cloud Data Service will generate a corresponding endpoint. If you select a batch operation (such as `POST (Batch Create)`), the generated endpoint lets you operate on multiple rows in a single request. - If the table you selected contains [vector data types](/tidb-cloud/vector-search-data-types.md), you can enable the **Vector Search Operations** option and select a vector distance function to generate a vector search endpoint that automatically calculates vector distances based on your selected distance function. 
The supported [vector distance functions](/tidb-cloud/vector-search-functions-and-operators.md) include the following: + If the table you selected contains [vector data types](/vector-search-data-types.md), you can enable the **Vector Search Operations** option and select a vector distance function to generate a vector search endpoint that automatically calculates vector distances based on your selected distance function. The supported [vector distance functions](/vector-search-functions-and-operators.md) include the following: - `VEC_L2_DISTANCE` (default): calculates the L2 distance (Euclidean distance) between two vectors. - `VEC_COSINE_DISTANCE`: calculates the cosine distance between two vectors. diff --git a/tidb-cloud/tidb-cloud-release-notes.md b/tidb-cloud/tidb-cloud-release-notes.md index 3724a880f9217..a58e28b067661 100644 --- a/tidb-cloud/tidb-cloud-release-notes.md +++ b/tidb-cloud/tidb-cloud-release-notes.md @@ -40,7 +40,7 @@ This page lists the release notes of [TiDB Cloud](https://www.pingcap.com/tidb-c - [Data Service (beta)](https://tidbcloud.com/console/data-service) supports automatically generating vector search endpoints. - If your table contains [vector data types](/tidb-cloud/vector-search-data-types.md), you can automatically generate a vector search endpoint that calculates vector distances based on your selected distance function. + If your table contains [vector data types](/vector-search-data-types.md), you can automatically generate a vector search endpoint that calculates vector distances based on your selected distance function. This feature enables seamless integration with AI platforms such as [Dify](https://docs.dify.ai/guides/tools) and [GPTs](https://openai.com/blog/introducing-gpts), enhancing your applications with advanced natural language processing and AI capabilities for more complex tasks and intelligent solutions. 
@@ -86,12 +86,12 @@ This page lists the release notes of [TiDB Cloud](https://www.pingcap.com/tidb-c The vector search (beta) feature provides an advanced search solution for performing semantic similarity searches across various data types, including documents, images, audio, and video. This feature enables developers to easily build scalable applications with generative artificial intelligence (AI) capabilities using familiar MySQL skills. Key features include: - - [Vector data types](/tidb-cloud/vector-search-data-types.md), [vector index](/tidb-cloud/vector-search-index.md), and [vector functions and operators](/tidb-cloud/vector-search-functions-and-operators.md). - - Ecosystem integrations with [LangChain](/tidb-cloud/vector-search-integrate-with-langchain.md), [LlamaIndex](/tidb-cloud/vector-search-integrate-with-llamaindex.md), and [JinaAI](/tidb-cloud/vector-search-integrate-with-jinaai-embedding.md). - - Programming language support for Python: [SQLAlchemy](/tidb-cloud/vector-search-integrate-with-sqlalchemy.md), [Peewee](/tidb-cloud/vector-search-integrate-with-peewee.md), and [Django ORM](/tidb-cloud/vector-search-integrate-with-django-orm.md). - - Sample applications and tutorials: perform semantic searches for documents using [Python](/tidb-cloud/vector-search-get-started-using-python.md) or [SQL](/tidb-cloud/vector-search-get-started-using-sql.md). + - [Vector data types](/vector-search-data-types.md), [vector index](/tidb-cloud/vector-search-index.md), and [vector functions and operators](/vector-search-functions-and-operators.md). + - Ecosystem integrations with [LangChain](/vector-search-integrate-with-langchain.md), [LlamaIndex](/vector-search-integrate-with-llamaindex.md), and [JinaAI](/vector-search-integrate-with-jinaai-embedding.md). 
+ - Programming language support for Python: [SQLAlchemy](/vector-search-integrate-with-sqlalchemy.md), [Peewee](/vector-search-integrate-with-peewee.md), and [Django ORM](/vector-search-integrate-with-django-orm.md). + - Sample applications and tutorials: perform semantic searches for documents using [Python](/vector-search-get-started-using-python.md) or [SQL](/vector-search-get-started-using-sql.md). - For more information, see [Vector search (beta) overview](/tidb-cloud/vector-search-overview.md). + For more information, see [Vector search (beta) overview](/vector-search-overview.md). - [TiDB Serverless](/tidb-cloud/select-cluster-tier.md#tidb-serverless) now offers weekly email reports for organization owners. diff --git a/tidb-cloud/vector-search-index.md b/tidb-cloud/vector-search-index.md index 5efbedfa48fdc..443c2f91bc8e8 100644 --- a/tidb-cloud/vector-search-index.md +++ b/tidb-cloud/vector-search-index.md @@ -7,7 +7,7 @@ summary: Learn how to build and use the vector search index to accelerate K-Near K-nearest neighbors (KNN) search is the problem of finding the K closest points for a given point in a vector space. The most straightforward approach to solving this problem is a brute force search, where the distance between all points in the vector space and the reference point is computed. This method guarantees perfect accuracy, but it is usually too slow for practical applications. Thus, nearest neighbors search problems are often solved with approximate algorithms. -In TiDB, you can create and utilize vector search indexes for such approximate nearest neighbor (ANN) searches over columns with [vector data types](/tidb-cloud/vector-search-data-types.md). By using vector search indexes, vector search queries could be finished in milliseconds. +In TiDB, you can create and utilize vector search indexes for such approximate nearest neighbor (ANN) searches over columns with [vector data types](/vector-search-data-types.md). 
By using vector search indexes, vector search queries could be finished in milliseconds. TiDB currently supports the following vector search index algorithms: @@ -21,7 +21,7 @@ TiDB currently supports the following vector search index algorithms: [HNSW](https://en.wikipedia.org/wiki/Hierarchical_navigable_small_world) is one of the most popular vector indexing algorithms. The HNSW index provides good performance with relatively high accuracy (> 98% in typical cases). -To create an HNSW vector index, specify the index definition in the comment of a column with a [vector data type](/tidb-cloud/vector-search-data-types.md) when creating the table: +To create an HNSW vector index, specify the index definition in the comment of a column with a [vector data type](/vector-search-data-types.md) when creating the table: ```sql CREATE TABLE vector_table_with_index ( @@ -44,9 +44,9 @@ The vector index can only be created for fixed-dimensional vector columns like ` If you are using programming language SDKs or ORMs, refer to the following documentation for creating vector indexes: - Python: [TiDB Vector SDK for Python](https://github.com/pingcap/tidb-vector-python) -- Python: [SQLAlchemy](/tidb-cloud/vector-search-integrate-with-sqlalchemy.md) -- Python: [Peewee](/tidb-cloud/vector-search-integrate-with-peewee.md) -- Python: [Django](/tidb-cloud/vector-search-integrate-with-django-orm.md) +- Python: [SQLAlchemy](/vector-search-integrate-with-sqlalchemy.md) +- Python: [Peewee](/vector-search-integrate-with-peewee.md) +- Python: [Django](/vector-search-integrate-with-django-orm.md) Be aware of the following limitations when creating the vector index. 
These limitations might be removed in future releases: @@ -270,5 +270,5 @@ See [`EXPLAIN`](/sql-statements/sql-statement-explain.md), [`EXPLAIN ANALYZE`](/ ## See also -- [Improve Vector Search Performance](/tidb-cloud/vector-search-improve-performance.md) -- [Vector Data Types](/tidb-cloud/vector-search-data-types.md) +- [Improve Vector Search Performance](/vector-search-improve-performance.md) +- [Vector Data Types](/vector-search-data-types.md) diff --git a/vector-search-data-types.md b/vector-search-data-types.md index 542fd3327c873..742091b9bba6d 100644 --- a/vector-search-data-types.md +++ b/vector-search-data-types.md @@ -57,7 +57,7 @@ As dimension 3 is enforced for the `embedding` column in the preceding example, ERROR 1105 (HY000): vector has 2 dimensions, does not fit VECTOR(3) ``` -See [Vector Functions and Operators](/tidb-cloud/vector-search-functions-and-operators.md) for available functions and operators over the Vector data type. +See [Vector Functions and Operators](/vector-search-functions-and-operators.md) for available functions and operators over the Vector data type. See [Vector Search Index](/tidb-cloud/vector-search-index.md) for building and using a vector search index. @@ -79,7 +79,7 @@ However you cannot build a [Vector Search Index](/tidb-cloud/vector-search-index ## Comparison -You can compare vector data types using [comparison operators](/functions-and-operators/operators.md) such as `=`, `!=`, `<`, `>`, `<=`, and `>=`. For a complete list of comparison operators and functions for vector data types, see [Vector Functions and Operators](/tidb-cloud/vector-search-functions-and-operators.md). +You can compare vector data types using [comparison operators](/functions-and-operators/operators.md) such as `=`, `!=`, `<`, `>`, `<=`, and `>=`. For a complete list of comparison operators and functions for vector data types, see [Vector Functions and Operators](/vector-search-functions-and-operators.md). 
Vector data types are compared element-wise numerically. Examples: @@ -228,7 +228,7 @@ To cast vector into its string representation explicitly, use the `VEC_AS_TEXT() 1 row in set (0.01 sec) ``` -For additional cast functions, see [Vector Functions and Operators](/tidb-cloud/vector-search-functions-and-operators.md). +For additional cast functions, see [Vector Functions and Operators](/vector-search-functions-and-operators.md). ### Cast between Vector ⇔ other data types @@ -240,7 +240,7 @@ It is currently not possible to cast between Vector and other data types (like ` - You cannot store `NaN`, `Infinity`, or `-Infinity` values in the vector data type. - Currently Vector data types cannot store double-precision floating numbers. This will be supported in future release. -For other limitations, see [Vector Search Limitations](/tidb-cloud/vector-search-limitations.md). +For other limitations, see [Vector Search Limitations](/vector-search-limitations.md). ## MySQL compatibility @@ -248,6 +248,6 @@ Vector data types are TiDB specific, and are not supported in MySQL. ## See also -- [Vector Functions and Operators](/tidb-cloud/vector-search-functions-and-operators.md) +- [Vector Functions and Operators](/vector-search-functions-and-operators.md) - [Vector Search Index](/tidb-cloud/vector-search-index.md) -- [Improve Vector Search Performance](/tidb-cloud/vector-search-improve-performance.md) +- [Improve Vector Search Performance](/vector-search-improve-performance.md) diff --git a/vector-search-functions-and-operators.md b/vector-search-functions-and-operators.md index b54070ed34f9f..3e56377560751 100644 --- a/vector-search-functions-and-operators.md +++ b/vector-search-functions-and-operators.md @@ -11,7 +11,7 @@ summary: Learn about functions and operators available for Vector Data Types. ## Vector functions -The following functions are designed specifically for [Vector Data Types](/tidb-cloud/vector-search-data-types.md). 
+The following functions are designed specifically for [Vector Data Types](/vector-search-data-types.md). **Vector Distance Functions:** @@ -33,7 +33,7 @@ The following functions are designed specifically for [Vector Data Types](/tidb- ## Extended built-in functions and operators -The following built-in functions and operators are extended, supporting operating on [Vector Data Types](/tidb-cloud/vector-search-data-types.md). +The following built-in functions and operators are extended, supporting operating on [Vector Data Types](/vector-search-data-types.md). **Arithmetic operators:** @@ -42,7 +42,7 @@ The following built-in functions and operators are extended, supporting operatin | [`+`](https://dev.mysql.com/doc/refman/8.0/en/arithmetic-functions.html#operator_plus) | Vector element-wise addition operator | | [`-`](https://dev.mysql.com/doc/refman/8.0/en/arithmetic-functions.html#operator_minus) | Vector element-wise subtraction operator | -For more information about how vector arithmetic works, see [Vector Data Type | Arithmetic](/tidb-cloud/vector-search-data-types.md#arithmetic). +For more information about how vector arithmetic works, see [Vector Data Type | Arithmetic](/vector-search-data-types.md#arithmetic). **Aggregate (GROUP BY) functions:** @@ -74,7 +74,7 @@ For more information about how vector arithmetic works, see [Vector Data Type | | [`!=`, `<>`](https://dev.mysql.com/doc/refman/8.0/en/comparison-operators.html#operator_not-equal) | Not equal operator | | [`NOT IN()`](https://dev.mysql.com/doc/refman/8.0/en/comparison-operators.html#operator_not-in) | Check whether a value is not within a set of values | -For more information about how vectors are compared, see [Vector Data Type | Comparison](/tidb-cloud/vector-search-data-types.md#comparison). +For more information about how vectors are compared, see [Vector Data Type | Comparison](/vector-search-data-types.md#comparison). 
**Control flow functions:** @@ -92,7 +92,7 @@ For more information about how vectors are compared, see [Vector Data Type | Com | [`CAST()`](https://dev.mysql.com/doc/refman/8.0/en/cast-functions.html#function_cast) | Cast a value as a certain type | | [`CONVERT()`](https://dev.mysql.com/doc/refman/8.0/en/cast-functions.html#function_convert) | Cast a value as a certain type | -For more information about how to use `CAST()`, see [Vector Data Type | Cast](/tidb-cloud/vector-search-data-types.md#cast). +For more information about how to use `CAST()`, see [Vector Data Type | Cast](/vector-search-data-types.md#cast). ## Full references @@ -279,4 +279,4 @@ The vector functions and the extended usage of built-in functions and operators ## See also -- [Vector Data Types](/tidb-cloud/vector-search-data-types.md) +- [Vector Data Types](/vector-search-data-types.md) diff --git a/vector-search-get-started-using-python.md b/vector-search-get-started-using-python.md index 08a3e315db98b..8b67883893aa4 100644 --- a/vector-search-get-started-using-python.md +++ b/vector-search-get-started-using-python.md @@ -7,7 +7,7 @@ summary: Learn how to quickly develop an AI application that performs semantic s This tutorial demonstrates how to develop a simple AI application that provides **semantic search** features. Unlike traditional keyword search, semantic search intelligently understands the meaning behind your query. For example, if you have documents titled "dog", "fish", and "tree", and you search for "a swimming animal", the application would identify "fish" as the most relevant result. -Throughout this tutorial, you will develop this AI application using [TiDB Vector Search](/tidb-cloud/vector-search-overview.md), Python, [TiDB Vector SDK for Python](https://github.com/pingcap/tidb-vector-python), and AI models. 
+Throughout this tutorial, you will develop this AI application using [TiDB Vector Search](/vector-search-overview.md), Python, [TiDB Vector SDK for Python](https://github.com/pingcap/tidb-vector-python), and AI models. > **Note** > @@ -44,7 +44,7 @@ pip install sqlalchemy pymysql sentence-transformers tidb-vector python-dotenv ``` - `tidb-vector`: the Python client for interacting with Vector Search in TiDB Cloud. -- [`sentence-transformers`](https://sbert.net): a Python library that provides pre-trained models for generating [vector embeddings](/tidb-cloud/vector-search-overview.md#vector-embedding) from text. +- [`sentence-transformers`](https://sbert.net): a Python library that provides pre-trained models for generating [vector embeddings](/vector-search-overview.md#vector-embedding) from text. ### Step 3. Configure the connection string to the TiDB cluster @@ -79,7 +79,7 @@ pip install sqlalchemy pymysql sentence-transformers tidb-vector python-dotenv ### Step 4. Initialize the embedding model -An [embedding model](/tidb-cloud/vector-search-overview.md#embedding-model) transforms data into [vector embeddings](/tidb-cloud/vector-search-overview.md#vector-embedding). This example uses the pre-trained model [**msmarco-MiniLM-L12-cos-v5**](https://huggingface.co/sentence-transformers/msmarco-MiniLM-L12-cos-v5) for text embedding. This lightweight model, provided by the `sentence-transformers` library, transforms text data into 384-dimensional vector embeddings. +An [embedding model](/vector-search-overview.md#embedding-model) transforms data into [vector embeddings](/vector-search-overview.md#vector-embedding). This example uses the pre-trained model [**msmarco-MiniLM-L12-cos-v5**](https://huggingface.co/sentence-transformers/msmarco-MiniLM-L12-cos-v5) for text embedding. This lightweight model, provided by the `sentence-transformers` library, transforms text data into 384-dimensional vector embeddings. 
To set up the model, copy the following code into the `example.py` file. This code initializes a `SentenceTransformer` instance and defines a `text_to_embedding()` function for later use. @@ -191,5 +191,5 @@ This demonstration shows how vector search can efficiently locate the most relev ## See also -- [Vector Data Types](/tidb-cloud/vector-search-data-types.md) +- [Vector Data Types](/vector-search-data-types.md) - [Vector Search Index](/tidb-cloud/vector-search-index.md) diff --git a/vector-search-get-started-using-sql.md b/vector-search-get-started-using-sql.md index 5da699c224bae..beb7a9c136b32 100644 --- a/vector-search-get-started-using-sql.md +++ b/vector-search-get-started-using-sql.md @@ -5,7 +5,7 @@ summary: Learn how to quickly get started with Vector Search in TiDB Cloud using # Get Started with Vector Search via SQL -TiDB extends MySQL syntax to support [Vector Search](/tidb-cloud/vector-search-overview.md) and introduce new [Vector data types](/tidb-cloud/vector-search-data-types.md) and several [vector functions](/tidb-cloud/vector-search-functions-and-operators.md). +TiDB extends MySQL syntax to support [Vector Search](/vector-search-overview.md) and introduce new [Vector data types](/vector-search-data-types.md) and several [vector functions](/vector-search-functions-and-operators.md). This tutorial demonstrates how to get started with TiDB Vector Search just using SQL statements. You will learn how to use the [MySQL command-line client](https://dev.mysql.com/doc/refman/8.4/en/mysql.html) to: @@ -45,7 +45,7 @@ To complete this tutorial, you need: ### Step 2. Create a vector table -With vector search support, you can use the `VECTOR` type column to store [vector embeddings](/tidb-cloud/vector-search-overview.md#vector-embedding) in TiDB. +With vector search support, you can use the `VECTOR` type column to store [vector embeddings](/vector-search-overview.md#vector-embedding) in TiDB. 
To create a table with a three-dimensional `VECTOR` column, execute the following SQL statements using your MySQL CLI: @@ -68,7 +68,7 @@ Query OK, 0 rows affected (0.27 sec) ### Step 3. Store the vector embeddings -Insert three documents with their [vector embeddings](/tidb-cloud/vector-search-overview.md#vector-embedding) into the `embedded_documents` table: +Insert three documents with their [vector embeddings](/vector-search-overview.md#vector-embedding) into the `embedded_documents` table: ```sql INSERT INTO embedded_documents @@ -89,7 +89,7 @@ Records: 3 Duplicates: 0 Warnings: 0 > > This example simplifies the dimensions of the vector embeddings and uses only 3-dimensional vectors for demonstration purposes. > -> In real-world applications, [embedding models](/tidb-cloud/vector-search-overview.md#embedding-model) often produce vector embeddings with hundreds or thousands of dimensions. +> In real-world applications, [embedding models](/vector-search-overview.md#embedding-model) often produce vector embeddings with hundreds or thousands of dimensions. ### Step 4. 
Query the vector table @@ -144,5 +144,5 @@ From the output, the swimming animal is most likely a fish, or a dog with a gift ## See also -- [Vector Data Types](/tidb-cloud/vector-search-data-types.md) +- [Vector Data Types](/vector-search-data-types.md) - [Vector Search Index](/tidb-cloud/vector-search-index.md) diff --git a/vector-search-integrate-with-django-orm.md b/vector-search-integrate-with-django-orm.md index cc11b69ad1d4e..cc21072cdc62b 100644 --- a/vector-search-integrate-with-django-orm.md +++ b/vector-search-integrate-with-django-orm.md @@ -233,5 +233,5 @@ results = Document.objects.annotate( ## See also -- [Vector Data Types](/tidb-cloud/vector-search-data-types.md) +- [Vector Data Types](/vector-search-data-types.md) - [Vector Search Index](/tidb-cloud/vector-search-index.md) diff --git a/vector-search-integrate-with-jinaai-embedding.md b/vector-search-integrate-with-jinaai-embedding.md index 1ec86cf0d1017..b1d90590d2931 100644 --- a/vector-search-integrate-with-jinaai-embedding.md +++ b/vector-search-integrate-with-jinaai-embedding.md @@ -242,5 +242,5 @@ with Session(engine) as session: ## See also -- [Vector Data Types](/tidb-cloud/vector-search-data-types.md) +- [Vector Data Types](/vector-search-data-types.md) - [Vector Search Index](/tidb-cloud/vector-search-index.md) diff --git a/vector-search-integrate-with-langchain.md b/vector-search-integrate-with-langchain.md index 21be2741ca4bb..d9a93d11b8148 100644 --- a/vector-search-integrate-with-langchain.md +++ b/vector-search-integrate-with-langchain.md @@ -5,7 +5,7 @@ summary: Learn how to integrate Vector Search in TiDB Cloud with LangChain. # Integrate Vector Search with LangChain -This tutorial demonstrates how to integrate the [vector search](/tidb-cloud/vector-search-overview.md) feature in TiDB Cloud with [LangChain](https://python.langchain.com/). 
+This tutorial demonstrates how to integrate the [vector search](/vector-search-overview.md) feature in TiDB Cloud with [LangChain](https://python.langchain.com/). > **Note** > @@ -581,5 +581,5 @@ The expected output is as follows: ## See also -- [Vector Data Types](/tidb-cloud/vector-search-data-types.md) +- [Vector Data Types](/vector-search-data-types.md) - [Vector Search Index](/tidb-cloud/vector-search-index.md) diff --git a/vector-search-integrate-with-llamaindex.md b/vector-search-integrate-with-llamaindex.md index 094388563de34..39646d8ac9cb3 100644 --- a/vector-search-integrate-with-llamaindex.md +++ b/vector-search-integrate-with-llamaindex.md @@ -5,7 +5,7 @@ summary: Learn how to integrate Vector Search in TiDB Cloud with LlamaIndex. # Integrate Vector Search with LlamaIndex -This tutorial demonstrates how to integrate the [vector search](/tidb-cloud/vector-search-overview.md) feature in TiDB Cloud with [LlamaIndex](https://www.llamaindex.ai). +This tutorial demonstrates how to integrate the [vector search](/vector-search-overview.md) feature in TiDB Cloud with [LlamaIndex](https://www.llamaindex.ai). > **Note** > @@ -260,5 +260,5 @@ Empty Response ## See also -- [Vector Data Types](/tidb-cloud/vector-search-data-types.md) +- [Vector Data Types](/vector-search-data-types.md) - [Vector Search Index](/tidb-cloud/vector-search-index.md) diff --git a/vector-search-integrate-with-peewee.md b/vector-search-integrate-with-peewee.md index 0af42329cc0f1..2e89ff6345719 100644 --- a/vector-search-integrate-with-peewee.md +++ b/vector-search-integrate-with-peewee.md @@ -5,7 +5,7 @@ summary: Learn how to integrate TiDB Vector Search with peewee to store embeddin # Integrate TiDB Vector Search with peewee -This tutorial walks you through how to use [peewee](https://docs.peewee-orm.com/) to interact with the [TiDB Vector Search](/tidb-cloud/vector-search-overview.md), store embeddings, and perform vector search queries. 
+This tutorial walks you through how to use [peewee](https://docs.peewee-orm.com/) to interact with the [TiDB Vector Search](/vector-search-overview.md), store embeddings, and perform vector search queries. > **Note** > @@ -223,5 +223,5 @@ results = Document.select(Document, distance).where(distance_expression < 0.2).o ## See also -- [Vector Data Types](/tidb-cloud/vector-search-data-types.md) +- [Vector Data Types](/vector-search-data-types.md) - [Vector Search Index](/tidb-cloud/vector-search-index.md) diff --git a/vector-search-integrate-with-sqlalchemy.md b/vector-search-integrate-with-sqlalchemy.md index f66cc6c97f676..1dad860a884f2 100644 --- a/vector-search-integrate-with-sqlalchemy.md +++ b/vector-search-integrate-with-sqlalchemy.md @@ -5,7 +5,7 @@ summary: Learn how to integrate TiDB Vector Search with SQLAlchemy to store embe # Integrate TiDB Vector Search with SQLAlchemy -This tutorial walks you through how to use [SQLAlchemy](https://www.sqlalchemy.org/) to interact with [TiDB Vector Search](/tidb-cloud/vector-search-overview.md), store embeddings, and perform vector search queries. +This tutorial walks you through how to use [SQLAlchemy](https://www.sqlalchemy.org/) to interact with [TiDB Vector Search](/vector-search-overview.md), store embeddings, and perform vector search queries. 
> **Note** > @@ -195,5 +195,5 @@ with Session(engine) as session: ## See also -- [Vector Data Types](/tidb-cloud/vector-search-data-types.md) +- [Vector Data Types](/vector-search-data-types.md) - [Vector Search Index](/tidb-cloud/vector-search-index.md) diff --git a/vector-search-integration-overview.md b/vector-search-integration-overview.md index 1bf586375259a..efc93cb5657b5 100644 --- a/vector-search-integration-overview.md +++ b/vector-search-integration-overview.md @@ -17,8 +17,8 @@ TiDB provides official support for the following AI frameworks, enabling you to | AI frameworks | Tutorial | |---------------|---------------------------------------------------------------------------------------------------| -| Langchain | [Integrate Vector Search with LangChain](/tidb-cloud/vector-search-integrate-with-langchain.md) | -| LlamaIndex | [Integrate Vector Search with LlamaIndex](/tidb-cloud/vector-search-integrate-with-llamaindex.md) | +| Langchain | [Integrate Vector Search with LangChain](/vector-search-integrate-with-langchain.md) | +| LlamaIndex | [Integrate Vector Search with LlamaIndex](/vector-search-integrate-with-llamaindex.md) | Moreover, you can also use TiDB for various purposes, such as document storage and knowledge graph storage for AI applications. 
@@ -32,7 +32,7 @@ The following table lists some mainstream embedding service providers and the co | Embedding service providers | Tutorial | |-----------------------------|---------------------------------------------------------------------------------------------------------------------| -| Jina AI | [Integrate Vector Search with Jina AI Embeddings API](/tidb-cloud/vector-search-integrate-with-jinaai-embedding.md) | +| Jina AI | [Integrate Vector Search with Jina AI Embeddings API](/vector-search-integrate-with-jinaai-embedding.md) | ## Object Relational Mapping (ORM) libraries diff --git a/vector-search-limitations.md b/vector-search-limitations.md index de84f5e50d8c5..fb82632103070 100644 --- a/vector-search-limitations.md +++ b/vector-search-limitations.md @@ -9,7 +9,7 @@ This document describes the known limitations of TiDB Vector Search. We are cont - TiDB Vector Search is only available for [TiDB Serverless](/tidb-cloud/select-cluster-tier.md#tidb-serverless) clusters. It is not available for TiDB Dedicated or TiDB Self-Hosted. -- Each [vector](/tidb-cloud/vector-search-data-types.md) supports up to 16,000 dimensions. +- Each [vector](/vector-search-data-types.md) supports up to 16,000 dimensions. - Vector data supports only single-precision floating-point numbers (Float32). diff --git a/vector-search-overview.md b/vector-search-overview.md index 1b9c3ddaa1d03..ecaeabb1eb919 100644 --- a/vector-search-overview.md +++ b/vector-search-overview.md @@ -23,7 +23,7 @@ A vector embedding, also known as an embedding, is a sequence of numbers that re Vector embeddings are essential in machine learning and serve as the foundation for semantic similarity searches. -TiDB introduces [Vector data types](/tidb-cloud/vector-search-data-types.md) designed to optimize the storage and retrieval of vector embeddings, enhancing their use in AI applications. 
You can store vector embeddings in TiDB and perform vector search queries to find the most relevant data using these data types. +TiDB introduces [Vector data types](/vector-search-data-types.md) designed to optimize the storage and retrieval of vector embeddings, enhancing their use in AI applications. You can store vector embeddings in TiDB and perform vector search queries to find the most relevant data using these data types. ### Embedding model @@ -37,7 +37,7 @@ To learn how to generate vector embeddings for your specific data types, refer t After converting raw data into vector embeddings and storing them in TiDB, your application can execute vector search queries to find the data most semantically or contextually relevant to a user's query. -Vector Search in TiDB Cloud identifies the top-k nearest neighbor (KNN) vectors by using a [distance function](/tidb-cloud/vector-search-functions-and-operators.md) to calculate the distance between the given vector and vectors stored in the database. The vectors closest to the query represent the most similar data in meaning. +Vector Search in TiDB Cloud identifies the top-k nearest neighbor (KNN) vectors by using a [distance function](/vector-search-functions-and-operators.md) to calculate the distance between the given vector and vectors stored in the database. The vectors closest to the query represent the most similar data in meaning. 
![The Schematic TiDB Vector Search](/media/vector-search/embedding-search.png) @@ -61,5 +61,5 @@ A recommendation engine is a system that proactively suggests content, products, To get started with TiDB Vector Search, see the following documents: -- [Get started with vector search using Python](/tidb-cloud/vector-search-get-started-using-python.md) -- [Get started with vector search using SQL](/tidb-cloud/vector-search-get-started-using-sql.md) +- [Get started with vector search using Python](/vector-search-get-started-using-python.md) +- [Get started with vector search using SQL](/vector-search-get-started-using-sql.md) From 0d48edacbbc3e5a804379766eaacbb826b34ccc9 Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Tue, 3 Sep 2024 14:53:17 +0800 Subject: [PATCH 04/16] fix broken links --- vector-search-data-types.md | 29 ++++++++++++++++ vector-search-functions-and-operators.md | 4 +++ vector-search-get-started-using-python.md | 25 ++++++++++++++ vector-search-get-started-using-sql.md | 24 ++++++++++++++ vector-search-improve-performance.md | 12 +++++-- vector-search-integrate-with-django-orm.md | 21 +++++++++++- ...-search-integrate-with-jinaai-embedding.md | 25 ++++++++++++++ vector-search-integrate-with-langchain.md | 33 +++++++++++++++++-- vector-search-integrate-with-llamaindex.md | 33 +++++++++++++++++-- vector-search-integrate-with-peewee.md | 31 ++++++++++++++++- vector-search-integrate-with-sqlalchemy.md | 31 ++++++++++++++++- vector-search-integration-overview.md | 4 +++ vector-search-limitations.md | 22 ++++++++++++- vector-search-overview.md | 4 +++ 14 files changed, 288 insertions(+), 10 deletions(-) diff --git a/vector-search-data-types.md b/vector-search-data-types.md index 742091b9bba6d..0538ba51e1e34 100644 --- a/vector-search-data-types.md +++ b/vector-search-data-types.md @@ -14,6 +14,14 @@ The following Vector data type is currently available: The Vector data type provides these advantages over storing in a `JSON` column: + + +- Dimension 
enforcement. A dimension can be specified to forbid inserting vectors with different dimensions. +- Optimized storage format. Vector data types are stored even more space-efficient than `JSON` data type. + + + + - Vector Index support. A [Vector Search Index](/tidb-cloud/vector-search-index.md) can be built to speed up vector searching. - Dimension enforcement. A dimension can be specified to forbid inserting vectors with different dimensions. - Optimized storage format. Vector data types are stored even more space-efficient than `JSON` data type. @@ -22,6 +30,8 @@ The Vector data type provides these advantages over storing in a `JSON` column: > > Vector data types are only available for [TiDB Serverless](/tidb-cloud/select-cluster-tier.md#tidb-serverless) clusters. + + ## Value syntax A Vector value contains an arbitrary number of floating numbers. You can use a string in the following syntax to represent a Vector value: @@ -59,8 +69,12 @@ ERROR 1105 (HY000): vector has 2 dimensions, does not fit VECTOR(3) See [Vector Functions and Operators](/vector-search-functions-and-operators.md) for available functions and operators over the Vector data type. + + See [Vector Search Index](/tidb-cloud/vector-search-index.md) for building and using a vector search index. + + ## Vectors with different dimensions You can store vectors with different dimensions in the same column by omitting the dimension parameter in the `VECTOR` type: @@ -75,8 +89,12 @@ INSERT INTO vector_table VALUES (1, '[0.3, 0.5, -0.1]'); -- 3 dimensions vector, INSERT INTO vector_table VALUES (2, '[0.3, 0.5]'); -- 2 dimensions vector, OK ``` + + However you cannot build a [Vector Search Index](/tidb-cloud/vector-search-index.md) for this column, as vector distances can be only calculated between vectors with the same dimensions. + + ## Comparison You can compare vector data types using [comparison operators](/functions-and-operators/operators.md) such as `=`, `!=`, `<`, `>`, `<=`, and `>=`. 
For a complete list of comparison operators and functions for vector data types, see [Vector Functions and Operators](/vector-search-functions-and-operators.md). @@ -248,6 +266,17 @@ Vector data types are TiDB specific, and are not supported in MySQL. ## See also + + +- [Vector Functions and Operators](/vector-search-functions-and-operators.md) +- [Improve Vector Search Performance](/vector-search-improve-performance.md) + + + + + - [Vector Functions and Operators](/vector-search-functions-and-operators.md) - [Vector Search Index](/tidb-cloud/vector-search-index.md) - [Improve Vector Search Performance](/vector-search-improve-performance.md) + + \ No newline at end of file diff --git a/vector-search-functions-and-operators.md b/vector-search-functions-and-operators.md index 3e56377560751..6b7c1c6aa92d0 100644 --- a/vector-search-functions-and-operators.md +++ b/vector-search-functions-and-operators.md @@ -5,10 +5,14 @@ summary: Learn about functions and operators available for Vector Data Types. # Vector Functions and Operators + + > **Note:** > > Vector data types and these vector functions are only available for [TiDB Serverless](/tidb-cloud/select-cluster-tier.md#tidb-serverless) clusters. + + ## Vector functions The following functions are designed specifically for [Vector Data Types](/vector-search-data-types.md). diff --git a/vector-search-get-started-using-python.md b/vector-search-get-started-using-python.md index 8b67883893aa4..6ac9ec43bc5de 100644 --- a/vector-search-get-started-using-python.md +++ b/vector-search-get-started-using-python.md @@ -9,18 +9,33 @@ This tutorial demonstrates how to develop a simple AI application that provides Throughout this tutorial, you will develop this AI application using [TiDB Vector Search](/vector-search-overview.md), Python, [TiDB Vector SDK for Python](https://github.com/pingcap/tidb-vector-python), and AI models. 
+ + > **Note** > > TiDB Vector Search is currently in beta and only available for [TiDB Serverless](/tidb-cloud/select-cluster-tier.md#tidb-serverless) clusters. + + ## Prerequisites To complete this tutorial, you need: + + +- [Python 3.8 or higher](https://www.python.org/downloads/) installed. +- [Git](https://git-scm.com/downloads) installed. +- A TiDB cluster. Follow [Deploy a local test TiDB cluster](/quick-start-with-tidb.md#deploy-a-local-test-cluster) or [Deploy a production TiDB cluster](/production-deployment-using-tiup.md) to create a local cluster. + + + + - [Python 3.8 or higher](https://www.python.org/downloads/) installed. - [Git](https://git-scm.com/downloads) installed. - A TiDB Serverless cluster. Follow [creating a TiDB Serverless cluster](/tidb-cloud/create-tidb-cluster-serverless.md) to create your own TiDB Cloud cluster if you don't have one. + + ## Get started To run the demo directly, check out the sample code in the [pingcap/tidb-vector-python](https://github.com/pingcap/tidb-vector-python/blob/main/examples/python-client-quickstart) repository. @@ -191,5 +206,15 @@ This demonstration shows how vector search can efficiently locate the most relev ## See also + + +- [Vector Data Types](/vector-search-data-types.md) + + + + + - [Vector Data Types](/vector-search-data-types.md) - [Vector Search Index](/tidb-cloud/vector-search-index.md) + + \ No newline at end of file diff --git a/vector-search-get-started-using-sql.md b/vector-search-get-started-using-sql.md index beb7a9c136b32..aba4eaa84a619 100644 --- a/vector-search-get-started-using-sql.md +++ b/vector-search-get-started-using-sql.md @@ -14,17 +14,31 @@ This tutorial demonstrates how to get started with TiDB Vector Search just using - Store vector embeddings. - Perform vector search queries. + + > **Note** > > TiDB Vector Search is currently in beta and only available for [TiDB Serverless](/tidb-cloud/select-cluster-tier.md#tidb-serverless) clusters. 
+ + ## Prerequisites To complete this tutorial, you need: + + +- [MySQL command-line client](https://dev.mysql.com/doc/refman/8.4/en/mysql.html) (MySQL CLI) installed on your machine. +- A TiDB cluster. Follow [Deploy a local test TiDB cluster](/quick-start-with-tidb.md#deploy-a-local-test-cluster) or [Deploy a production TiDB cluster](/production-deployment-using-tiup.md) to create a local cluster. + + + + - [MySQL command-line client](https://dev.mysql.com/doc/refman/8.4/en/mysql.html) (MySQL CLI) installed on your machine. - A TiDB Serverless cluster. Follow [creating a TiDB Serverless cluster](/tidb-cloud/create-tidb-cluster-serverless.md) to create your own TiDB Cloud cluster if you don't have one. + + ## Get started ### Step 1. Connect to the TiDB cluster @@ -144,5 +158,15 @@ From the output, the swimming animal is most likely a fish, or a dog with a gift ## See also + + +- [Vector Data Types](/vector-search-data-types.md) + + + + + - [Vector Data Types](/vector-search-data-types.md) - [Vector Search Index](/tidb-cloud/vector-search-index.md) + + diff --git a/vector-search-improve-performance.md b/vector-search-improve-performance.md index 651bc94251370..8c40032346500 100644 --- a/vector-search-improve-performance.md +++ b/vector-search-improve-performance.md @@ -9,11 +9,19 @@ TiDB Vector Search allows you to perform ANN queries that search for results sim ## Add vector search index for vector columns -The [vector search index](/tidb-cloud/vector-search-index.md) dramatically improves the performance of vector search queries, usually by 10x or more, with a trade-off of only a small decrease of recall rate. +> **Note** +> +> This practice is only applicable to [TiDB Serverless](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-serverless) clusters.
+ +The [vector search index](https://docs.pingcap.com/tidbcloud/vector-search-index) dramatically improves the performance of vector search queries, usually by 10x or more, with a trade-off of only a small decrease of recall rate. ## Ensure vector indexes are fully built -Vector indexes are built asynchronously. Until all vector data is indexed, vector search performance is suboptimal. To check the index build progress, see [View index build progress](/tidb-cloud/vector-search-index.md#view-index-build-progress). +> **Note** +> +> This practice is only applicable to [TiDB Serverless](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-serverless) clusters. + +Vector indexes are built asynchronously. Until all vector data is indexed, vector search performance is suboptimal. To check the index build progress, see [View index build progress](https://docs.pingcap.com/tidbcloud/vector-search-index#view-index-build-progress). ## Reduce vector dimensions or shorten embeddings diff --git a/vector-search-integrate-with-django-orm.md b/vector-search-integrate-with-django-orm.md index cc21072cdc62b..d65b339e53405 100644 --- a/vector-search-integrate-with-django-orm.md +++ b/vector-search-integrate-with-django-orm.md @@ -7,18 +7,33 @@ summary: Learn how to integrate TiDB Vector Search with Django ORM to store embe This tutorial walks you through how to use [Django](https://www.djangoproject.com/) ORM to interact with the TiDB Vector Search, store embeddings, and perform vector search queries. + + > **Note** > > TiDB Vector Search is currently in beta and only available for [TiDB Serverless](/tidb-cloud/select-cluster-tier.md#tidb-serverless) clusters. + + ## Prerequisites To complete this tutorial, you need: + + +- [Python 3.8 or higher](https://www.python.org/downloads/) installed. +- [Git](https://git-scm.com/downloads) installed. +- A TiDB cluster. 
Follow [Deploy a local test TiDB cluster](/quick-start-with-tidb.md#deploy-a-local-test-cluster) or [Deploy a production TiDB cluster](/production-deployment-using-tiup.md) to create a local cluster. + + + + - [Python 3.8 or higher](https://www.python.org/downloads/) installed. - [Git](https://git-scm.com/downloads) installed. - A TiDB Serverless cluster. Follow [creating a TiDB Serverless cluster](/tidb-cloud/create-tidb-cluster-serverless.md) to create your own TiDB Cloud cluster if you don't have one. + + ## Run the sample app You can quickly learn about how to integrate TiDB Vector Search with Django ORM by following the steps below. @@ -182,7 +197,11 @@ class Document(models.Model): #### Define a vector column optimized with index -Define a 3-dimensional vector column and optimize it with a [vector search index](/tidb-cloud/vector-search-index.md) (HNSW index). +> **Note** +> +> This code snippet is only applicable to [TiDB Serverless](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-serverless) clusters. + +Define a 3-dimensional vector column and optimize it with a [vector search index](https://docs.pingcap.com/tidbcloud/vector-search-index) (HNSW index). ```python class DocumentWithIndex(models.Model): diff --git a/vector-search-integrate-with-jinaai-embedding.md b/vector-search-integrate-with-jinaai-embedding.md index b1d90590d2931..57d570f9001de 100644 --- a/vector-search-integrate-with-jinaai-embedding.md +++ b/vector-search-integrate-with-jinaai-embedding.md @@ -7,18 +7,33 @@ summary: Learn how to integrate TiDB Vector Search with Jina AI Embeddings API t This tutorial walks you through how to use [Jina AI](https://jina.ai/) to generate embeddings for text data, and then store the embeddings in TiDB Vector Storage and search similar texts based on embeddings. + + > **Note** > > TiDB Vector Search is currently in beta and only available for [TiDB Serverless](/tidb-cloud/select-cluster-tier.md#tidb-serverless) clusters. 
+ + ## Prerequisites To complete this tutorial, you need: + + +- [Python 3.8 or higher](https://www.python.org/downloads/) installed. +- [Git](https://git-scm.com/downloads) installed. +- A TiDB cluster. Follow [Deploy a local test TiDB cluster](/quick-start-with-tidb.md#deploy-a-local-test-cluster) or [Deploy a production TiDB cluster](/production-deployment-using-tiup.md) to create a local cluster. + + + + - [Python 3.8 or higher](https://www.python.org/downloads/) installed. - [Git](https://git-scm.com/downloads) installed. - A TiDB Serverless cluster. Follow [creating a TiDB Serverless cluster](/tidb-cloud/create-tidb-cluster-serverless.md) to create your own TiDB Cloud cluster if you don't have one. + + ## Run the sample app You can quickly learn about how to integrate TiDB Vector Search with JinaAI Embedding by following the steps below. @@ -242,5 +257,15 @@ with Session(engine) as session: ## See also + + +- [Vector Data Types](/vector-search-data-types.md) + + + + + - [Vector Data Types](/vector-search-data-types.md) - [Vector Search Index](/tidb-cloud/vector-search-index.md) + + \ No newline at end of file diff --git a/vector-search-integrate-with-langchain.md b/vector-search-integrate-with-langchain.md index d9a93d11b8148..46c6b75baf23a 100644 --- a/vector-search-integrate-with-langchain.md +++ b/vector-search-integrate-with-langchain.md @@ -7,20 +7,39 @@ summary: Learn how to integrate Vector Search in TiDB Cloud with LangChain. This tutorial demonstrates how to integrate the [vector search](/vector-search-overview.md) feature in TiDB Cloud with [LangChain](https://python.langchain.com/). + + > **Note** > -> - TiDB Vector Search is currently in beta and only available for [TiDB Serverless](/tidb-cloud/select-cluster-tier.md#tidb-serverless) clusters. 
-> - You can view the complete [sample code](https://github.com/langchain-ai/langchain/blob/master/docs/docs/integrations/vectorstores/tidb_vector.ipynb) on Jupyter Notebook, or run the sample code directly in the [Colab](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/integrations/vectorstores/tidb_vector.ipynb) online environment. +> TiDB Vector Search is currently in beta and only available for [TiDB Serverless](/tidb-cloud/select-cluster-tier.md#tidb-serverless) clusters. + + + +> **Tip** +> +> You can view the complete [sample code](https://github.com/langchain-ai/langchain/blob/master/docs/docs/integrations/vectorstores/tidb_vector.ipynb) on Jupyter Notebook, or run the sample code directly in the [Colab](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/integrations/vectorstores/tidb_vector.ipynb) online environment. ## Prerequisites To complete this tutorial, you need: + + +- [Python 3.8 or higher](https://www.python.org/downloads/) installed. +- [Jupyter Notebook](https://jupyter.org/install) installed. +- [Git](https://git-scm.com/downloads) installed. +- A TiDB cluster. Follow [Deploy a local test TiDB cluster](/quick-start-with-tidb.md#deploy-a-local-test-cluster) or [Deploy a production TiDB cluster](/production-deployment-using-tiup.md) to create a local cluster. + + + + - [Python 3.8 or higher](https://www.python.org/downloads/) installed. - [Jupyter Notebook](https://jupyter.org/install) installed. - [Git](https://git-scm.com/downloads) installed. - A TiDB Serverless cluster. Follow [creating a TiDB Serverless cluster](/tidb-cloud/create-tidb-cluster-serverless.md) to create your own TiDB Cloud cluster if you don't have one. + + ## Get started This section provides step-by-step instructions for integrating TiDB Vector Search with LangChain to perform semantic searches. 
@@ -581,5 +600,15 @@ The expected output is as follows: ## See also + + +- [Vector Data Types](/vector-search-data-types.md) + + + + + - [Vector Data Types](/vector-search-data-types.md) - [Vector Search Index](/tidb-cloud/vector-search-index.md) + + \ No newline at end of file diff --git a/vector-search-integrate-with-llamaindex.md b/vector-search-integrate-with-llamaindex.md index 39646d8ac9cb3..bf6478435148e 100644 --- a/vector-search-integrate-with-llamaindex.md +++ b/vector-search-integrate-with-llamaindex.md @@ -7,20 +7,39 @@ summary: Learn how to integrate Vector Search in TiDB Cloud with LlamaIndex. This tutorial demonstrates how to integrate the [vector search](/vector-search-overview.md) feature in TiDB Cloud with [LlamaIndex](https://www.llamaindex.ai). + + > **Note** > -> - TiDB Vector Search is currently in beta and only available for [TiDB Serverless](/tidb-cloud/select-cluster-tier.md#tidb-serverless) clusters. -> - You can view the complete [sample code](https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/vector_stores/TiDBVector.ipynb) on Jupyter Notebook, or run the sample code directly in the [Colab](https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/docs/examples/vector_stores/TiDBVector.ipynb) online environment. +> TiDB Vector Search is currently in beta and only available for [TiDB Serverless](/tidb-cloud/select-cluster-tier.md#tidb-serverless) clusters. + + + +> **Tip** +> +> You can view the complete [sample code](https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/vector_stores/TiDBVector.ipynb) on Jupyter Notebook, or run the sample code directly in the [Colab](https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/docs/examples/vector_stores/TiDBVector.ipynb) online environment. ## Prerequisites To complete this tutorial, you need: + + +- [Python 3.8 or higher](https://www.python.org/downloads/) installed. 
+- [Jupyter Notebook](https://jupyter.org/install) installed. +- [Git](https://git-scm.com/downloads) installed. +- A TiDB cluster. Follow [Deploy a local test TiDB cluster](/quick-start-with-tidb.md#deploy-a-local-test-cluster) or [Deploy a production TiDB cluster](/production-deployment-using-tiup.md) to create a local cluster. + + + + - [Python 3.8 or higher](https://www.python.org/downloads/) installed. - [Jupyter Notebook](https://jupyter.org/install) installed. - [Git](https://git-scm.com/downloads) installed. - A TiDB Serverless cluster. Follow [creating a TiDB Serverless cluster](/tidb-cloud/create-tidb-cluster-serverless.md) to create your own TiDB Cloud cluster if you don't have one. + + ## Get started This section provides step-by-step instructions for integrating TiDB Vector Search with LlamaIndex to perform semantic searches. @@ -260,5 +279,15 @@ Empty Response ## See also + + +- [Vector Data Types](/vector-search-data-types.md) + + + + + - [Vector Data Types](/vector-search-data-types.md) - [Vector Search Index](/tidb-cloud/vector-search-index.md) + + diff --git a/vector-search-integrate-with-peewee.md b/vector-search-integrate-with-peewee.md index 2e89ff6345719..552223fa7de23 100644 --- a/vector-search-integrate-with-peewee.md +++ b/vector-search-integrate-with-peewee.md @@ -7,18 +7,33 @@ summary: Learn how to integrate TiDB Vector Search with peewee to store embeddin This tutorial walks you through how to use [peewee](https://docs.peewee-orm.com/) to interact with the [TiDB Vector Search](/vector-search-overview.md), store embeddings, and perform vector search queries. + + > **Note** > > TiDB Vector Search is currently in beta and only available for [TiDB Serverless](/tidb-cloud/select-cluster-tier.md#tidb-serverless) clusters. + + ## Prerequisites To complete this tutorial, you need: + + +- [Python 3.8 or higher](https://www.python.org/downloads/) installed. +- [Git](https://git-scm.com/downloads) installed. +- A TiDB cluster. 
Follow [Deploy a local test TiDB cluster](/quick-start-with-tidb.md#deploy-a-local-test-cluster) or [Deploy a production TiDB cluster](/production-deployment-using-tiup.md) to create a local cluster. + + + + - [Python 3.8 or higher](https://www.python.org/downloads/) installed. - [Git](https://git-scm.com/downloads) installed. - A TiDB Serverless cluster. Follow [creating a TiDB Serverless cluster](/tidb-cloud/create-tidb-cluster-serverless.md) to create your own TiDB Cloud cluster if you don't have one. + + ## Run the sample app You can quickly learn about how to integrate TiDB Vector Search with peewee by following the steps below. @@ -180,7 +195,11 @@ class Document(Model): #### Define a vector column optimized with index -Define a 3-dimensional vector column and optimize it with a [vector search index](/tidb-cloud/vector-search-index.md) (HNSW index). +> **Note** +> +> This code snippet is only applicable to [TiDB Serverless](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-serverless) clusters. + +Define a 3-dimensional vector column and optimize it with a [vector search index](https://docs.pingcap.com/tidbcloud/vector-search-index) (HNSW index). 
```python class DocumentWithIndex(Model): @@ -223,5 +242,15 @@ results = Document.select(Document, distance).where(distance_expression < 0.2).o ## See also + + +- [Vector Data Types](/vector-search-data-types.md) + + + + + - [Vector Data Types](/vector-search-data-types.md) - [Vector Search Index](/tidb-cloud/vector-search-index.md) + + diff --git a/vector-search-integrate-with-sqlalchemy.md b/vector-search-integrate-with-sqlalchemy.md index 1dad860a884f2..22539f0880f04 100644 --- a/vector-search-integrate-with-sqlalchemy.md +++ b/vector-search-integrate-with-sqlalchemy.md @@ -7,18 +7,33 @@ summary: Learn how to integrate TiDB Vector Search with SQLAlchemy to store embe This tutorial walks you through how to use [SQLAlchemy](https://www.sqlalchemy.org/) to interact with [TiDB Vector Search](/vector-search-overview.md), store embeddings, and perform vector search queries. + + > **Note** > > TiDB Vector Search is currently in beta and only available for [TiDB Serverless](/tidb-cloud/select-cluster-tier.md#tidb-serverless) clusters. + + ## Prerequisites To complete this tutorial, you need: + + +- [Python 3.8 or higher](https://www.python.org/downloads/) installed. +- [Git](https://git-scm.com/downloads) installed. +- A TiDB cluster. Follow [Deploy a local test TiDB cluster](/quick-start-with-tidb.md#deploy-a-local-test-cluster) or [Deploy a production TiDB cluster](/production-deployment-using-tiup.md) to create a local cluster. + + + + - [Python 3.8 or higher](https://www.python.org/downloads/) installed. - [Git](https://git-scm.com/downloads) installed. - A TiDB Serverless cluster. Follow [creating a TiDB Serverless cluster](/tidb-cloud/create-tidb-cluster-serverless.md) to create your own TiDB Cloud cluster if you don't have one. + + ## Run the sample app You can quickly learn about how to integrate TiDB Vector Search with SQLAlchemy by following the steps below. 
@@ -147,7 +162,11 @@ class Document(Base): #### Define a vector column optimized with index -Define a 3-dimensional vector column and optimize it with a [vector search index](/tidb-cloud/vector-search-index.md) (HNSW index). +> **Note** +> +> This code snippet is only applicable to [TiDB Serverless](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-serverless) clusters. + +Define a 3-dimensional vector column and optimize it with a [vector search index](https://docs.pingcap.com/tidbcloud/vector-search-index) (HNSW index). ```python class DocumentWithIndex(Base): @@ -195,5 +214,15 @@ with Session(engine) as session: ## See also + + +- [Vector Data Types](/vector-search-data-types.md) + + + + + - [Vector Data Types](/vector-search-data-types.md) - [Vector Search Index](/tidb-cloud/vector-search-index.md) + + diff --git a/vector-search-integration-overview.md b/vector-search-integration-overview.md index efc93cb5657b5..ada66750451c4 100644 --- a/vector-search-integration-overview.md +++ b/vector-search-integration-overview.md @@ -7,10 +7,14 @@ summary: An overview of TiDB vector search integration, including supported AI f This document provides an overview of TiDB vector search integration, including supported AI frameworks, embedding models, and Object Relational Mapping (ORM) libraries. + + > **Note** > > TiDB Vector Search is currently in beta and is only available for [TiDB Serverless](/tidb-cloud/select-cluster-tier.md#tidb-serverless) clusters. + + ## AI frameworks TiDB provides official support for the following AI frameworks, enabling you to easily integrate AI applications developed based on these frameworks with TiDB Vector Search. diff --git a/vector-search-limitations.md b/vector-search-limitations.md index fb82632103070..e8596f3973669 100644 --- a/vector-search-limitations.md +++ b/vector-search-limitations.md @@ -7,7 +7,25 @@ summary: Learn the limitations of the TiDB Vector Search. 
This document describes the known limitations of TiDB Vector Search. We are continuously working to enhance your experience by adding more features. -- TiDB Vector Search is only available for [TiDB Serverless](/tidb-cloud/select-cluster-tier.md#tidb-serverless) clusters. It is not available for TiDB Dedicated or TiDB Self-Hosted. + + +- TiDB Vector Search is only available for the following clusters. It is not available for TiDB Dedicated clusters. + + - TiDB Self-Hosted clusters with TiDB versions of 8.4.0 or later + - [TiDB Serverless](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-serverless) clusters + +- Each [vector](/vector-search-data-types.md) supports up to 16,000 dimensions. + +- Vector data supports only single-precision floating-point numbers (Float32). + + + + + +- TiDB Vector Search is only available for the following clusters. It is not available for TiDB Dedicated clusters. + + - [TiDB Serverless](/tidb-cloud/select-cluster-tier.md#tidb-serverless) clusters + - TiDB Self-Hosted clusters with TiDB versions of 8.4.0 or later - Each [vector](/vector-search-data-types.md) supports up to 16,000 dimensions. @@ -15,6 +33,8 @@ This document describes the known limitations of TiDB Vector Search. We are cont - Only cosine distance and L2 distance are supported when you create a [vector search index](/tidb-cloud/vector-search-index.md). + + ## Feedback We value your feedback and are always here to help: diff --git a/vector-search-overview.md b/vector-search-overview.md index ecaeabb1eb919..1fdbaf5c41d68 100644 --- a/vector-search-overview.md +++ b/vector-search-overview.md @@ -7,10 +7,14 @@ summary: Learn about Vector Search in TiDB Cloud. This feature provides an advan TiDB Vector Search (beta) provides an advanced search solution for performing semantic similarity searches across various data types, including documents, images, audio, and video. 
This feature enables developers to easily build scalable applications with generative artificial intelligence (AI) capabilities using familiar MySQL skills. + + > **Note** > > TiDB Vector Search is currently in beta and only available for [TiDB Serverless](/tidb-cloud/select-cluster-tier.md#tidb-serverless) clusters. + + ## Concepts Vector search is a search method that prioritizes the meaning of your data to deliver relevant results. This differs from traditional full-text search, which relies primarily on exact keyword matches and word frequency. From 3aa879745b344d0937076c75b1c0b2e608a9ee81 Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Tue, 3 Sep 2024 15:12:35 +0800 Subject: [PATCH 05/16] fix broken links --- vector-search-data-types.md | 2 +- vector-search-get-started-using-python.md | 2 +- vector-search-improve-performance.md | 2 +- vector-search-integrate-with-django-orm.md | 10 ++++++++++ vector-search-integrate-with-jinaai-embedding.md | 2 +- vector-search-integrate-with-langchain.md | 2 +- 6 files changed, 15 insertions(+), 5 deletions(-) diff --git a/vector-search-data-types.md b/vector-search-data-types.md index 0538ba51e1e34..cd5052ee49a7f 100644 --- a/vector-search-data-types.md +++ b/vector-search-data-types.md @@ -279,4 +279,4 @@ Vector data types are TiDB specific, and are not supported in MySQL. 
- [Vector Search Index](/tidb-cloud/vector-search-index.md) - [Improve Vector Search Performance](/vector-search-improve-performance.md) - \ No newline at end of file + diff --git a/vector-search-get-started-using-python.md b/vector-search-get-started-using-python.md index 6ac9ec43bc5de..dc165411bbd76 100644 --- a/vector-search-get-started-using-python.md +++ b/vector-search-get-started-using-python.md @@ -217,4 +217,4 @@ This demonstration shows how vector search can efficiently locate the most relev - [Vector Data Types](/vector-search-data-types.md) - [Vector Search Index](/tidb-cloud/vector-search-index.md) - \ No newline at end of file + diff --git a/vector-search-improve-performance.md b/vector-search-improve-performance.md index 8c40032346500..6cf5200ca9870 100644 --- a/vector-search-improve-performance.md +++ b/vector-search-improve-performance.md @@ -11,7 +11,7 @@ TiDB Vector Search allows you to perform ANN queries that search for results sim > **Note** > -> This practice is only applicable to [TiDB Serverless](/https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-serverless) clusters. +> This practice is only applicable to [TiDB Serverless](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-serverless) clusters. The [vector search index](https://docs.pingcap.com/tidbcloud/vector-search-index) dramatically improves the performance of vector search queries, usually by 10x or more, with a trade-off of only a small decrease of recall rate. 
diff --git a/vector-search-integrate-with-django-orm.md b/vector-search-integrate-with-django-orm.md index d65b339e53405..934f2f5e3a8cb 100644 --- a/vector-search-integrate-with-django-orm.md +++ b/vector-search-integrate-with-django-orm.md @@ -252,5 +252,15 @@ results = Document.objects.annotate( ## See also + + +- [Vector Data Types](/vector-search-data-types.md) + + + + + - [Vector Data Types](/vector-search-data-types.md) - [Vector Search Index](/tidb-cloud/vector-search-index.md) + + diff --git a/vector-search-integrate-with-jinaai-embedding.md b/vector-search-integrate-with-jinaai-embedding.md index 57d570f9001de..b4b4ecd388b52 100644 --- a/vector-search-integrate-with-jinaai-embedding.md +++ b/vector-search-integrate-with-jinaai-embedding.md @@ -268,4 +268,4 @@ with Session(engine) as session: - [Vector Data Types](/vector-search-data-types.md) - [Vector Search Index](/tidb-cloud/vector-search-index.md) - \ No newline at end of file + diff --git a/vector-search-integrate-with-langchain.md b/vector-search-integrate-with-langchain.md index 46c6b75baf23a..37053bef2d087 100644 --- a/vector-search-integrate-with-langchain.md +++ b/vector-search-integrate-with-langchain.md @@ -611,4 +611,4 @@ The expected output is as follows: - [Vector Data Types](/vector-search-data-types.md) - [Vector Search Index](/tidb-cloud/vector-search-index.md) - \ No newline at end of file + From e035fd493ed0e12aa7bc583a3f3c539ed9924d0b Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Thu, 19 Sep 2024 11:34:16 +0800 Subject: [PATCH 06/16] add vector search index --- TOC-tidb-cloud.md | 2 +- tidb-cloud/tidb-cloud-release-notes.md | 2 +- vector-search-data-types.md | 41 +++---------------- vector-search-functions-and-operators.md | 6 +-- vector-search-get-started-using-python.md | 12 +----- vector-search-get-started-using-sql.md | 12 +----- ...-search-index.md => vector-search-index.md | 0 vector-search-integrate-with-django-orm.md | 12 +----- ...-search-integrate-with-jinaai-embedding.md | 
12 +----- vector-search-integrate-with-langchain.md | 12 +----- vector-search-integrate-with-llamaindex.md | 12 +----- vector-search-integrate-with-peewee.md | 12 +----- vector-search-integrate-with-sqlalchemy.md | 12 +----- vector-search-limitations.md | 19 +-------- 14 files changed, 18 insertions(+), 148 deletions(-) rename tidb-cloud/vector-search-index.md => vector-search-index.md (100%) diff --git a/TOC-tidb-cloud.md b/TOC-tidb-cloud.md index fae204f448435..144979bdde275 100644 --- a/TOC-tidb-cloud.md +++ b/TOC-tidb-cloud.md @@ -259,7 +259,7 @@ - Reference - [Vector Data Types](/vector-search-data-types.md) - [Vector Functions and Operators](/vector-search-functions-and-operators.md) - - [Vector Index](/tidb-cloud/vector-search-index.md) + - [Vector Index](/vector-search-index.md) - [Improve Performance](/vector-search-improve-performance.md) - [Limitations](/vector-search-limitations.md) - [Changelogs](/tidb-cloud/vector-search-changelogs.md) diff --git a/tidb-cloud/tidb-cloud-release-notes.md b/tidb-cloud/tidb-cloud-release-notes.md index def974e79fcd3..aba9c9fd1fe8b 100644 --- a/tidb-cloud/tidb-cloud-release-notes.md +++ b/tidb-cloud/tidb-cloud-release-notes.md @@ -122,7 +122,7 @@ This page lists the release notes of [TiDB Cloud](https://www.pingcap.com/tidb-c The vector search (beta) feature provides an advanced search solution for performing semantic similarity searches across various data types, including documents, images, audio, and video. This feature enables developers to easily build scalable applications with generative artificial intelligence (AI) capabilities using familiar MySQL skills. Key features include: - - [Vector data types](/vector-search-data-types.md), [vector index](/tidb-cloud/vector-search-index.md), and [vector functions and operators](/vector-search-functions-and-operators.md). 
+ - [Vector data types](/vector-search-data-types.md), [vector index](/vector-search-index.md), and [vector functions and operators](/vector-search-functions-and-operators.md). - Ecosystem integrations with [LangChain](/vector-search-integrate-with-langchain.md), [LlamaIndex](/vector-search-integrate-with-llamaindex.md), and [JinaAI](/vector-search-integrate-with-jinaai-embedding.md). - Programming language support for Python: [SQLAlchemy](/vector-search-integrate-with-sqlalchemy.md), [Peewee](/vector-search-integrate-with-peewee.md), and [Django ORM](/vector-search-integrate-with-django-orm.md). - Sample applications and tutorials: perform semantic searches for documents using [Python](/vector-search-get-started-using-python.md) or [SQL](/vector-search-get-started-using-sql.md). diff --git a/vector-search-data-types.md b/vector-search-data-types.md index 63a6e9662ba33..f406d45d5fa0b 100644 --- a/vector-search-data-types.md +++ b/vector-search-data-types.md @@ -14,23 +14,13 @@ The following Vector data type is currently available: The Vector data type provides these advantages over storing in a `JSON` column: - - -- Dimension enforcement. A dimension can be specified to forbid inserting vectors with different dimensions. -- Optimized storage format. Vector data types are stored even more space-efficient than `JSON` data type. - - - - -- Vector Index support. A [Vector Search Index](/tidb-cloud/vector-search-index.md) can be built to speed up vector searching. +- Vector Index support. A [Vector Search Index](/vector-search-index.md) can be built to speed up vector searching. - Dimension enforcement. A dimension can be specified to forbid inserting vectors with different dimensions. - Optimized storage format. Vector data types are stored even more space-efficient than `JSON` data type. > **Note:** > -> Vector data types are only available for [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters. 
- - +> Vector data types are only available for TiDB Self-Managed clusters [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters. ## Value syntax @@ -69,11 +59,7 @@ ERROR 1105 (HY000): vector has 2 dimensions, does not fit VECTOR(3) See [Vector Functions and Operators](/vector-search-functions-and-operators.md) for available functions and operators over the Vector data type. - - -See [Vector Search Index](/tidb-cloud/vector-search-index.md) for building and using a vector search index. - - +See [Vector Search Index](/vector-search-index.md) for building and using a vector search index. ## Vectors with different dimensions @@ -89,11 +75,7 @@ INSERT INTO vector_table VALUES (1, '[0.3, 0.5, -0.1]'); -- 3 dimensions vector, INSERT INTO vector_table VALUES (2, '[0.3, 0.5]'); -- 2 dimensions vector, OK ``` - - -However you cannot build a [Vector Search Index](/tidb-cloud/vector-search-index.md) for this column, as vector distances can be only calculated between vectors with the same dimensions. - - +However you cannot build a [Vector Search Index](/vector-search-index.md) for this column, as vector distances can be only calculated between vectors with the same dimensions. ## Comparison @@ -266,17 +248,6 @@ Vector data types are TiDB specific, and are not supported in MySQL. 
## See also - - -- [Vector Functions and Operators](/vector-search-functions-and-operators.md) -- [Improve Vector Search Performance](/vector-search-improve-performance.md) - - - - - - [Vector Functions and Operators](/vector-search-functions-and-operators.md) -- [Vector Search Index](/tidb-cloud/vector-search-index.md) -- [Improve Vector Search Performance](/vector-search-improve-performance.md) - - +- [Vector Search Index](/vector-search-index.md) +- [Improve Vector Search Performance](/vector-search-improve-performance.md) \ No newline at end of file diff --git a/vector-search-functions-and-operators.md b/vector-search-functions-and-operators.md index 4421199e701a6..d34a63cebb91f 100644 --- a/vector-search-functions-and-operators.md +++ b/vector-search-functions-and-operators.md @@ -5,13 +5,9 @@ summary: Learn about functions and operators available for Vector Data Types. # Vector Functions and Operators - - > **Note:** > -> Vector data types and these vector functions are only available for [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters. - - +> Vector data types and these vector functions are only available for TiDB Self-Managed clusters and [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters. 
## Vector functions diff --git a/vector-search-get-started-using-python.md b/vector-search-get-started-using-python.md index 2692b1a16b0a4..522de647c8cfa 100644 --- a/vector-search-get-started-using-python.md +++ b/vector-search-get-started-using-python.md @@ -206,15 +206,5 @@ This demonstration shows how vector search can efficiently locate the most relev ## See also - - -- [Vector Data Types](/vector-search-data-types.md) - - - - - - [Vector Data Types](/vector-search-data-types.md) -- [Vector Search Index](/tidb-cloud/vector-search-index.md) - - +- [Vector Search Index](/vector-search-index.md) \ No newline at end of file diff --git a/vector-search-get-started-using-sql.md b/vector-search-get-started-using-sql.md index a01596efd3c74..fc411b5cac56b 100644 --- a/vector-search-get-started-using-sql.md +++ b/vector-search-get-started-using-sql.md @@ -158,15 +158,5 @@ From the output, the swimming animal is most likely a fish, or a dog with a gift ## See also - - -- [Vector Data Types](/vector-search-data-types.md) - - - - - - [Vector Data Types](/vector-search-data-types.md) -- [Vector Search Index](/tidb-cloud/vector-search-index.md) - - +- [Vector Search Index](/vector-search-index.md) diff --git a/tidb-cloud/vector-search-index.md b/vector-search-index.md similarity index 100% rename from tidb-cloud/vector-search-index.md rename to vector-search-index.md diff --git a/vector-search-integrate-with-django-orm.md b/vector-search-integrate-with-django-orm.md index d369c56fd2e0c..2069a5bb06487 100644 --- a/vector-search-integrate-with-django-orm.md +++ b/vector-search-integrate-with-django-orm.md @@ -252,15 +252,5 @@ results = Document.objects.annotate( ## See also - - -- [Vector Data Types](/vector-search-data-types.md) - - - - - - [Vector Data Types](/vector-search-data-types.md) -- [Vector Search Index](/tidb-cloud/vector-search-index.md) - - +- [Vector Search Index](/vector-search-index.md) diff --git a/vector-search-integrate-with-jinaai-embedding.md 
b/vector-search-integrate-with-jinaai-embedding.md index c4e8a1135c2b7..89b04b02d9e28 100644 --- a/vector-search-integrate-with-jinaai-embedding.md +++ b/vector-search-integrate-with-jinaai-embedding.md @@ -257,15 +257,5 @@ with Session(engine) as session: ## See also - - -- [Vector Data Types](/vector-search-data-types.md) - - - - - - [Vector Data Types](/vector-search-data-types.md) -- [Vector Search Index](/tidb-cloud/vector-search-index.md) - - +- [Vector Search Index](/vector-search-index.md) diff --git a/vector-search-integrate-with-langchain.md b/vector-search-integrate-with-langchain.md index 78b176d846d8d..a682f733e72d5 100644 --- a/vector-search-integrate-with-langchain.md +++ b/vector-search-integrate-with-langchain.md @@ -600,15 +600,5 @@ The expected output is as follows: ## See also - - -- [Vector Data Types](/vector-search-data-types.md) - - - - - - [Vector Data Types](/vector-search-data-types.md) -- [Vector Search Index](/tidb-cloud/vector-search-index.md) - - +- [Vector Search Index](/vector-search-index.md) diff --git a/vector-search-integrate-with-llamaindex.md b/vector-search-integrate-with-llamaindex.md index 5af4db4af3d8e..b6f7e2c05ef10 100644 --- a/vector-search-integrate-with-llamaindex.md +++ b/vector-search-integrate-with-llamaindex.md @@ -279,15 +279,5 @@ Empty Response ## See also - - -- [Vector Data Types](/vector-search-data-types.md) - - - - - - [Vector Data Types](/vector-search-data-types.md) -- [Vector Search Index](/tidb-cloud/vector-search-index.md) - - +- [Vector Search Index](/vector-search-index.md) diff --git a/vector-search-integrate-with-peewee.md b/vector-search-integrate-with-peewee.md index 89adb8a1a02c7..b2ed15394fc06 100644 --- a/vector-search-integrate-with-peewee.md +++ b/vector-search-integrate-with-peewee.md @@ -242,15 +242,5 @@ results = Document.select(Document, distance).where(distance_expression < 0.2).o ## See also - - -- [Vector Data Types](/vector-search-data-types.md) - - - - - - [Vector Data 
Types](/vector-search-data-types.md) -- [Vector Search Index](/tidb-cloud/vector-search-index.md) - - +- [Vector Search Index](/vector-search-index.md) diff --git a/vector-search-integrate-with-sqlalchemy.md b/vector-search-integrate-with-sqlalchemy.md index ce0ef2153f704..f326b01e8935b 100644 --- a/vector-search-integrate-with-sqlalchemy.md +++ b/vector-search-integrate-with-sqlalchemy.md @@ -214,15 +214,5 @@ with Session(engine) as session: ## See also - - -- [Vector Data Types](/vector-search-data-types.md) - - - - - - [Vector Data Types](/vector-search-data-types.md) -- [Vector Search Index](/tidb-cloud/vector-search-index.md) - - +- [Vector Search Index](/vector-search-index.md) diff --git a/vector-search-limitations.md b/vector-search-limitations.md index e8596f3973669..2ff13f3df95fe 100644 --- a/vector-search-limitations.md +++ b/vector-search-limitations.md @@ -7,21 +7,6 @@ summary: Learn the limitations of the TiDB Vector Search. This document describes the known limitations of TiDB Vector Search. We are continuously working to enhance your experience by adding more features. - - -- TiDB Vector Search is only available for the following clusters. It is not available for TiDB Dedicated clusters. - - - TiDB Self-Hosted clusters with TiDB versions of 8.4.0 or later - - [TiDB Serverless](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-serverless) clusters - -- Each [vector](/vector-search-data-types.md) supports up to 16,000 dimensions. - -- Vector data supports only single-precision floating-point numbers (Float32). - - - - - - TiDB Vector Search is only available for the following clusters. It is not available for TiDB Dedicated clusters. - [TiDB Serverless](/tidb-cloud/select-cluster-tier.md#tidb-serverless) clusters @@ -31,9 +16,7 @@ This document describes the known limitations of TiDB Vector Search. We are cont - Vector data supports only single-precision floating-point numbers (Float32). 
-- Only cosine distance and L2 distance are supported when you create a [vector search index](/tidb-cloud/vector-search-index.md). - - +- Only cosine distance and L2 distance are supported when you create a [vector search index](/vector-search-index.md). ## Feedback From c6fa2b354d60b8d92636d317e8ca2734da339c48 Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Thu, 19 Sep 2024 17:03:16 +0800 Subject: [PATCH 07/16] Update vector-search-data-types.md --- vector-search-data-types.md | 75 +++++++++++++++++++------------------ 1 file changed, 39 insertions(+), 36 deletions(-) diff --git a/vector-search-data-types.md b/vector-search-data-types.md index f406d45d5fa0b..ad8d09ca741ac 100644 --- a/vector-search-data-types.md +++ b/vector-search-data-types.md @@ -5,26 +5,34 @@ summary: Learn about the Vector data types in TiDB. # Vector Data Types -TiDB provides Vector data type specifically optimized for AI Vector Embedding use cases. By using the Vector data type, you can store and query a sequence of floating numbers efficiently, such as `[0.3, 0.5, -0.1, ...]`. +A vector is a sequence of floating-point numbers, such as `[0.3, 0.5, -0.1, ...]`. TiDB offers Vector data types, specifically optimized for efficiently storing and querying vector embeddings widely used in AI applications. -The following Vector data type is currently available: + -- `VECTOR`: A sequence of single-precision floating numbers. The dimensions can be different for each row. -- `VECTOR(D)`: A sequence of single-precision floating numbers with a fixed dimension `D`. - -The Vector data type provides these advantages over storing in a `JSON` column: +> **Warning:** +> +> This feature is experimental. It is not recommended that you use it in the production environment. This feature might be changed or removed without prior notice. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub. -- Vector Index support. 
A [Vector Search Index](/vector-search-index.md) can be built to speed up vector searching. -- Dimension enforcement. A dimension can be specified to forbid inserting vectors with different dimensions. -- Optimized storage format. Vector data types are stored even more space-efficient than `JSON` data type. + > **Note:** > -> Vector data types are only available for TiDB Self-Managed clusters [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters. +> Vector data types are only available for TiDB Self-Managed clusters and [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters. -## Value syntax +The following Vector data types are currently available: -A Vector value contains an arbitrary number of floating numbers. You can use a string in the following syntax to represent a Vector value: +- `VECTOR`: A sequence of single-precision floating-point numbers with any dimension. +- `VECTOR(D)`: A sequence of single-precision floating-point numbers with a fixed dimension `D`. + +Using vector data types provides the following advantages over using the [`JSON`](/data-type-json.md) type: + +- Vector Index support. A [vector search index](/vector-search-index.md) can be built to speed up vector searching. +- Dimension enforcement. A dimension can be specified to forbid inserting vectors with different dimensions. +- Optimized storage format. Vector data types are optimized for handling vector data, offering better space efficiency and performance compared to `JSON` types.
+ +## Syntax + +You can use a string in the following syntax to represent a Vector value: ```sql '[<float>, <float>, ...]' @@ -50,18 +58,18 @@ Inserting vector values with invalid syntax will result in an error: ERROR 1105 (HY000): Invalid vector text: [5, ] ``` -As dimension 3 is enforced for the `embedding` column in the preceding example, inserting a vector with a different dimension will result in an error: +In the following example, because dimension `3` is enforced for the `embedding` column when the table is created, inserting a vector with a different dimension will result in an error: ```sql [tidb]> INSERT INTO vector_table VALUES (4, '[0.3, 0.5]'); ERROR 1105 (HY000): vector has 2 dimensions, does not fit VECTOR(3) ``` -See [Vector Functions and Operators](/vector-search-functions-and-operators.md) for available functions and operators over the Vector data type. +See [Vector Functions and Operators](/vector-search-functions-and-operators.md) for available functions and operators over the Vector data types. See [Vector Search Index](/vector-search-index.md) for building and using a vector search index. -## Vectors with different dimensions +## Store vectors with different dimensions You can store vectors with different dimensions in the same column by omitting the dimension parameter in the `VECTOR` type: @@ -75,33 +83,28 @@ INSERT INTO vector_table VALUES (1, '[0.3, 0.5, -0.1]'); -- 3 dimensions vector, INSERT INTO vector_table VALUES (2, '[0.3, 0.5]'); -- 2 dimensions vector, OK ``` -However you cannot build a [Vector Search Index](/vector-search-index.md) for this column, as vector distances can be only calculated between vectors with the same dimensions. +However, note that you cannot build a [vector search index](/vector-search-index.md) for this column, as vector distances can be only calculated between vectors with the same dimensions.
## Comparison You can compare vector data types using [comparison operators](/functions-and-operators/operators.md) such as `=`, `!=`, `<`, `>`, `<=`, and `>=`. For a complete list of comparison operators and functions for vector data types, see [Vector Functions and Operators](/vector-search-functions-and-operators.md). -Vector data types are compared element-wise numerically. Examples: +Vector data types are compared element-wise numerically. For example: - `[1] < [12]` - `[1,2,3] < [1,2,5]` - `[1,2,3] = [1,2,3]` - `[2,2,3] > [1,2,3]` -Vectors with different dimensions are compared using lexicographical comparison, with the following properties: +Two vectors with different dimensions are compared using lexicographical comparison, with the following rules: -- Two vectors are compared element by element, and each element is compared numerically. +- Two vectors are compared element by element from the start, and each element is compared numerically. - The first mismatching element determines which vector is lexicographically _less_ or _greater_ than the other. -- If one vector is a prefix of another, the shorter vector is lexicographically _less_ than the other. +- If one vector is a prefix of another, the shorter vector is lexicographically _less_ than the other. For example, `[1,2,3] < [1,2,3,0]`. - Vectors of the same length with identical elements are lexicographically _equal_. -- An empty vector is lexicographically _less_ than any non-empty vector. +- An empty vector is lexicographically _less_ than any non-empty vector. For example, `[] < [1]`. - Two empty vectors are lexicographically _equal_. 
-Examples: - -- `[] < [1]` -- `[1,2,3] < [1,2,3,0]` - When comparing vector constants, consider performing an [explicit cast](#cast) from string to vector to avoid comparisons based on string values: ```sql @@ -126,7 +129,7 @@ When comparing vector constants, consider performing an [explicit cast](#cast) f ## Arithmetic -Vector data types support element-wise arithmetic operations `+` (addition) and `-` (subtraction). However, performing arithmetic operations between vectors with different dimensions results in an error. +Vector data types support arithmetic operations `+` (addition) and `-` (subtraction). However, arithmetic operations between vectors with different dimensions are not supported and will result in an error. Examples: @@ -162,10 +165,10 @@ To cast between Vector and String, use the following functions: - `VEC_FROM_TEXT`: String ⇒ Vector - `VEC_AS_TEXT`: Vector ⇒ String -There are implicit casts when calling functions receiving vector data types: +For ease of use, if you are calling a function that only supports vector data types (such as a vector correlation distance function), you can also just pass in a format-compliant string, because TiDB will perform an implicit cast in this case: ```sql --- There is an implicit cast here, since VEC_DIMS only accepts VECTOR arguments: +-- The VEC_DIMS function only accepts VECTOR arguments, so you can directly pass in a string for an implicit cast. [tidb]> SELECT VEC_DIMS('[0.3, 0.5, -0.1]'); +------------------------------+ | VEC_DIMS('[0.3, 0.5, -0.1]') | @@ -174,7 +177,7 @@ There are implicit casts when calling functions receiving vector data types: +------------------------------+ 1 row in set (0.01 sec) --- Cast explicitly using VEC_FROM_TEXT: +-- You can also explicitly cast a string to a vector using VEC_FROM_TEXT and then pass the vector to the VEC_DIMS function. 
[tidb]> SELECT VEC_DIMS(VEC_FROM_TEXT('[0.3, 0.5, -0.1]')); +---------------------------------------------+ | VEC_DIMS(VEC_FROM_TEXT('[0.3, 0.5, -0.1]')) | @@ -183,7 +186,7 @@ There are implicit casts when calling functions receiving vector data types: +---------------------------------------------+ 1 row in set (0.01 sec) --- Cast explicitly using CAST(... AS VECTOR): +-- You can also cast explicitly using CAST(... AS VECTOR): [tidb]> SELECT VEC_DIMS(CAST('[0.3, 0.5, -0.1]' AS VECTOR)); +----------------------------------------------+ | VEC_DIMS(CAST('[0.3, 0.5, -0.1]' AS VECTOR)) | @@ -193,7 +196,7 @@ There are implicit casts when calling functions receiving vector data types: 1 row in set (0.01 sec) ``` -Use explicit casts when operators or functions accept multiple data types. For example, in comparisons, use explicit casts to compare vector numeric values instead of string values: +When using an operator or function that accepts multiple data types, you need to explicitly cast the string type to the vector type before passing the string to that operator or function, because TiDB does not perform implicit casts in this case. For example, before performing comparison operations, you need to explicitly cast strings to vectors; otherwise, TiDB will compare them as string values rather than as vector numeric values: ```sql -- Because string is given, TiDB is comparing strings: @@ -215,10 +218,10 @@ Use explicit casts when operators or functions accept multiple data types. For e 1 row in set (0.01 sec) ``` -To cast vector into its string representation explicitly, use the `VEC_AS_TEXT()` function: +You can also explicitly cast a vector to its string representation. 
Take the `VEC_AS_TEXT()` function as an example: ```sql --- String representation is normalized: +-- The string is first implicitly cast to a vector, and then the vector is explicitly cast to a string, thus returning a string in the normalized format: [tidb]> SELECT VEC_AS_TEXT('[0.3, 0.5, -0.1]'); +--------------------------------------+ | VEC_AS_TEXT('[0.3, 0.5, -0.1]') | @@ -232,13 +235,13 @@ For additional cast functions, see [Vector Functions and Operators](/vector-sear ### Cast between Vector ⇔ other data types -It is currently not possible to cast between Vector and other data types (like `JSON`) directly. You need to use String as an intermediate type. +Currently, direct casting between Vector and other data types (such as `JSON`) is not supported, but you can use String as an intermediate data type for casting. ## Restrictions - The maximum supported Vector dimension is 16000. - You cannot store `NaN`, `Infinity`, or `-Infinity` values in the vector data type. -- Currently, Vector data types cannot store double-precision floating point numbers. This is planned to be supported in a future release. In the meantime, if you import double-precision floating point numbers for Vector data types, they are converted to single-precision numbers. +- Currently, Vector data types cannot store double-precision floating-point numbers. This is planned to be supported in a future release. In the meantime, if you import double-precision floating-point numbers for Vector data types, they are converted to single-precision numbers. For other limitations, see [Vector Search Limitations](/vector-search-limitations.md). 
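
To make the restrictions above concrete, the following is a minimal client-side sketch that mirrors them. It assumes you want to pre-check values in Python before inserting them. `check_vector` and `to_single_precision` are hypothetical helpers written for this illustration, not part of TiDB or any TiDB SDK.

```python
import math
import struct

MAX_VECTOR_DIMS = 16000  # maximum dimension stated in the restrictions above


def to_single_precision(value: float) -> float:
    """Round-trip a Python float (double precision) through a 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


def check_vector(values):
    """Hypothetical pre-check that mirrors the documented restrictions.

    Returns the values as they would look after single-precision conversion.
    """
    if len(values) > MAX_VECTOR_DIMS:
        raise ValueError(
            f"vector has {len(values)} dimensions, maximum is {MAX_VECTOR_DIMS}"
        )
    for v in values:
        if math.isnan(v) or math.isinf(v):
            raise ValueError("NaN, Infinity, and -Infinity cannot be stored in a vector")
    return [to_single_precision(v) for v in values]


stored = check_vector([0.3, 0.5, -0.1])
# 0.5 is exactly representable in single precision; 0.3 and -0.1 are not,
# so those values differ slightly from the double-precision inputs.
print(stored)
```

The small differences that the conversion introduces are usually harmless for similarity search, but they explain why a value read back from a vector column might not compare equal to the double-precision number that was written.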
From 766eccd63075825b0e76510d2a85ca86c0e1f4aa Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Fri, 20 Sep 2024 14:45:12 +0800 Subject: [PATCH 08/16] Update vector-search-functions-and-operators.md --- vector-search-functions-and-operators.md | 80 +++++++++++++----------- 1 file changed, 45 insertions(+), 35 deletions(-) diff --git a/vector-search-functions-and-operators.md b/vector-search-functions-and-operators.md index d34a63cebb91f..208dd4fe0c99b 100644 --- a/vector-search-functions-and-operators.md +++ b/vector-search-functions-and-operators.md @@ -1,19 +1,29 @@ --- title: Vector Functions and Operators -summary: Learn about functions and operators available for Vector Data Types. +summary: Learn about functions and operators available for Vector data types. --- # Vector Functions and Operators +This document lists the functions and operators available for Vector data types. + + + +> **Warning:** +> +> This feature is experimental. It is not recommended that you use it in the production environment. This feature might be changed or removed without prior notice. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub. + + + > **Note:** > > Vector data types and these vector functions are only available for TiDB Self-Managed clusters and [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters. ## Vector functions -The following functions are designed specifically for [Vector Data Types](/vector-search-data-types.md). +The following functions are designed specifically for [Vector data types](/vector-search-data-types.md). 
-**Vector Distance Functions:** +**Vector distance functions:** | Function Name | Description | | --------------------------------------------------------- | ---------------------------------------------------------------- | @@ -22,7 +32,7 @@ The following functions are designed specifically for [Vector Data Types](/vecto | [VEC_NEGATIVE_INNER_PRODUCT](#vec_negative_inner_product) | Calculates the negative of the inner product between two vectors | | [VEC_L1_DISTANCE](#vec_l1_distance) | Calculates L1 distance (Manhattan distance) between two vectors | -**Other Vector Functions:** +**Other vector functions:** | Function Name | Description | | ------------------------------- | --------------------------------------------------- | @@ -33,7 +43,7 @@ The following functions are designed specifically for [Vector Data Types](/vecto ## Extended built-in functions and operators -The following built-in functions and operators are extended, supporting operating on [Vector Data Types](/vector-search-data-types.md). +The following built-in functions and operators are extended to support operations on [Vector data types](/vector-search-data-types.md). 
**Arithmetic operators:** @@ -65,8 +75,8 @@ For more information about how vector arithmetic works, see [Vector Data Type | | [`>=`](https://dev.mysql.com/doc/refman/8.0/en/comparison-operators.html#operator_greater-than-or-equal) | Greater than or equal operator | | [`GREATEST()`](https://dev.mysql.com/doc/refman/8.0/en/comparison-operators.html#function_greatest) | Return the largest argument | | [`IN()`](https://dev.mysql.com/doc/refman/8.0/en/comparison-operators.html#operator_in) | Check whether a value is within a set of values | -| [`IS NULL`](https://dev.mysql.com/doc/refman/8.0/en/comparison-operators.html#operator_is-null) | NULL value test | -| [`ISNULL()`](https://dev.mysql.com/doc/refman/8.0/en/comparison-operators.html#function_isnull) | Test whether the argument is NULL | +| [`IS NULL`](https://dev.mysql.com/doc/refman/8.0/en/comparison-operators.html#operator_is-null) | Test whether a value is `NULL` | +| [`ISNULL()`](https://dev.mysql.com/doc/refman/8.0/en/comparison-operators.html#function_isnull) | Test whether the argument is `NULL` | | [`LEAST()`](https://dev.mysql.com/doc/refman/8.0/en/comparison-operators.html#function_least) | Return the smallest argument | | [`<`](https://dev.mysql.com/doc/refman/8.0/en/comparison-operators.html#operator_less-than) | Less than operator | | [`<=`](https://dev.mysql.com/doc/refman/8.0/en/comparison-operators.html#operator_less-than-or-equal) | Less than or equal operator | @@ -83,14 +93,14 @@ For more information about how vectors are compared, see [Vector Data Type | Com | [`CASE`](https://dev.mysql.com/doc/refman/8.0/en/flow-control-functions.html#operator_case) | Case operator | | [`IF()`](https://dev.mysql.com/doc/refman/8.0/en/flow-control-functions.html#function_if) | If/else construct | | [`IFNULL()`](https://dev.mysql.com/doc/refman/8.0/en/flow-control-functions.html#function_ifnull) | Null if/else construct | -| 
[`NULLIF()`](https://dev.mysql.com/doc/refman/8.0/en/flow-control-functions.html#function_nullif) | Return NULL if expr1 = expr2 | +| [`NULLIF()`](https://dev.mysql.com/doc/refman/8.0/en/flow-control-functions.html#function_nullif) | Return `NULL` if expr1 = expr2 | **Cast functions:** | Name | Description | | :------------------------------------------------------------------------------------------ | :----------------------------- | -| [`CAST()`](https://dev.mysql.com/doc/refman/8.0/en/cast-functions.html#function_cast) | Cast a value as a certain type | -| [`CONVERT()`](https://dev.mysql.com/doc/refman/8.0/en/cast-functions.html#function_convert) | Cast a value as a certain type | +| [`CAST()`](https://dev.mysql.com/doc/refman/8.0/en/cast-functions.html#function_cast) | Cast a value as a string or vector | +| [`CONVERT()`](https://dev.mysql.com/doc/refman/8.0/en/cast-functions.html#function_convert) | Cast a value as a string | For more information about how to use `CAST()`, see [Vector Data Type | Cast](/vector-search-data-types.md#cast). @@ -102,16 +112,16 @@ For more information about how to use `CAST()`, see [Vector Data Type | Cast](/v VEC_L2_DISTANCE(vector1, vector2) ``` -Calculates the L2 distance (Euclidean distance) between two vectors using the following formula: +Calculates the [L2 distance](https://en.wikipedia.org/wiki/Euclidean_distance) (Euclidean distance) between two vectors using the following formula: $DISTANCE(p,q)=\sqrt {\sum \limits _{i=1}^{n}{(p_{i}-q_{i})^{2}}}$ -The two vectors must have the same dimension. Otherwise an error is returned. +The two vectors must have the same dimension. Otherwise, an error is returned. 
-Examples: +Example: ```sql -[tidb]> select VEC_L2_DISTANCE('[0,3]', '[4,0]'); +[tidb]> SELECT VEC_L2_DISTANCE('[0,3]', '[4,0]'); +-----------------------------------+ | VEC_L2_DISTANCE('[0,3]', '[4,0]') | +-----------------------------------+ @@ -125,16 +135,16 @@ Examples: VEC_COSINE_DISTANCE(vector1, vector2) ``` -Calculates the cosine distance between two vectors using the following formula: +Calculates the [cosine distance](https://en.wikipedia.org/wiki/Cosine_similarity) between two vectors using the following formula: $DISTANCE(p,q)=1.0 - {\frac {\sum \limits _{i=1}^{n}{p_{i}q_{i}}}{{\sqrt {\sum \limits _{i=1}^{n}{p_{i}^{2}}}}\cdot {\sqrt {\sum \limits _{i=1}^{n}{q_{i}^{2}}}}}}$ -The two vectors must have the same dimension. Otherwise an error is returned. +The two vectors must have the same dimension. Otherwise, an error is returned. -Examples: +Example: ```sql -[tidb]> select VEC_COSINE_DISTANCE('[1, 1]', '[-1, -1]'); +[tidb]> SELECT VEC_COSINE_DISTANCE('[1, 1]', '[-1, -1]'); +-------------------------------------------+ | VEC_COSINE_DISTANCE('[1, 1]', '[-1, -1]') | +-------------------------------------------+ @@ -148,16 +158,16 @@ Examples: VEC_NEGATIVE_INNER_PRODUCT(vector1, vector2) ``` -Calculates the distance by using the negative of the inner product between two vectors, using the following formula: +Calculates the distance by using the negative of the [inner product](https://en.wikipedia.org/wiki/Dot_product) between two vectors, using the following formula: $DISTANCE(p,q)=- INNER\_PROD(p,q)=-\sum \limits _{i=1}^{n}{p_{i}q_{i}}$ -The two vectors must have the same dimension. Otherwise an error is returned. +The two vectors must have the same dimension. Otherwise, an error is returned. 
-Examples: +Example: ```sql -[tidb]> select VEC_NEGATIVE_INNER_PRODUCT('[1,2]', '[3,4]'); +[tidb]> SELECT VEC_NEGATIVE_INNER_PRODUCT('[1,2]', '[3,4]'); +----------------------------------------------+ | VEC_NEGATIVE_INNER_PRODUCT('[1,2]', '[3,4]') | +----------------------------------------------+ @@ -171,16 +181,16 @@ Examples: VEC_L1_DISTANCE(vector1, vector2) ``` -Calculates the L1 distance (Manhattan distance) between two vectors using the following formula: +Calculates the [L1 distance](https://en.wikipedia.org/wiki/Taxicab_geometry) (Manhattan distance) between two vectors using the following formula: $DISTANCE(p,q)=\sum \limits _{i=1}^{n}{|p_{i}-q_{i}|}$ -The two vectors must have the same dimension. Otherwise an error is returned. +The two vectors must have the same dimension. Otherwise, an error is returned. -Examples: +Example: ```sql -[tidb]> select VEC_L1_DISTANCE('[0,0]', '[3,4]'); +[tidb]> SELECT VEC_L1_DISTANCE('[0,0]', '[3,4]'); +-----------------------------------+ | VEC_L1_DISTANCE('[0,0]', '[3,4]') | +-----------------------------------+ @@ -199,14 +209,14 @@ Returns the dimension of a vector. 
Examples: ```sql -[tidb]> select VEC_DIMS('[1,2,3]'); +[tidb]> SELECT VEC_DIMS('[1,2,3]'); +---------------------+ | VEC_DIMS('[1,2,3]') | +---------------------+ | 3 | +---------------------+ -[tidb]> select VEC_DIMS('[]'); +[tidb]> SELECT VEC_DIMS('[]'); +----------------+ | VEC_DIMS('[]') | +----------------+ @@ -220,14 +230,14 @@ Examples: VEC_L2_NORM(vector) ``` -Calculates the L2 norm (Euclidean norm) of a vector using the following formula: +Calculates the [L2 norm](https://en.wikipedia.org/wiki/Norm_(mathematics)) (Euclidean norm) of a vector using the following formula: $NORM(p)=\sqrt {\sum \limits _{i=1}^{n}{p_{i}^{2}}}$ -Examples: +Example: ```sql -[tidb]> select VEC_L2_NORM('[3,4]'); +[tidb]> SELECT VEC_L2_NORM('[3,4]'); +----------------------+ | VEC_L2_NORM('[3,4]') | +----------------------+ @@ -243,10 +253,10 @@ VEC_FROM_TEXT(string) Converts a string into a vector. -Examples: +Example: ```sql -[tidb]> select VEC_FROM_TEXT('[1,2]') + VEC_FROM_TEXT('[3,4]'); +[tidb]> SELECT VEC_FROM_TEXT('[1,2]') + VEC_FROM_TEXT('[3,4]'); +-------------------------------------------------+ | VEC_FROM_TEXT('[1,2]') + VEC_FROM_TEXT('[3,4]') | +-------------------------------------------------+ @@ -262,10 +272,10 @@ VEC_AS_TEXT(vector) Converts a vector into a string. 
-Examples: +Example: ```sql -[tidb]> select VEC_AS_TEXT('[1.000, 2.5]'); +[tidb]> SELECT VEC_AS_TEXT('[1.000, 2.5]'); +-------------------------------+ | VEC_AS_TEXT('[1.000, 2.5]') | +-------------------------------+ From 088ec475a4589973c45baf08d75b4dc7a97592e4 Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Fri, 20 Sep 2024 16:23:40 +0800 Subject: [PATCH 09/16] Update vector-search-overview.md --- vector-search-overview.md | 41 ++++++++++++++++++++++++++++----------- 1 file changed, 30 insertions(+), 11 deletions(-) diff --git a/vector-search-overview.md b/vector-search-overview.md index d6640c648ec58..022ea38517147 100644 --- a/vector-search-overview.md +++ b/vector-search-overview.md @@ -1,25 +1,41 @@ --- -title: Vector Search (Beta) Overview +title: Vector Search Overview summary: Learn about Vector Search in TiDB Cloud. This feature provides an advanced search solution for performing semantic similarity searches across various data types, including documents, images, audio, and video. --- -# Vector Search (Beta) Overview +# Vector Search Overview -TiDB Vector Search (beta) provides an advanced search solution for performing semantic similarity searches across various data types, including documents, images, audio, and video. This feature enables developers to easily build scalable applications with generative artificial intelligence (AI) capabilities using familiar MySQL skills. +TiDB Vector Search provides an advanced search solution for performing semantic similarity searches across various data types, including documents, images, audio, and video. This feature enables developers to easily build scalable applications with generative artificial intelligence (AI) capabilities using familiar MySQL skills. + + + +> **Warning:** +> +> The vector search feature is experimental. It is not recommended that you use it in the production environment. This feature might be changed or removed without prior notice. 
If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub. + + -> **Note** +> **Warning:** > -> TiDB Vector Search is currently in beta and only available for [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters. +> The vector search feature is in beta. It might be changed or removed without prior notice. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub. +> **Note:** +> +> The vector search feature is only available for TiDB Self-Managed clusters and [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters. + ## Concepts -Vector search is a search method that prioritizes the meaning of your data to deliver relevant results. This differs from traditional full-text search, which relies primarily on exact keyword matches and word frequency. +Vector search is a search method that prioritizes the meaning of your data to deliver relevant results. + +Unlike traditional full-text search, which relies on exact keyword matching and word frequency, vector search converts various data types (such as text, images, or audio) into high-dimensional vectors and queries based on the similarity between these vectors. This search method captures the semantic meaning and contextual information of the data, leading to a more precise understanding of user intent. -For example, a full-text search for "a swimming animal" only returns results with those exact keywords. In contrast, vector search can return results for other swimming animals, such as fish or ducks, even if the exact keywords are not present. +Even when the search terms do not exactly match the content in the database, vector search can still provide results that align with the user's intent by analyzing the semantics of the data. + +For example, a full-text search for "a swimming animal" only returns results containing these exact keywords. 
In contrast, vector search can return results for other swimming animals, such as fish or ducks, even if these results do not contain the exact keywords. ### Vector embedding @@ -33,19 +49,22 @@ TiDB introduces [Vector data types](/vector-search-data-types.md) designed to op Embedding models are algorithms that transform data into [vector embeddings](#vector-embedding). -Selecting an appropriate embedding model is crucial for ensuring the accuracy and relevance of semantic search results. For unstructured text data, you can find top-performing text embedding models on the [Massive Text Embedding Benchmark (MTEB) Leaderboard](https://huggingface.co/spaces/mteb/leaderboard). +Choosing an appropriate embedding model is crucial for ensuring the accuracy and relevance of semantic search results. For unstructured text data, you can find top-performing text embedding models on the [Massive Text Embedding Benchmark (MTEB) Leaderboard](https://huggingface.co/spaces/mteb/leaderboard). -To learn how to generate vector embeddings for your specific data types, refer to the embedding provider integration tutorials or examples. +To learn how to generate vector embeddings for your specific data types, refer to integration tutorials or examples of embedding models. ## How vector search works After converting raw data into vector embeddings and storing them in TiDB, your application can execute vector search queries to find the data most semantically or contextually relevant to a user's query. -Vector Search in TiDB Cloud identifies the top-k nearest neighbor (KNN) vectors by using a [distance function](/vector-search-functions-and-operators.md) to calculate the distance between the given vector and vectors stored in the database. The vectors closest to the query represent the most similar data in meaning. 
+TiDB vector search identifies the top-k nearest neighbor (KNN) vectors by using a [distance function](/vector-search-functions-and-operators.md) to calculate the distance between the given vector and vectors stored in the database. The vectors closest to the given vector in the query represent the most similar data in meaning. ![The Schematic TiDB Vector Search](/media/vector-search/embedding-search.png) -As a relational database with integrated vector search capabilities, TiDB enables you to store data and their corresponding vector embeddings together in one database. You can store them in the same table using different columns, or separate them into different tables and combine them using `JOIN` queries when retrieving. +As a relational database with integrated vector search capabilities, TiDB enables you to store data and their corresponding vector representations (that is, vector embeddings) together in one database. You can choose any of the following ways for storage: + +- Store data and their corresponding vector representations in different columns of the same table. +- Store data and their corresponding vector representations in different tables. In this case, you need to use `JOIN` queries to combine the tables when retrieving data. 
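
The top-k nearest neighbor idea described above can be sketched in a few lines of Python. This is a brute-force illustration under stated assumptions, not how TiDB executes a query: the 3-dimensional embeddings are made up, and `cosine_distance` and `top_k` are hypothetical helper names.

```python
import math


def cosine_distance(p, q):
    """Return 1 - cosine similarity of two vectors with the same dimension."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / (norm_p * norm_q)


def top_k(query, rows, k):
    """Return the k (text, embedding) rows closest to the query vector."""
    return sorted(rows, key=lambda row: cosine_distance(query, row[1]))[:k]


# Made-up 3-dimensional embeddings; real ones come from an embedding model.
rows = [
    ("dog", [0.9, 0.1, 0.3]),
    ("fish", [0.1, 0.9, 0.2]),
    ("tree", [0.0, 0.1, 0.9]),
]
query = [0.2, 0.8, 0.1]  # stand-in for the embedding of "a swimming animal"

nearest = top_k(query, rows, k=2)
for text, embedding in nearest:
    print(text, cosine_distance(query, embedding))
```

Scanning every row like this costs time proportional to the table size. It is meant only to show what "closest vectors" means, with the smallest distance ranked first.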
## Use cases From 0bf4cba60cf1b9d44e4941380cec14d0f8746a7e Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Mon, 23 Sep 2024 10:52:58 +0800 Subject: [PATCH 10/16] update the getting started docs --- vector-search-get-started-using-python.md | 77 +++++++++++++++++------ vector-search-get-started-using-sql.md | 63 ++++++++++++++----- 2 files changed, 107 insertions(+), 33 deletions(-) diff --git a/vector-search-get-started-using-python.md b/vector-search-get-started-using-python.md index 522de647c8cfa..42378175adb45 100644 --- a/vector-search-get-started-using-python.md +++ b/vector-search-get-started-using-python.md @@ -5,40 +5,50 @@ summary: Learn how to quickly develop an AI application that performs semantic s # Get Started with TiDB + AI via Python -This tutorial demonstrates how to develop a simple AI application that provides **semantic search** features. Unlike traditional keyword search, semantic search intelligently understands the meaning behind your query. For example, if you have documents titled "dog", "fish", and "tree", and you search for "a swimming animal", the application would identify "fish" as the most relevant result. +This tutorial demonstrates how to develop a simple AI application that provides **semantic search** features. Unlike traditional keyword search, semantic search intelligently understands the meaning behind your query and returns the most relevant result. For example, if you have documents titled "dog", "fish", and "tree", and you search for "a swimming animal", the application would identify "fish" as the most relevant result. Throughout this tutorial, you will develop this AI application using [TiDB Vector Search](/vector-search-overview.md), Python, [TiDB Vector SDK for Python](https://github.com/pingcap/tidb-vector-python), and AI models. - + -> **Note** +> **Warning:** > -> TiDB Vector Search is currently in beta and only available for [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters. 
+> The vector search feature is experimental. It is not recommended that you use it in the production environment. This feature might be changed or removed without prior notice. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub. +> **Note:** +> +> The vector search feature is only available for TiDB Self-Managed clusters and [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters. + ## Prerequisites To complete this tutorial, you need: - - - [Python 3.8 or higher](https://www.python.org/downloads/) installed. - [Git](https://git-scm.com/downloads) installed. -- A TiDB cluster. Follow [Deploy a local test TiDB cluster](/quick-start-with-tidb.md#deploy-a-local-test-cluster) or [Deploy a production TiDB cluster](/production-deployment-using-tiup.md) to create a local cluster. +- A TiDB cluster. + + + +**If you don't have a TiDB cluster, you can create one as follows:** + +- Follow [Deploy a local test TiDB cluster](/quick-start-with-tidb.md#deploy-a-local-test-cluster) or [Deploy a production TiDB cluster](/production-deployment-using-tiup.md) to create a local cluster. +- Follow [Creating a TiDB Cloud Serverless cluster](/develop/dev-guide-build-cluster-in-cloud.md) to create your own TiDB Cloud cluster. -- [Python 3.8 or higher](https://www.python.org/downloads/) installed. -- [Git](https://git-scm.com/downloads) installed. -- A TiDB Cloud Serverless cluster. Follow [creating a TiDB Cloud Serverless cluster](/tidb-cloud/create-tidb-cluster-serverless.md) to create your own TiDB Cloud cluster if you don't have one. +**If you don't have a TiDB cluster, you can create one as follows:** + +- (Recommended) Follow [Creating a TiDB Cloud Serverless cluster](/develop/dev-guide-build-cluster-in-cloud.md) to create your own TiDB Cloud cluster. 
+- Follow [Deploy a local test TiDB cluster](https://docs.pingcap.com/tidb/stable/quick-start-with-tidb#deploy-a-local-test-cluster) or [Deploy a production TiDB cluster](https://docs.pingcap.com/tidb/stable/production-deployment-using-tiup) to create a local cluster. ## Get started -To run the demo directly, check out the sample code in the [pingcap/tidb-vector-python](https://github.com/pingcap/tidb-vector-python/blob/main/examples/python-client-quickstart) repository. +The following steps show how to develop the application from scratch. To run the demo directly, you can check out the sample code in the [pingcap/tidb-vector-python](https://github.com/pingcap/tidb-vector-python/blob/main/examples/python-client-quickstart) repository. ### Step 1. Create a new Python project @@ -58,11 +68,18 @@ In your project directory, run the following command to install the required pac pip install sqlalchemy pymysql sentence-transformers tidb-vector python-dotenv ``` -- `tidb-vector`: the Python client for interacting with Vector Search in TiDB Cloud. +- `tidb-vector`: the Python client for interacting with TiDB vector search. - [`sentence-transformers`](https://sbert.net): a Python library that provides pre-trained models for generating [vector embeddings](/vector-search-overview.md#vector-embedding) from text. ### Step 3. Configure the connection string to the TiDB cluster +Configure the cluster connection string depending on the TiDB deployment option you've selected. + + +
+ +For a TiDB Cloud Serverless cluster, take the following steps to obtain the cluster connection string and configure environment variables: + 1. Navigate to the [**Clusters**](https://tidbcloud.com/console/clusters) page, and then click the name of your target cluster to go to its overview page. 2. Click **Connect** in the upper-right corner. A connection dialog is displayed. @@ -92,6 +109,30 @@ pip install sqlalchemy pymysql sentence-transformers tidb-vector python-dotenv TIDB_DATABASE_URL="mysql+pymysql://.root:@gateway01..prod.aws.tidbcloud.com:4000/test?ssl_ca=/etc/ssl/cert.pem&ssl_verify_cert=true&ssl_verify_identity=true" ``` +
+
+ +For a TiDB Self-Managed cluster, create a `.env` file in the root directory of your Python project. Copy the following content into the `.env` file, and modify the environment variable values according to the connection parameters of your TiDB cluster: + +```dotenv +TIDB_DATABASE_URL="mysql+pymysql://<USERNAME>:<PASSWORD>@<HOST>:<PORT>/<DATABASE>" +# For example: TIDB_DATABASE_URL="mysql+pymysql://root@127.0.0.1:4000/test" +``` + +If you are running TiDB on your local machine, the `HOST` is `127.0.0.1` by default. The initial `PASSWORD` is empty, so if you are starting the cluster for the first time, you can omit this field. + +The following are descriptions for each parameter: + +- `<HOST>`: The host of the TiDB cluster. +- `<PORT>`: The port of the TiDB cluster. +- `<USERNAME>`: The username to connect to the TiDB cluster. +- `<PASSWORD>`: The password to connect to the TiDB cluster. +- `<DATABASE>`: The name of the database you want to connect to. + +
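
If you prefer to assemble the connection string in code from the parameters described above, the following standard-library sketch shows one way to do it. `build_tidb_url` is a hypothetical helper written for this illustration, not part of the TiDB Vector SDK. It assumes the `mysql+pymysql://` URL shape used in this step.

```python
from urllib.parse import quote_plus


def build_tidb_url(username, password, host, port, database):
    """Hypothetical helper: assemble a connection URL for the PyMySQL driver."""
    credentials = quote_plus(username)
    if password:
        # Percent-encode the password so characters such as '@' or '/'
        # are not mistaken for URL separators.
        credentials += ":" + quote_plus(password)
    return f"mysql+pymysql://{credentials}@{host}:{port}/{database}"


# A local cluster started for the first time has an empty root password,
# so the password part is omitted entirely.
local_url = build_tidb_url("root", "", "127.0.0.1", 4000, "test")
print(local_url)
```

Encoding the credentials matters mainly when a password contains reserved characters. For the default local setup, the result is the same as the example value shown in the `.env` file.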
+ +
+ ### Step 4. Initialize the embedding model An [embedding model](/vector-search-overview.md#embedding-model) transforms data into [vector embeddings](/vector-search-overview.md#vector-embedding). This example uses the pre-trained model [**msmarco-MiniLM-L12-cos-v5**](https://huggingface.co/sentence-transformers/msmarco-MiniLM-L12-cos-v5) for text embedding. This lightweight model, provided by the `sentence-transformers` library, transforms text data into 384-dimensional vector embeddings. @@ -113,11 +154,11 @@ def text_to_embedding(text): ### Step 5. Connect to the TiDB cluster -Use the `TiDBVectorClient` class to connect to your TiDB cluster and create a table `embedded_documents` with a vector column to serve as the vector store. +Use the `TiDBVectorClient` class to connect to your TiDB cluster and create a table `embedded_documents` with a vector column. > **Note** > -> Ensure the dimension of your vector column matches the dimension of the vectors produced by your embedding model. For example, the **msmarco-MiniLM-L12-cos-v5** model generates vectors with 384 dimensions. +> Make sure the dimension of your vector column in the table matches the dimension of the vectors generated by your embedding model. For example, the **msmarco-MiniLM-L12-cos-v5** model generates vectors with 384 dimensions, so the dimension of your vector columns in `embedded_documents` should be 384 as well. ```python import os @@ -128,13 +169,13 @@ from dotenv import load_dotenv load_dotenv() vector_store = TiDBVectorClient( - # The table which will store the vector data. + # The 'embedded_documents' table will store the vector data. table_name='embedded_documents', # The connection string to the TiDB cluster. connection_string=os.environ.get('TIDB_DATABASE_URL'), # The dimension of the vector generated by the embedding model. vector_dimension=embed_model_dims, - # Determine whether to recreate the table if it already exists. + # Recreate the table if it already exists. 
drop_existing_table=True, ) ``` @@ -200,9 +241,9 @@ Search result ("a swimming animal"): - text: "tree", distance: 0.798545178640937 ``` -From the output, the swimming animal is most likely a fish, or a dog with a gift for swimming. +The three terms in the search results are sorted by their respective distance from the queried vector: the smaller the distance, the more relevant the corresponding `document`. -This demonstration shows how vector search can efficiently locate the most relevant documents, with search results organized by the proximity of the vectors: the smaller the distance, the more relevant the document. +Therefore, according to the output, the swimming animal is most likely a fish, or a dog with a gift for swimming. ## See also diff --git a/vector-search-get-started-using-sql.md b/vector-search-get-started-using-sql.md index fc411b5cac56b..b74d04342a949 100644 --- a/vector-search-get-started-using-sql.md +++ b/vector-search-get-started-using-sql.md @@ -1,41 +1,52 @@ --- title: Get Started with Vector Search via SQL -summary: Learn how to quickly get started with Vector Search in TiDB Cloud using SQL statements and power the generative AI application. +summary: Learn how to quickly get started with Vector Search in TiDB using SQL statements to power your generative AI applications. --- # Get Started with Vector Search via SQL TiDB extends MySQL syntax to support [Vector Search](/vector-search-overview.md) and introduce new [Vector data types](/vector-search-data-types.md) and several [vector functions](/vector-search-functions-and-operators.md). -This tutorial demonstrates how to get started with TiDB Vector Search just using SQL statements. You will learn how to use the [MySQL command-line client](https://dev.mysql.com/doc/refman/8.4/en/mysql.html) to: +This tutorial demonstrates how to get started with TiDB Vector Search just using SQL statements. 
You will learn how to use the [MySQL command-line client](https://dev.mysql.com/doc/refman/8.4/en/mysql.html) to complete the following operations: - Connect to your TiDB cluster. - Create a vector table. - Store vector embeddings. - Perform vector search queries. - + -> **Note** +> **Warning:** > -> TiDB Vector Search is currently in beta and only available for [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters. +> The vector search feature is experimental. It is not recommended that you use it in the production environment. This feature might be changed or removed without prior notice. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub. +> **Note:** +> +> The vector search feature is only available for TiDB Self-Managed clusters and [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters. + ## Prerequisites To complete this tutorial, you need: +- [MySQL command-line client](https://dev.mysql.com/doc/refman/8.4/en/mysql.html) (MySQL CLI) installed on your machine. +- A TiDB cluster. + -- [MySQL command-line client](https://dev.mysql.com/doc/refman/8.4/en/mysql.html) (MySQL CLI) installed on your machine. -- A TiDB cluster. Follow [Deploy a local test TiDB cluster](/quick-start-with-tidb.md#deploy-a-local-test-cluster) or [Deploy a production TiDB cluster](/production-deployment-using-tiup.md) to create a local cluster. +**If you don't have a TiDB cluster, you can create one as follows:** + +- Follow [Deploy a local test TiDB cluster](/quick-start-with-tidb.md#deploy-a-local-test-cluster) or [Deploy a production TiDB cluster](/production-deployment-using-tiup.md) to create a local cluster. +- Follow [Creating a TiDB Cloud Serverless cluster](/develop/dev-guide-build-cluster-in-cloud.md) to create your own TiDB Cloud cluster. -- [MySQL command-line client](https://dev.mysql.com/doc/refman/8.4/en/mysql.html) (MySQL CLI) installed on your machine. 
-- A TiDB Cloud Serverless cluster. Follow [creating a TiDB Cloud Serverless cluster](/tidb-cloud/create-tidb-cluster-serverless.md) to create your own TiDB Cloud cluster if you don't have one. +**If you don't have a TiDB cluster, you can create one as follows:** + +- (Recommended) Follow [Creating a TiDB Cloud Serverless cluster](/develop/dev-guide-build-cluster-in-cloud.md) to create your own TiDB Cloud cluster. +- Follow [Deploy a local test TiDB cluster](https://docs.pingcap.com/tidb/stable/quick-start-with-tidb#deploy-a-local-test-cluster) or [Deploy a production TiDB cluster](https://docs.pingcap.com/tidb/stable/production-deployment-using-tiup) to create a local cluster. @@ -43,6 +54,11 @@ To complete this tutorial, you need: ### Step 1. Connect to the TiDB cluster +Connect to your TiDB cluster depending on the TiDB deployment option you've selected. + + +
+ 1. Navigate to the [**Clusters**](https://tidbcloud.com/console/clusters) page, and then click the name of your target cluster to go to its overview page. 2. Click **Connect** in the upper-right corner. A connection dialog is displayed. @@ -57,11 +73,26 @@ To complete this tutorial, you need: mysql -u '.root' -h '' -P 4000 -D 'test' --ssl-mode=VERIFY_IDENTITY --ssl-ca=/etc/ssl/cert.pem -p'' ``` +
+
+ +After your TiDB Self-Managed cluster is started, execute your cluster connection command in the terminal. + +The following is an example connection command for macOS: + +```bash + mysql --comments --host 127.0.0.1 --port 4000 -u root +``` + +
+ +
+ ### Step 2. Create a vector table -With vector search support, you can use the `VECTOR` type column to store [vector embeddings](/vector-search-overview.md#vector-embedding) in TiDB. +When creating a table, you can define a column as a [vector](/vector-search-overview.md#vector-embedding) column by specifying the `VECTOR` data type. -To create a table with a three-dimensional `VECTOR` column, execute the following SQL statements using your MySQL CLI: +For example, to create a table `embedded_documents` with a three-dimensional `VECTOR` column, execute the following SQL statements using your MySQL CLI: ```sql USE test; @@ -80,7 +111,7 @@ The expected output is as follows: Query OK, 0 rows affected (0.27 sec) ``` -### Step 3. Store the vector embeddings +### Step 3. Insert vector embeddings to the table Insert three documents with their [vector embeddings](/vector-search-overview.md#vector-embedding) into the `embedded_documents` table: @@ -130,9 +161,9 @@ The expected output is as follows: Similar to full-text search, users provide search terms to the application when using vector search. -In this example, the search term is "a swimming animal", and its corresponding vector embedding is `[1,2,3]`. In practical applications, you need to use an embedding model to convert the user's search term into a vector embedding. +In this example, the search term is "a swimming animal", and its corresponding vector embedding is assumed to be `[1,2,3]`. In practical applications, you need to use an embedding model to convert the user's search term into a vector embedding. -Execute the following SQL statement and TiDB will identify the top three documents closest to the search term by calculating and sorting the cosine distances (`vec_cosine_distance`) between the vector embeddings. 
+Execute the following SQL statement, and TiDB will identify the top three documents closest to `[1,2,3]` by calculating and sorting the cosine distances (`vec_cosine_distance`) between the vector embeddings in the table. ```sql SELECT id, document, vec_cosine_distance(embedding, '[1,2,3]') AS distance @@ -154,7 +185,9 @@ The expected output is as follows: 3 rows in set (0.15 sec) ``` -From the output, the swimming animal is most likely a fish, or a dog with a gift for swimming. +The three terms in the search results are sorted by their respective distance from the queried vector: the smaller the distance, the more relevant the corresponding `document`. + +Therefore, according to the output, the swimming animal is most likely a fish, or a dog with a gift for swimming. ## See also From c29d77382249749611565a566a7fb6e71fa20cbb Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Mon, 23 Sep 2024 11:32:14 +0800 Subject: [PATCH 11/16] Update vector-search-improve-performance.md --- vector-search-improve-performance.md | 26 +++++++++++++++++--------- 1 file changed, 17 insertions(+), 9 deletions(-) diff --git a/vector-search-improve-performance.md b/vector-search-improve-performance.md index 6cf5200ca9870..079ed328618ca 100644 --- a/vector-search-improve-performance.md +++ b/vector-search-improve-performance.md @@ -5,13 +5,21 @@ summary: Learn best practices for improving the performance of TiDB Vector Searc # Improve Vector Search Performance -TiDB Vector Search allows you to perform ANN queries that search for results similar to an image, document and so on. To improve the query performance, review the following best practices. +TiDB Vector Search allows you to perform Approximate Nearest Neighbor (ANN) queries that search for results similar to an image, document and so on. To improve the query performance, review the following best practices. 
-## Add vector search index for vector columns + -> **Note** +> **Warning:** > -> This practice is only applicable to [TiDB Serverless](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-serverless) clusters. +> The vector search feature is experimental. It is not recommended that you use it in the production environment. This feature might be changed or removed without prior notice. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub. + + + +> **Note:** +> +> The vector search feature is only available for TiDB Self-Managed clusters and [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters. + +## Add vector search index for vector columns The [vector search index](https://docs.pingcap.com/tidbcloud/vector-search-index) dramatically improves the performance of vector search queries, usually by 10x or more, with a trade-off of only a small decrease of recall rate. @@ -25,9 +33,9 @@ Vector indexes are built asynchronously. Until all vector data is indexed, vecto ## Reduce vector dimensions or shorten embeddings -The computational complexity of vector search indexing and queries increases significantly as the size of vectors grows, necessitating more floating point comparisons. +The computational complexity of vector search indexing and queries increases significantly as the dimension of vectors grows, necessitating more floating point comparisons. -To optimize performance, consider reducing the vector dimensions whenever feasible. This usually needs switching to another embedding model. Make sure to measure the impact of changing embedding models on the accuracy of your vector queries. +To optimize performance, consider reducing the vector dimensions whenever feasible. This usually needs switching to another embedding model. When switching models, you need to evaluate the impact of the model change on the accuracy of vector queries. 
Certain embedding models like OpenAI `text-embedding-3-large` support [shortening embeddings](https://openai.com/index/new-embedding-models-and-api-updates/), which removes some numbers from the end of vector sequences without losing the embedding's concept-representing properties. You can also use such an embedding model to reduce the vector dimensions. @@ -35,10 +43,10 @@ Certain embedding models like OpenAI `text-embedding-3-large` support [shortenin Vector embedding data are usually large and only used during the search process. By excluding vector columns from the query results, you can greatly reduce the amount of data transferred between the TiDB server and your SQL client, thereby improving query performance. -To exclude vector columns, explicitly list the columns you want to retrieve in the `SELECT` clause, instead of using `SELECT *`. +To exclude vector columns, explicitly list the columns you want to retrieve in the `SELECT` clause, instead of using `SELECT *` to retrieve all columns. ## Warm up the index -When an index is cold accessed, it takes time to load the whole index from S3, or load from disk (instead of from memory). Such processes usually result in high tail latency. Additionally, if no SQL queries exist on a cluster for a long time (e.g. hours), the compute resource is reclaimed and will result in cold access next time. +When accessing an index that has never been used or has not been accessed for a long time (cold access), TiDB needs to load the entire index from S3 or disk (instead of from memory). This process takes some time and often results in higher query latency. Additionally, if there are no SQL queries for an extended period (for example, several hours), the computing resources are reclaimed, causing subsequent access to become cold access. -To avoid such tail latencies, warm up your index before actual workload by using similar vector search queries that hit the vector index. 
+To avoid such query latency, warm up your index before actual workload by running similar vector search queries that hit the vector index. \ No newline at end of file From 8971f315ede3a33522edd1fc7cb2cf31483f60eb Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Mon, 23 Sep 2024 15:00:05 +0800 Subject: [PATCH 12/16] Update vector-search-index.md --- vector-search-index.md | 138 ++++++++++++++++++++++++++++++++++------- 1 file changed, 114 insertions(+), 24 deletions(-) diff --git a/vector-search-index.md b/vector-search-index.md index 7f4e47a7dc6b0..fa045d8d95893 100644 --- a/vector-search-index.md +++ b/vector-search-index.md @@ -5,22 +5,79 @@ summary: Learn how to build and use the vector search index to accelerate K-Near # Vector Search Index -K-nearest neighbors (KNN) search is the problem of finding the K closest points for a given point in a vector space. The most straightforward approach to solving this problem is a brute force search, where the distance between all points in the vector space and the reference point is computed. This method guarantees perfect accuracy, but it is usually too slow for practical applications. Thus, nearest neighbors search problems are often solved with approximate algorithms. +K-nearest neighbors (KNN) search is the method for finding the K closest points to a given point in a vector space. The most straightforward approach to perform KNN search is a brute force search, which calculates the distance between the given vector and all other vectors in the space. This approach guarantees perfect accuracy, but it is usually too slow for real-world use. Therefore, approximate algorithms are commonly used in KNN search to enhance speed and efficiency. -In TiDB, you can create and utilize vector search indexes for such approximate nearest neighbor (ANN) searches over columns with [vector data types](/vector-search-data-types.md). By using vector search indexes, vector search queries could be finished in milliseconds. 
+In TiDB, you can create and use vector search indexes for such approximate nearest neighbor (ANN) searches over columns with [vector data types](/vector-search-data-types.md). By using vector search indexes, vector search queries could be finished in milliseconds. -TiDB currently supports the following vector search index algorithms: + -- HNSW +> **Warning:** +> +> The vector search feature is experimental. It is not recommended that you use it in the production environment. This feature might be changed or removed without prior notice. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub. + + > **Note:** > -> Vector search index is only available for [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters. +> The vector search feature is only available for TiDB Self-Managed clusters and [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters. + +TiDB currently supports the following vector search index algorithm: + +- HNSW + +## Restrictions + +- TiFlash nodes must be deployed in your cluster in advance. +- Vector search indexes cannot be used as primary keys or unique indexes. +- Vector search indexes can only be created on a single vector column and cannot be combined with other columns (such as integers or strings) to form composite indexes. +- A distance function must be specified when creating and using vector search indexes (currently, only cosine distance `VEC_COSINE_DISTANCE()` and L2 distance `VEC_L2_DISTANCE()` functions are supported). +- For the same column, creating multiple vector search indexes using the same distance function is not supported. +- Deleting columns with vector search indexes is not supported. Creating multiple indexes in the same statement is not supported. +- Setting vector search indexes as [invisible](/sql-statements/sql-statement-alter-index.md) is not supported. 
## Create the HNSW vector index [HNSW](https://en.wikipedia.org/wiki/Hierarchical_navigable_small_world) is one of the most popular vector indexing algorithms. The HNSW index provides good performance with relatively high accuracy (> 98% in typical cases). + + +In TiDB, you can create an HNSW index for a column with a [vector data type](/vector-search-data-types.md) in either of the following ways: + +- When creating a table, use the following syntax to specify the vector column for the HNSW index: + + ```sql + CREATE TABLE foo ( + id INT PRIMARY KEY, + data VECTOR(5), + data64 VECTOR64(10), + VECTOR INDEX idx_data USING HNSW ((VEC_COSINE_DISTANCE(data))) + ); + ``` + +- For an existing table that already contains a vector column, use the following syntax to create an HNSW index for the vector column: + + ```sql + CREATE VECTOR INDEX idx_name ON foo ((VEC_COSINE_DISTANCE(data))) USING HNSW; + + ALTER TABLE foo ADD VECTOR INDEX idx_name ((VEC_COSINE_DISTANCE(data))) USING HNSW; + ``` + +> **Note:** +> +> The vector index is experimental. The syntax might change before GA. + +When creating an HNSW vector index, you need to specify the distance function for the vector: + +- Cosine Distance: `((VEC_COSINE_DISTANCE(cols_name))) USING HNSW` +- L2 Distance: `((VEC_L2_DISTANCE(cols_name))) USING HNSW` + +The vector index can only be created for fixed-dimensional vector columns like `VECTOR(3)`. It cannot be created for mixed-dimensional vector columns like `VECTOR` because vector distances can only be calculated between vectors with the same dimensions. + +For restrictions and limitations of vector search indexes, see [Restrictions](#restrictions). + + + + To create an HNSW vector index, specify the index definition in the comment of a column with a [vector data type](/vector-search-data-types.md) when creating the table: ```sql @@ -54,9 +111,11 @@ Be aware of the following limitations when creating the vector index. 
These limi - You can only define and create a vector index when the table is created. You cannot create the vector index on demand using DDL statements after the table is created. You cannot drop the vector index using DDL statements as well. + + ## Use the vector index -The vector search index can be used in K-nearest neighbor search queries by using the `ORDER BY ... LIMIT` form like below: +The vector search index can be used in K-nearest neighbor search queries by using the `ORDER BY ... LIMIT` clause as follows: ```sql SELECT * @@ -67,12 +126,14 @@ LIMIT 10 You must use the same distance metric as you have defined when creating the vector index if you want to utilize the index in vector search. +To use an index in a vector search, make sure that the `ORDER BY ... LIMIT` clause uses the same distance function as the one specified when creating the vector index. + ## Use the vector index with filters -Queries that contain a pre-filter (using the `WHERE` clause) cannot utilize the vector index because they are not querying for K-Nearest neighborss according to the SQL semantics. For example: +Queries that contain a pre-filter (using the `WHERE` clause) cannot utilize the vector index because they are not querying for K-Nearest neighbors according to the SQL semantics. 
For example: ```sql --- Filter is performed before kNN, so Vector Index cannot be used: +-- For the following query, the `WHERE` filter is performed before KNN, so the vector index cannot be used: SELECT * FROM vec_table WHERE category = "document" @@ -80,12 +141,12 @@ ORDER BY Vec_Cosine_distance(embedding, '[1, 2, 3]') LIMIT 5; ``` -Several workarounds are as follows: +To use the vector index with filters, consider the following workarounds: -**Post-Filter after Vector Search:** Query for the K-Nearest neighbors first, then filter out unwanted results: +**Post-filter after vector search:** Query for the K-Nearest neighbors first, then filter out unwanted results: ```sql --- The filter is performed after kNN for these queries, so Vector Index can be used: +-- For the following query, the `WHERE` filter is performed after KNN, so the vector index can be used: SELECT * FROM ( @@ -95,15 +156,15 @@ SELECT * FROM ) t WHERE category = "document"; --- Note that this query may return less than 5 results if some are filtered out. +-- Note that this query might return fewer than 5 results if some are filtered out. ``` -**Use Table Partitioning**: Queries within the [table partition](/partitioned-table.md) can fully utilize the vector index. This can be useful if you want to perform equality filters, as equality filters can be turned into accessing specified partitions. +**Use table partitioning**: Queries within a table [partition](/partitioned-table.md) can fully utilize the vector index. This can be useful if you want to perform equality filters, as equality filters can be turned into accessing specified partitions. -Example: Suppose you want to find the closest documentation for a specific product version. 
+For example, suppose you want to find the closest documentation for a specific product version: ```sql --- Filter is performed before kNN, so Vector Index cannot be used: +-- For the following query, the `WHERE` filter is performed before KNN, so the vector index cannot be used: SELECT * FROM docs WHERE ver = "v2.0" ORDER BY Vec_Cosine_distance(embedding, '[1, 2, 3]') @@ -112,6 +173,31 @@ LIMIT 5; Instead of writing a query using the `WHERE` clause, you can partition the table and then query within the partition using the [`PARTITION` keyword](/partitioned-table.md#partition-selection): + + +```sql +CREATE TABLE docs ( + id INT, + ver VARCHAR(10), + doc TEXT, + embedding VECTOR(3), + VECTOR INDEX idx_embedding USING HNSW ((VEC_COSINE_DISTANCE(embedding))) +) PARTITION BY LIST COLUMNS (ver) ( + PARTITION p_v1_0 VALUES IN ('v1.0'), + PARTITION p_v1_1 VALUES IN ('v1.1'), + PARTITION p_v1_2 VALUES IN ('v1.2'), + PARTITION p_v2_0 VALUES IN ('v2.0') +); + +SELECT * FROM docs +PARTITION (p_v2_0) +ORDER BY Vec_Cosine_distance(embedding, '[1, 2, 3]') +LIMIT 5; +``` + + + + ```sql CREATE TABLE docs ( id INT, @@ -131,11 +217,13 @@ ORDER BY Vec_Cosine_distance(embedding, '[1, 2, 3]') LIMIT 5; ``` -See [Table Partitioning](/partitioned-table.md) for more information. + + +For more information, see [Table Partitioning](/partitioned-table.md). ## View index build progress -Unlike other indexes, vector indexes are built asynchronously. Therefore, vector indexes might not be immediately available after bulk data insertion. This does not affect data correctness or consistency, and you can perform vector searches at any time and get complete results. However, performance will be suboptimal until vector indexes are fully built. +Unlike other indexes, vector search indexes are built asynchronously. This means that after bulk data insertion, the vector index might not be immediately available for querying, but it does not affect the accuracy and consistency of the data. 
You can still perform vector searches at any time and get complete results. However, performance will be suboptimal until vector indexes are fully built. To view the index build progress, you can query the `INFORMATION_SCHEMA.TIFLASH_INDEXES` table as follows: @@ -149,11 +237,11 @@ SELECT * FROM INFORMATION_SCHEMA.TIFLASH_INDEXES; +---------------+------------+----------------+----------+--------------------+-------------+-----------+------------+---------------------+-------------------------+--------------------+------------------------+------------------+ ``` -- The `ROWS_STABLE_INDEXED` and `ROWS_STABLE_NOT_INDEXED` columns show the index build progress. When `ROWS_STABLE_NOT_INDEXED` becomes 0, the index build is complete. +- You can check the `ROWS_STABLE_INDEXED` and `ROWS_STABLE_NOT_INDEXED` columns for the index build progress. When `ROWS_STABLE_NOT_INDEXED` becomes 0, the index build is complete. As a reference, indexing a 500 MiB vector dataset might take up to 20 minutes. The indexer can run in parallel for multiple tables. Currently, adjusting the indexer priority or speed is not supported. -- The `ROWS_DELTA_NOT_INDEXED` column shows the number of rows in the Delta layer. The Delta layer stores _recently_ inserted or updated rows and is periodically merged into the Stable layer according to the write workload. This merge process is called Compaction. +- You can check the `ROWS_DELTA_NOT_INDEXED` column for the number of rows in the Delta layer. The Delta layer stores recently inserted or updated rows and is periodically merged into the Stable layer according to the write workload. This merge process is called Compaction. The Delta layer is always not indexed. To achieve optimal performance, you can force the merge of the Delta layer into the Stable layer so that all data can be indexed: @@ -163,9 +251,11 @@ SELECT * FROM INFORMATION_SCHEMA.TIFLASH_INDEXES; For more information, see [`ALTER TABLE ... 
COMPACT`](/sql-statements/sql-statement-alter-table-compact.md). +In addition, you can monitor the execution progress of the DDL job by executing `ADMIN SHOW DDL JOBS;` and checking the `row count`. However, this method is not fully accurate, because the `row count` value is obtained from the `rows_stable_indexed` field in `TIFLASH_INDEXES`. This approach can be used as a reference for tracking the progress of indexing. + ## Check whether the vector index is used -Use the [`EXPLAIN`](/sql-statements/sql-statement-explain.md) or [`EXPLAIN ANALYZE`](/sql-statements/sql-statement-explain-analyze.md) statement to check whether this query is using the vector index. When `annIndex:` is presented in the `operator info` column for the `TableFullScan` executor, it means this table scan is utilizing the vector index. +Use the [`EXPLAIN`](/sql-statements/sql-statement-explain.md) or [`EXPLAIN ANALYZE`](/sql-statements/sql-statement-explain-analyze.md) statement to check whether a query is using the vector index. When `annIndex:` is present in the `operator info` column for the `TableFullScan` executor, it means this table scan is utilizing the vector index. 
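If you want to automate this check, the following is a minimal Python sketch that scans `EXPLAIN` rows for the `annIndex:` marker on a `TableFullScan` operator. It assumes the rows have already been fetched into dictionaries keyed by the `EXPLAIN` column names `id` and `operator info`. The helper name `uses_vector_index` and the sample rows are hypothetical and abbreviated for illustration; they are not real TiDB output.

```python
from typing import Iterable, Mapping


def uses_vector_index(explain_rows: Iterable[Mapping[str, str]]) -> bool:
    """Return True if any TableFullScan row reports `annIndex:` in its operator info."""
    for row in explain_rows:
        operator_id = row.get("id", "")
        operator_info = row.get("operator info", "")
        # A substring match tolerates the tree-drawing prefix in the `id` column.
        if "TableFullScan" in operator_id and "annIndex:" in operator_info:
            return True
    return False


# Hypothetical, abbreviated rows for illustration only.
rows_with_index = [
    {"id": "TopN_1", "operator info": "offset:0, count:10"},
    {"id": "TableFullScan_2", "operator info": "keep order:false, annIndex:COSINE(...)"},
]
rows_without_index = [
    {"id": "TableFullScan_2", "operator info": "keep order:false"},
]

print(uses_vector_index(rows_with_index))     # True
print(uses_vector_index(rows_without_index))  # False
```

A substring check is used instead of parsing the operator tree, because the only signal described here is the presence of `annIndex:` on the table scan.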
**Example: the vector index is used** @@ -210,7 +300,7 @@ LIMIT 10; When the vector index cannot be used, a warning occurs in some cases to help you learn the cause: ```sql --- Using a wrong distance metric: +-- Using a wrong distance function: [tidb]> EXPLAIN SELECT * FROM vector_table_with_index ORDER BY Vec_l2_Distance(embedding, '[1, 2, 3]') LIMIT 10; @@ -229,7 +319,7 @@ ANN index not used: index can be used only when ordering by vec_cosine_distance( ## Analyze vector search performance -The [`EXPLAIN ANALYZE`](/sql-statements/sql-statement-explain-analyze.md) statement contains detailed information about how the vector index is used in the `execution info` column: +To learn detailed information about how a vector index is used, you can execute the [`EXPLAIN ANALYZE`](/sql-statements/sql-statement-explain-analyze.md) statement and check the `execution info` column in the output: ```sql [tidb]> EXPLAIN ANALYZE SELECT * FROM vector_table_with_index @@ -263,8 +353,8 @@ Explanation of some important fields: - `vector_index.load.from_s3`: Number of indexes loaded from S3. - `vector_index.load.from_disk`: Number of indexes loaded from disk. The index was already downloaded from S3 previously. - `vector_index.load.from_cache`: Number of indexes loaded from cache. The index was already downloaded from S3 previously. -- `vector_index.search.total`: The total duration of searching in the index. Large latency usually means the index is cold (never accessed before, or accessed long ago) so that there is heavy IO when searching through the index. This field could be larger than actual query time because multiple vector indexes may be searched in parallel. -- `vector_index.search.discarded_nodes`: Number of vector rows visited but discarded during the search. These discarded vectors are not considered in the search result. Large values usually indicate that there are many stale rows caused by UPDATE or DELETE statements. 
+- `vector_index.search.total`: The total duration of searching in the index. Large latency usually means the index is cold (never accessed before, or accessed long ago) so that there are heavy I/O operations when searching through the index. This field could be larger than actual query time because multiple vector indexes might be searched in parallel. +- `vector_index.search.discarded_nodes`: Number of vector rows visited but discarded during the search. These discarded vectors are not considered in the search result. Large values usually indicate that there are many stale rows caused by `UPDATE` or `DELETE` statements. See [`EXPLAIN`](/sql-statements/sql-statement-explain.md), [`EXPLAIN ANALYZE`](/sql-statements/sql-statement-explain-analyze.md), and [EXPLAIN Walkthrough](/explain-walkthrough.md) for interpreting the output. From ce061444e9945cab86ece70c58fa18ba88dabade Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Mon, 23 Sep 2024 17:16:26 +0800 Subject: [PATCH 13/16] sync from zh --- vector-search-get-started-using-python.md | 2 +- vector-search-integrate-with-django-orm.md | 70 +++++++++++++--- vector-search-integrate-with-langchain.md | 92 +++++++++++++++++----- vector-search-integrate-with-peewee.md | 66 +++++++++++++--- vector-search-integrate-with-sqlalchemy.md | 63 ++++++++++++--- 5 files changed, 236 insertions(+), 57 deletions(-) diff --git a/vector-search-get-started-using-python.md b/vector-search-get-started-using-python.md index 42378175adb45..76ed822c48926 100644 --- a/vector-search-get-started-using-python.md +++ b/vector-search-get-started-using-python.md @@ -119,7 +119,7 @@ TIDB_DATABASE_URL="mysql+pymysql://:@:/" # For example: TIDB_DATABASE_URL="mysql+pymysql://root@127.0.0.1:4000/test" ``` -If you are running TiDB on your local machine, the `HOST` is `127.0.0.1` by default. The initial `PASSWORD` is empty, so if you are starting the cluster for the first time, you can omit this field. 
+If you are running TiDB on your local machine, `HOST` is `127.0.0.1` by default. The initial `PASSWORD` is empty, so if you are starting the cluster for the first time, you can omit this field. The following are descriptions for each parameter: diff --git a/vector-search-integrate-with-django-orm.md b/vector-search-integrate-with-django-orm.md index 2069a5bb06487..137297f62a7cf 100644 --- a/vector-search-integrate-with-django-orm.md +++ b/vector-search-integrate-with-django-orm.md @@ -7,30 +7,40 @@ summary: Learn how to integrate TiDB Vector Search with Django ORM to store embe This tutorial walks you through how to use [Django](https://www.djangoproject.com/) ORM to interact with the TiDB Vector Search, store embeddings, and perform vector search queries. - + -> **Note** +> **Warning:** > -> TiDB Vector Search is currently in beta and only available for [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters. +> The vector search feature is experimental. It is not recommended that you use it in the production environment. This feature might be changed or removed without prior notice. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub. +> **Note:** +> +> The vector search feature is only available for TiDB Self-Managed clusters and [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters. + ## Prerequisites To complete this tutorial, you need: - - - [Python 3.8 or higher](https://www.python.org/downloads/) installed. - [Git](https://git-scm.com/downloads) installed. -- A TiDB cluster. Follow [Deploy a local test TiDB cluster](/quick-start-with-tidb.md#deploy-a-local-test-cluster) or [Deploy a production TiDB cluster](/production-deployment-using-tiup.md) to create a local cluster. +- A TiDB cluster. 
+ + + +**If you don't have a TiDB cluster, you can create one as follows:** + +- Follow [Deploy a local test TiDB cluster](/quick-start-with-tidb.md#deploy-a-local-test-cluster) or [Deploy a production TiDB cluster](/production-deployment-using-tiup.md) to create a local cluster. +- Follow [Creating a TiDB Cloud Serverless cluster](/develop/dev-guide-build-cluster-in-cloud.md) to create your own TiDB Cloud cluster. -- [Python 3.8 or higher](https://www.python.org/downloads/) installed. -- [Git](https://git-scm.com/downloads) installed. -- A TiDB Cloud Serverless cluster. Follow [creating a TiDB Cloud Serverless cluster](/tidb-cloud/create-tidb-cluster-serverless.md) to create your own TiDB Cloud cluster if you don't have one. +**If you don't have a TiDB cluster, you can create one as follows:** + +- (Recommended) Follow [Creating a TiDB Cloud Serverless cluster](/develop/dev-guide-build-cluster-in-cloud.md) to create your own TiDB Cloud cluster. +- Follow [Deploy a local test TiDB cluster](https://docs.pingcap.com/tidb/stable/quick-start-with-tidb#deploy-a-local-test-cluster) or [Deploy a production TiDB cluster](https://docs.pingcap.com/tidb/stable/production-deployment-using-tiup) to create a local cluster. @@ -64,7 +74,7 @@ Install the required dependencies for the demo project: pip install -r requirements.txt ``` -For your existing project, you can install the following packages: +Alternatively, you can install the following packages for your project: ```bash pip install Django django-tidb mysqlclient numpy python-dotenv @@ -74,7 +84,7 @@ If you encounter installation issues with mysqlclient, refer to the mysqlclient #### What is `django-tidb` -`django-tidb` is a TiDB dialect for Django that enhances the Django ORM to support TiDB-specific features (For example, Vector Search) and resolves compatibility issues between TiDB and Django. 
+`django-tidb` is a TiDB dialect for Django, which enhances the Django ORM to support TiDB-specific features (for example, Vector Search) and resolves compatibility issues between TiDB and Django. To install `django-tidb`, choose a version that matches your Django version. For example, if you are using `django==4.2.*`, install `django-tidb==4.2.*`. The minor version does not need to be the same. It is recommended to use the latest minor version. @@ -82,6 +92,13 @@ For more information, refer to [django-tidb repository](https://github.com/pingc ### Step 4. Configure the environment variables +Configure the environment variables depending on the TiDB deployment option you've selected. + + +
+ +For a TiDB Cloud Serverless cluster, take the following steps to obtain the cluster connection string and configure environment variables: + 1. Navigate to the [**Clusters**](https://tidbcloud.com/console/clusters) page, and then click the name of your target cluster to go to its overview page. 2. Click **Connect** in the upper-right corner. A connection dialog is displayed. @@ -123,6 +140,33 @@ For more information, refer to [django-tidb repository](https://github.com/pingc TIDB_CA_PATH=/etc/ssl/cert.pem ``` +
+
+
+For a TiDB Self-Managed cluster, create a `.env` file in the root directory of your Python project. Copy the following content into the `.env` file, and modify the environment variable values according to the connection parameters of your TiDB cluster:
+
+```dotenv
+TIDB_HOST=127.0.0.1
+TIDB_PORT=4000
+TIDB_USERNAME=root
+TIDB_PASSWORD=
+TIDB_DATABASE=test
+```
+
+If you are running TiDB on your local machine, `TIDB_HOST` is `127.0.0.1` by default. The initial `TIDB_PASSWORD` is empty, so if you are starting the cluster for the first time, you can omit this field.
+
+The following are descriptions for each parameter:
+
+- `TIDB_HOST`: The host of the TiDB cluster.
+- `TIDB_PORT`: The port of the TiDB cluster.
+- `TIDB_USERNAME`: The username to connect to the TiDB cluster.
+- `TIDB_PASSWORD`: The password to connect to the TiDB cluster.
+- `TIDB_DATABASE`: The name of the database you want to connect to.
+
+
+ +
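As an illustration of how the `TIDB_*` variables above could be consumed, here is a minimal sketch that maps them onto a Django `DATABASES["default"]` entry using the `django_tidb` engine. The helper name `tidb_database_settings` is hypothetical, and the demo project's actual settings module may be organized differently. The sketch assumes the variables are already present in the process environment (for example, after `python-dotenv` has loaded the `.env` file).

```python
import os
from typing import Mapping


def tidb_database_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Build a Django DATABASES["default"] entry from the TIDB_* variables.

    Hypothetical helper for illustration; defaults mirror the local-cluster
    values shown in the .env example (127.0.0.1:4000, root, empty password).
    """
    return {
        "ENGINE": "django_tidb",
        "HOST": env.get("TIDB_HOST", "127.0.0.1"),
        "PORT": int(env.get("TIDB_PORT", "4000")),
        "USER": env.get("TIDB_USERNAME", "root"),
        "PASSWORD": env.get("TIDB_PASSWORD", ""),
        "NAME": env.get("TIDB_DATABASE", "test"),
    }


# Usage: pass a plain dict instead of os.environ to try it without a .env file.
settings = tidb_database_settings({"TIDB_HOST": "127.0.0.1", "TIDB_PORT": "4000"})
print(settings["ENGINE"], settings["PORT"])
```

Passing the environment as a parameter keeps the function easy to exercise in isolation. In a real `settings.py`, the returned dictionary would be assigned to `DATABASES["default"]`.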
+ ### Step 5. Run the demo Migrate the database schema: @@ -199,7 +243,7 @@ class Document(models.Model): > **Note** > -> This code snippet is only applicable to [TiDB Serverless](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-serverless) clusters. +> This section is only applicable to [TiDB Serverless](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-serverless) clusters. Define a 3-dimensional vector column and optimize it with a [vector search index](https://docs.pingcap.com/tidbcloud/vector-search-index) (HNSW index). @@ -225,7 +269,7 @@ Document.objects.create(content="tree", embedding=[1, 0, 0]) ### Search the nearest neighbor documents -TiDB Vector support below distance functions: +TiDB Vector support the following distance functions: - `L1Distance` - `L2Distance` diff --git a/vector-search-integrate-with-langchain.md b/vector-search-integrate-with-langchain.md index a682f733e72d5..16fdaf07822d6 100644 --- a/vector-search-integrate-with-langchain.md +++ b/vector-search-integrate-with-langchain.md @@ -7,14 +7,18 @@ summary: Learn how to integrate Vector Search in TiDB Cloud with LangChain. This tutorial demonstrates how to integrate the [vector search](/vector-search-overview.md) feature in TiDB Cloud with [LangChain](https://python.langchain.com/). - + -> **Note** +> **Warning:** > -> TiDB Vector Search is currently in beta and only available for [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters. +> The vector search feature is experimental. It is not recommended that you use it in the production environment. This feature might be changed or removed without prior notice. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub. +> **Note:** +> +> The vector search feature is only available for TiDB Self-Managed clusters and [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters. 
+ > **Tip** > > You can view the complete [sample code](https://github.com/langchain-ai/langchain/blob/master/docs/docs/integrations/vectorstores/tidb_vector.ipynb) on Jupyter Notebook, or run the sample code directly in the [Colab](https://colab.research.google.com/github/langchain-ai/langchain/blob/master/docs/docs/integrations/vectorstores/tidb_vector.ipynb) online environment. @@ -23,22 +27,25 @@ This tutorial demonstrates how to integrate the [vector search](/vector-search-o To complete this tutorial, you need: - - - [Python 3.8 or higher](https://www.python.org/downloads/) installed. - [Jupyter Notebook](https://jupyter.org/install) installed. - [Git](https://git-scm.com/downloads) installed. -- A TiDB cluster. Follow [Deploy a local test TiDB cluster](/quick-start-with-tidb.md#deploy-a-local-test-cluster) or [Deploy a production TiDB cluster](/production-deployment-using-tiup.md) to create a local cluster. +- A TiDB cluster. + + + +**If you don't have a TiDB cluster, you can create one as follows:** + +- Follow [Deploy a local test TiDB cluster](/quick-start-with-tidb.md#deploy-a-local-test-cluster) or [Deploy a production TiDB cluster](/production-deployment-using-tiup.md) to create a local cluster. +- Follow [Creating a TiDB Cloud Serverless cluster](/develop/dev-guide-build-cluster-in-cloud.md) to create your own TiDB Cloud cluster. -- [Python 3.8 or higher](https://www.python.org/downloads/) installed. -- [Jupyter Notebook](https://jupyter.org/install) installed. -- [Git](https://git-scm.com/downloads) installed. -- A TiDB Cloud Serverless cluster. Follow [creating a TiDB Cloud Serverless cluster](/tidb-cloud/create-tidb-cluster-serverless.md) to create your own TiDB Cloud cluster if you don't have one. +**If you don't have a TiDB cluster, you can create one as follows:** - +- (Recommended) Follow [Creating a TiDB Cloud Serverless cluster](/develop/dev-guide-build-cluster-in-cloud.md) to create your own TiDB Cloud cluster. 
+- Follow [Deploy a local test TiDB cluster](https://docs.pingcap.com/tidb/stable/quick-start-with-tidb#deploy-a-local-test-cluster) or [Deploy a production TiDB cluster](https://docs.pingcap.com/tidb/stable/production-deployment-using-tiup) to create a local cluster. ## Get started @@ -63,7 +70,7 @@ In your project directory, run the following command to install the required pac !pip install tidb-vector ``` -Open the `integrate_with_langchain.ipynb` file in Jupyter Notebook and add the following code to import the required packages: +Open the `integrate_with_langchain.ipynb` file in Jupyter Notebook, and then add the following code to import the required packages: ```python from langchain_community.document_loaders import TextLoader @@ -74,7 +81,12 @@ from langchain_text_splitters import CharacterTextSplitter ### Step 3. Set up your environment -#### Step 3.1 Obtain the connection string to the TiDB cluster +Configure the environment variables depending on the TiDB deployment option you've selected. + + +
+ +For a TiDB Cloud Serverless cluster, take the following steps to obtain the cluster connection string and configure environment variables: 1. Navigate to the [**Clusters**](https://tidbcloud.com/console/clusters) page, and then click the name of your target cluster to go to its overview page. @@ -93,14 +105,31 @@ from langchain_text_splitters import CharacterTextSplitter > > If you have not set a password yet, click **Generate Password** to generate a random password. -#### Step 3.2 Configure environment variables +5. Configure environment variables. + + This document uses [OpenAI](https://platform.openai.com/docs/introduction) as the embedding model provider. In this step, you need to provide the connection string obtained from the previous step and your [OpenAI API key](https://platform.openai.com/docs/quickstart/step-2-set-up-your-api-key). + + To configure the environment variables, run the following code. You will be prompted to enter your connection string and OpenAI API key: + + ```python + # Use getpass to securely prompt for environment variables in your terminal. + import getpass + import os -To establish a secure and efficient database connection, use the standard connection method provided by TiDB Cloud. + # Copy your connection string from the TiDB Cloud console. + # Connection string format: "mysql+pymysql://:@:4000/?ssl_ca=/etc/ssl/cert.pem&ssl_verify_cert=true&ssl_verify_identity=true" + tidb_connection_string = getpass.getpass("TiDB Connection String:") + os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:") + ``` -This document uses [OpenAI](https://platform.openai.com/docs/introduction) as the embedding model provider. In this step, you need to provide the connection string obtained from step 3.1 and your [OpenAI API key](https://platform.openai.com/docs/quickstart/step-2-set-up-your-api-key). +
+
+ +This document uses [OpenAI](https://platform.openai.com/docs/introduction) as the embedding model provider. In this step, you need to provide the connection string obtained from the previous step and your [OpenAI API key](https://platform.openai.com/docs/quickstart/step-2-set-up-your-api-key). To configure the environment variables, run the following code. You will be prompted to enter your connection string and OpenAI API key: + ```python # Use getpass to securely prompt for environment variables in your terminal. import getpass @@ -112,6 +141,27 @@ tidb_connection_string = getpass.getpass("TiDB Connection String:") os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:") ``` +Taking macOS as an example, the cluster connection string is as follows: + +```dotenv +TIDB_DATABASE_URL="mysql+pymysql://<USER>:<PASSWORD>@<HOST>:<PORT>/<DATABASE>" +# 例如:TIDB_DATABASE_URL="mysql+pymysql://root@127.0.0.1:4000/test" +``` + +You need to modify the values of the connection parameters according to your TiDB cluster. If you are running TiDB on your local machine, `HOST` is `127.0.0.1` by default. The initial `PASSWORD` is empty, so if you are starting the cluster for the first time, you can omit this field. + +The following are descriptions for each parameter: + +- `HOST`: The host of the TiDB cluster. +- `PORT`: The port of the TiDB cluster. +- `USER`: The username to connect to the TiDB cluster. +- `PASSWORD`: The password to connect to the TiDB cluster. +- `DATABASE`: The name of the database you want to connect to. + +
+ +
+ ### Step 4. Load the sample document #### Step 4.1 Download the sample document @@ -381,7 +431,7 @@ The following metadata filters can match this document: } ``` -Each key-value pair in the metadata filters is treated as a separate filter clause, and these clauses are combined using the `AND` logical operator. +In a metadata filter, TiDB treats each key-value pair as a separate filter clause and combines these clauses using the `AND` logical operator. ### Example @@ -431,19 +481,19 @@ TiDB Vector offers advanced, high-speed vector processing capabilities, enhancin ## Advanced usage example: travel agent -This section demonstrates an advanced use case of integrating vector search with Langchain for a travel agent. The goal is to create personalized travel reports for clients seeking airports with specific amenities, such as clean lounges and vegetarian options. +This section demonstrates a use case of integrating vector search with Langchain for a travel agent. The goal is to create personalized travel reports for clients, helping them find airports with specific amenities, such as clean lounges and vegetarian options. The process involves two main steps: 1. Perform a semantic search across airport reviews to identify airport codes that match the desired amenities. -2. Execute an SQL query to merge these codes with route information, highlighting airlines and destinations that align with user's preferences. +2. Execute a SQL query to merge these codes with route information, highlighting airlines and destinations that align with user's preferences. ### Prepare data First, create a table to store airport route data: ```python -# Create table to store airplan data. +# Create a table to store flight plan data. 
vector_store.tidb_vector_client.execute( """CREATE TABLE airplan_routes ( id INT AUTO_INCREMENT PRIMARY KEY, @@ -584,7 +634,7 @@ The expected output is as follows: (0.19840519342700513, 3, 'EFGH', 'UA', 'SEA', 'Daily flights from SFO to SEA.', datetime.timedelta(seconds=9000), 7, 'Boeing 737', Decimal('129.99'), 'None', 'Small airport with basic facilities.')] ``` -### Clean up +### Clean up data Finally, clean up the resources by dropping the created table: diff --git a/vector-search-integrate-with-peewee.md b/vector-search-integrate-with-peewee.md index b2ed15394fc06..54cafbb55d32b 100644 --- a/vector-search-integrate-with-peewee.md +++ b/vector-search-integrate-with-peewee.md @@ -7,30 +7,40 @@ summary: Learn how to integrate TiDB Vector Search with peewee to store embeddin This tutorial walks you through how to use [peewee](https://docs.peewee-orm.com/) to interact with the [TiDB Vector Search](/vector-search-overview.md), store embeddings, and perform vector search queries. - + -> **Note** +> **Warning:** > -> TiDB Vector Search is currently in beta and only available for [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters. +> The vector search feature is experimental. It is not recommended that you use it in the production environment. This feature might be changed or removed without prior notice. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub. +> **Note:** +> +> The vector search feature is only available for TiDB Self-Managed clusters and [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters. + ## Prerequisites To complete this tutorial, you need: - - - [Python 3.8 or higher](https://www.python.org/downloads/) installed. - [Git](https://git-scm.com/downloads) installed. -- A TiDB cluster. 
Follow [Deploy a local test TiDB cluster](/quick-start-with-tidb.md#deploy-a-local-test-cluster) or [Deploy a production TiDB cluster](/production-deployment-using-tiup.md) to create a local cluster. +- A TiDB cluster. + + + +**If you don't have a TiDB cluster, you can create one as follows:** + +- Follow [Deploy a local test TiDB cluster](/quick-start-with-tidb.md#deploy-a-local-test-cluster) or [Deploy a production TiDB cluster](/production-deployment-using-tiup.md) to create a local cluster. +- Follow [Creating a TiDB Cloud Serverless cluster](/develop/dev-guide-build-cluster-in-cloud.md) to create your own TiDB Cloud cluster. -- [Python 3.8 or higher](https://www.python.org/downloads/) installed. -- [Git](https://git-scm.com/downloads) installed. -- A TiDB Cloud Serverless cluster. Follow [creating a TiDB Cloud Serverless cluster](/tidb-cloud/create-tidb-cluster-serverless.md) to create your own TiDB Cloud cluster if you don't have one. +**If you don't have a TiDB cluster, you can create one as follows:** + +- (Recommended) Follow [Creating a TiDB Cloud Serverless cluster](/develop/dev-guide-build-cluster-in-cloud.md) to create your own TiDB Cloud cluster. +- Follow [Deploy a local test TiDB cluster](https://docs.pingcap.com/tidb/stable/quick-start-with-tidb#deploy-a-local-test-cluster) or [Deploy a production TiDB cluster](https://docs.pingcap.com/tidb/stable/production-deployment-using-tiup) to create a local cluster. @@ -64,7 +74,7 @@ Install the required dependencies for the demo project: pip install -r requirements.txt ``` -For your existing project, you can install the following packages: +Alternatively, you can install the following packages for your project: ```bash pip install peewee pymysql python-dotenv tidb-vector @@ -72,6 +82,13 @@ pip install peewee pymysql python-dotenv tidb-vector ### Step 4. Configure the environment variables +Configure the environment variables depending on the TiDB deployment option you've selected. + + +
+ +For a TiDB Cloud Serverless cluster, take the following steps to obtain the cluster connection string and configure environment variables: + 1. Navigate to the [**Clusters**](https://tidbcloud.com/console/clusters) page, and then click the name of your target cluster to go to its overview page. 2. Click **Connect** in the upper-right corner. A connection dialog is displayed. @@ -113,6 +130,33 @@ pip install peewee pymysql python-dotenv tidb-vector TIDB_CA_PATH=/etc/ssl/cert.pem ``` +
+
+
+For a TiDB Self-Managed cluster, create a `.env` file in the root directory of your Python project. Copy the following content into the `.env` file, and modify the environment variable values according to the connection parameters of your TiDB cluster:
+
+```dotenv
+TIDB_HOST=127.0.0.1
+TIDB_PORT=4000
+TIDB_USERNAME=root
+TIDB_PASSWORD=
+TIDB_DATABASE=test
+```
+
+If you are running TiDB on your local machine, `TIDB_HOST` is `127.0.0.1` by default. The initial `TIDB_PASSWORD` is empty, so if you are starting the cluster for the first time, you can omit this field.
+
+The following are descriptions for each parameter:
+
+- `TIDB_HOST`: The host of the TiDB cluster.
+- `TIDB_PORT`: The port of the TiDB cluster.
+- `TIDB_USERNAME`: The username to connect to the TiDB cluster.
+- `TIDB_PASSWORD`: The password to connect to the TiDB cluster.
+- `TIDB_DATABASE`: The name of the database you want to connect to.
+
+
+ +
+ ### Step 5. Run the demo ```bash @@ -197,7 +241,7 @@ class Document(Model): > **Note** > -> This code snippet is only applicable to [TiDB Serverless](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-serverless) clusters. +> This section is only applicable to [TiDB Serverless](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-serverless) clusters. Define a 3-dimensional vector column and optimize it with a [vector search index](https://docs.pingcap.com/tidbcloud/vector-search-index) (HNSW index). diff --git a/vector-search-integrate-with-sqlalchemy.md b/vector-search-integrate-with-sqlalchemy.md index f326b01e8935b..657169bb9c507 100644 --- a/vector-search-integrate-with-sqlalchemy.md +++ b/vector-search-integrate-with-sqlalchemy.md @@ -7,30 +7,40 @@ summary: Learn how to integrate TiDB Vector Search with SQLAlchemy to store embe This tutorial walks you through how to use [SQLAlchemy](https://www.sqlalchemy.org/) to interact with [TiDB Vector Search](/vector-search-overview.md), store embeddings, and perform vector search queries. - + -> **Note** +> **Warning:** > -> TiDB Vector Search is currently in beta and only available for [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters. +> The vector search feature is experimental. It is not recommended that you use it in the production environment. This feature might be changed or removed without prior notice. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub. +> **Note:** +> +> The vector search feature is only available for TiDB Self-Managed clusters and [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters. + ## Prerequisites To complete this tutorial, you need: - - - [Python 3.8 or higher](https://www.python.org/downloads/) installed. - [Git](https://git-scm.com/downloads) installed. -- A TiDB cluster. 
Follow [Deploy a local test TiDB cluster](/quick-start-with-tidb.md#deploy-a-local-test-cluster) or [Deploy a production TiDB cluster](/production-deployment-using-tiup.md) to create a local cluster. +- A TiDB cluster. + + + +**If you don't have a TiDB cluster, you can create one as follows:** + +- Follow [Deploy a local test TiDB cluster](/quick-start-with-tidb.md#deploy-a-local-test-cluster) or [Deploy a production TiDB cluster](/production-deployment-using-tiup.md) to create a local cluster. +- Follow [Creating a TiDB Cloud Serverless cluster](/develop/dev-guide-build-cluster-in-cloud.md) to create your own TiDB Cloud cluster. -- [Python 3.8 or higher](https://www.python.org/downloads/) installed. -- [Git](https://git-scm.com/downloads) installed. -- A TiDB Cloud Serverless cluster. Follow [creating a TiDB Cloud Serverless cluster](/tidb-cloud/create-tidb-cluster-serverless.md) to create your own TiDB Cloud cluster if you don't have one. +**If you don't have a TiDB cluster, you can create one as follows:** + +- (Recommended) Follow [Creating a TiDB Cloud Serverless cluster](/develop/dev-guide-build-cluster-in-cloud.md) to create your own TiDB Cloud cluster. +- Follow [Deploy a local test TiDB cluster](https://docs.pingcap.com/tidb/stable/quick-start-with-tidb#deploy-a-local-test-cluster) or [Deploy a production TiDB cluster](https://docs.pingcap.com/tidb/stable/production-deployment-using-tiup) to create a local cluster. @@ -64,7 +74,7 @@ Install the required dependencies for the demo project: pip install -r requirements.txt ``` -For your existing project, you can install the following packages: +Alternatively, you can install the following packages for your project: ```bash pip install pymysql python-dotenv sqlalchemy tidb-vector @@ -72,6 +82,13 @@ pip install pymysql python-dotenv sqlalchemy tidb-vector ### Step 4. Configure the environment variables +Configure the environment variables depending on the TiDB deployment option you've selected. + + +
+ +For a TiDB Cloud Serverless cluster, take the following steps to obtain the cluster connection string and configure environment variables: + 1. Navigate to the [**Clusters**](https://tidbcloud.com/console/clusters) page, and then click the name of your target cluster to go to its overview page. 2. Click **Connect** in the upper-right corner. A connection dialog is displayed. @@ -101,6 +118,30 @@ pip install pymysql python-dotenv sqlalchemy tidb-vector TIDB_DATABASE_URL="mysql+pymysql://.root:@gateway01..prod.aws.tidbcloud.com:4000/test?ssl_ca=/etc/ssl/cert.pem&ssl_verify_cert=true&ssl_verify_identity=true" ``` +
+
+
+For a TiDB Self-Managed cluster, create a `.env` file in the root directory of your Python project. Copy the following content into the `.env` file, and modify the environment variable values according to the connection parameters of your TiDB cluster:
+
+```dotenv
+TIDB_DATABASE_URL="mysql+pymysql://<USER>:<PASSWORD>@<HOST>:<PORT>/<DATABASE>"
+# For example: TIDB_DATABASE_URL="mysql+pymysql://root@127.0.0.1:4000/test"
+```
+
+If you are running TiDB on your local machine, `HOST` is `127.0.0.1` by default. The initial `PASSWORD` is empty, so if you are starting the cluster for the first time, you can omit this field.
+
+The following are descriptions for each parameter:
+
+- `<HOST>`: The host of the TiDB cluster.
+- `<PORT>`: The port of the TiDB cluster.
+- `<USER>`: The username to connect to the TiDB cluster.
+- `<PASSWORD>`: The password to connect to the TiDB cluster.
+- `<DATABASE>`: The name of the database you want to connect to.
+
+
+ +
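To make the `TIDB_DATABASE_URL` format for a self-managed cluster concrete, the following sketch assembles the URL from its five parts (user, password, host, port, database). The helper name `build_tidb_url` is hypothetical and is not part of SQLAlchemy or the demo project. It assumes that special characters in the credentials should be percent-encoded, and that the password segment is left out when the password is empty, as in the local-cluster example.

```python
from urllib.parse import quote_plus


def build_tidb_url(user: str, password: str, host: str, port: int, database: str) -> str:
    """Assemble a mysql+pymysql URL; the password part is omitted when empty.

    Hypothetical helper for illustration only.
    """
    credentials = quote_plus(user)
    if password:
        # Percent-encode characters such as "@" or "/" that would otherwise
        # be read as URL delimiters.
        credentials += ":" + quote_plus(password)
    return f"mysql+pymysql://{credentials}@{host}:{port}/{database}"


# A fresh local cluster: root user, empty password, default port 4000.
print(build_tidb_url("root", "", "127.0.0.1", 4000, "test"))
```

The resulting string can be written to the `.env` file as the value of `TIDB_DATABASE_URL`.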
+ ### Step 5. Run the demo ```bash @@ -164,7 +205,7 @@ class Document(Base): > **Note** > -> This code snippet is only applicable to [TiDB Serverless](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-serverless) clusters. +> This section is only applicable to [TiDB Serverless](https://docs.pingcap.com/tidbcloud/select-cluster-tier#tidb-serverless) clusters. Define a 3-dimensional vector column and optimize it with a [vector search index](https://docs.pingcap.com/tidbcloud/vector-search-index) (HNSW index). From 93fb40ae64aef9eea1aeac86f0fd707666a773c0 Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Tue, 24 Sep 2024 11:21:26 +0800 Subject: [PATCH 14/16] syn from zh --- ...-search-integrate-with-jinaai-embedding.md | 92 +++++++++++++------ vector-search-integrate-with-langchain.md | 14 +-- vector-search-integrate-with-llamaindex.md | 89 +++++++++++++----- vector-search-integration-overview.md | 10 +- vector-search-limitations.md | 28 ++++-- 5 files changed, 168 insertions(+), 65 deletions(-) diff --git a/vector-search-integrate-with-jinaai-embedding.md b/vector-search-integrate-with-jinaai-embedding.md index 89b04b02d9e28..0e90b997594da 100644 --- a/vector-search-integrate-with-jinaai-embedding.md +++ b/vector-search-integrate-with-jinaai-embedding.md @@ -5,32 +5,42 @@ summary: Learn how to integrate TiDB Vector Search with Jina AI Embeddings API t # Integrate TiDB Vector Search with Jina AI Embeddings API -This tutorial walks you through how to use [Jina AI](https://jina.ai/) to generate embeddings for text data, and then store the embeddings in TiDB Vector Storage and search similar texts based on embeddings. +This tutorial walks you through how to use [Jina AI](https://jina.ai/) to generate embeddings for text data, and then store the embeddings in TiDB vector storage and search similar texts based on embeddings. 
- + -> **Note** +> **Warning:** > -> TiDB Vector Search is currently in beta and only available for [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters. +> The vector search feature is experimental. It is not recommended that you use it in the production environment. This feature might be changed or removed without prior notice. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub. +> **Note:** +> +> The vector search feature is only available for TiDB Self-Managed clusters and [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters. + ## Prerequisites To complete this tutorial, you need: - - - [Python 3.8 or higher](https://www.python.org/downloads/) installed. - [Git](https://git-scm.com/downloads) installed. -- A TiDB cluster. Follow [Deploy a local test TiDB cluster](/quick-start-with-tidb.md#deploy-a-local-test-cluster) or [Deploy a production TiDB cluster](/production-deployment-using-tiup.md) to create a local cluster. +- A TiDB cluster. + + + +**If you don't have a TiDB cluster, you can create one as follows:** + +- Follow [Deploy a local test TiDB cluster](/quick-start-with-tidb.md#deploy-a-local-test-cluster) or [Deploy a production TiDB cluster](/production-deployment-using-tiup.md) to create a local cluster. +- Follow [Creating a TiDB Cloud Serverless cluster](/develop/dev-guide-build-cluster-in-cloud.md) to create your own TiDB Cloud cluster. -- [Python 3.8 or higher](https://www.python.org/downloads/) installed. -- [Git](https://git-scm.com/downloads) installed. -- A TiDB Cloud Serverless cluster. Follow [creating a TiDB Cloud Serverless cluster](/tidb-cloud/create-tidb-cluster-serverless.md) to create your own TiDB Cloud cluster if you don't have one. 
+**If you don't have a TiDB cluster, you can create one as follows:** + +- (Recommended) Follow [Creating a TiDB Cloud Serverless cluster](/develop/dev-guide-build-cluster-in-cloud.md) to create your own TiDB Cloud cluster. +- Follow [Deploy a local test TiDB cluster](https://docs.pingcap.com/tidb/stable/quick-start-with-tidb#deploy-a-local-test-cluster) or [Deploy a production TiDB cluster](https://docs.pingcap.com/tidb/stable/production-deployment-using-tiup) to create a local cluster. @@ -66,11 +76,12 @@ pip install -r requirements.txt ### Step 4. Configure the environment variables -#### 4.1 Get the Jina AI API key +Get the Jina AI API key from the [Jina AI Embeddings API](https://jina.ai/embeddings/) page, and then configure the environment variables depending on the TiDB deployment option you've selected. -Get the Jina AI API key from the [Jina AI Embeddings API](https://jina.ai/embeddings/) page. + +
-#### 4.2 Get the TiDB connection parameters +For a TiDB Cloud Serverless cluster, take the following steps to obtain the cluster connection string and configure environment variables: 1. Navigate to the [**Clusters**](https://tidbcloud.com/console/clusters) page, and then click the name of your target cluster to go to its overview page. @@ -93,21 +104,44 @@ Get the Jina AI API key from the [Jina AI Embeddings API](https://jina.ai/embedd > > If you have not set a password yet, click **Create password** to generate a random password. -#### 4.3 Set the environment variables +5. Set the Jina AI API key and the TiDB connection string as environment variables in your terminal, or create a `.env` file with the following environment variables: -Set the environment variables in your terminal, or create a `.env` file with the above environment variables. + ```dotenv + JINAAI_API_KEY="****" + TIDB_DATABASE_URL="{tidb_connection_string}" + ``` -```dotenv -JINAAI_API_KEY="****" -TIDB_DATABASE_URL="{tidb_connection_string}" -``` + The following is an example connection string for macOS: -For example, the connection string on macOS looks like: + ```dotenv + TIDB_DATABASE_URL="mysql+pymysql://.root:@gateway01..prod.aws.tidbcloud.com:4000/test?ssl_ca=/etc/ssl/cert.pem&ssl_verify_cert=true&ssl_verify_identity=true" + ``` -```dotenv -TIDB_DATABASE_URL="mysql+pymysql://.root:@gateway01..prod.aws.tidbcloud.com:4000/test?ssl_ca=/etc/ssl/cert.pem&ssl_verify_cert=true&ssl_verify_identity=true" +
+
+
+For a TiDB Self-Managed cluster, set the environment variables for connecting to your TiDB cluster in your terminal as follows:
+
+```shell
+export JINAAI_API_KEY="****"
+export TIDB_DATABASE_URL="mysql+pymysql://<USER>:<PASSWORD>@<HOST>:<PORT>/<DATABASE>"
+# For example: export TIDB_DATABASE_URL="mysql+pymysql://root@127.0.0.1:4000/test"
+```
+
+You need to replace parameters in the preceding command according to your TiDB cluster. If you are running TiDB on your local machine, `HOST` is `127.0.0.1` by default. The initial `PASSWORD` is empty, so if you are starting the cluster for the first time, you can omit this field.
+
+The following are descriptions for each parameter:
+
+- `<HOST>`: The host of the TiDB cluster.
+- `<PORT>`: The port of the TiDB cluster.
+- `<USER>`: The username to connect to the TiDB cluster.
+- `<PASSWORD>`: The password to connect to the TiDB cluster.
+- `<DATABASE>`: The name of the database you want to connect to.
+
+
+ +
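Because the demo depends on both an API key and a connection string being set, a quick pre-flight check can catch a missing or misspelled variable before any request is sent. The sketch below is one way to do that with the standard library only. The helper name `missing_variables` is hypothetical, and the variable names passed in should be whatever your demo code actually reads (`JINAAI_API_KEY` here follows the `.env` example for TiDB Cloud Serverless).

```python
import os
from typing import Iterable, List, Mapping


def missing_variables(names: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the names that are unset or empty in the given environment.

    Hypothetical helper for illustration only.
    """
    return [name for name in names if not env.get(name)]


# Example environment where only the connection string has been set.
example_env = {"TIDB_DATABASE_URL": "mysql+pymysql://root@127.0.0.1:4000/test"}
print(missing_variables(["JINAAI_API_KEY", "TIDB_DATABASE_URL"], example_env))
```

Calling the function without the second argument checks the real process environment instead of the example dictionary.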
+ ### Step 5. Run the demo ```bash @@ -132,7 +166,7 @@ Example output: ## Sample code snippets -### Get Embeddings from Jina AI +### Get embeddings from Jina AI Define a `generate_embeddings` helper function to call Jina AI embeddings API: @@ -159,9 +193,9 @@ def generate_embeddings(text: str): return response.json()['data'][0]['embedding'] ``` -### Connect to TiDB Cloud Serverless +### Connect to the TiDB cluster -Connect to TiDB Cloud Serverless through SQLAlchemy: +Connect to the TiDB cluster through SQLAlchemy: ```python import os @@ -204,7 +238,7 @@ class Document(Base): > - The dimension of the vector column must match the dimension of the embeddings generated by the embedding model. > - In this example, the dimension of embeddings generated by the `jina-embeddings-v2-base-en` model is `768`. -### Create embeddings with Jina AI embeddings and TiDB +### Create embeddings with Jina AI and store in TiDB Use the Jina AI Embeddings API to generate embeddings for each piece of text and store the embeddings in TiDB: @@ -234,13 +268,13 @@ with Session(engine) as session: session.commit() ``` -### Perform semantic search with Jina AI embeddings and TiDB +### Perform semantic search with Jina AI embeddings in TiDB -Generate embeddings for the query text via Jina AI embeddings API, and then search for the most relevant document based on the cosine distance between the query embedding and the document embeddings: +Generate the embedding for the query text via Jina AI embeddings API, and then search for the most relevant document based on the cosine distance between **the embedding of the query text** and **each embedding in the vector table**: ```python query = 'What is TiDB?' -# Generate embeddings for the query via Jina AI API. +# Generate the embedding for the query via Jina AI API. 
query_embedding = generate_embeddings(query) with Session(engine) as session: diff --git a/vector-search-integrate-with-langchain.md b/vector-search-integrate-with-langchain.md index 16fdaf07822d6..935a219e96eeb 100644 --- a/vector-search-integrate-with-langchain.md +++ b/vector-search-integrate-with-langchain.md @@ -47,6 +47,8 @@ To complete this tutorial, you need: - (Recommended) Follow [Creating a TiDB Cloud Serverless cluster](/develop/dev-guide-build-cluster-in-cloud.md) to create your own TiDB Cloud cluster. - Follow [Deploy a local test TiDB cluster](https://docs.pingcap.com/tidb/stable/quick-start-with-tidb#deploy-a-local-test-cluster) or [Deploy a production TiDB cluster](https://docs.pingcap.com/tidb/stable/production-deployment-using-tiup) to create a local cluster. +
+ ## Get started This section provides step-by-step instructions for integrating TiDB Vector Search with LangChain to perform semantic searches. @@ -145,18 +147,18 @@ Taking macOS as an example, the cluster connection string is as follows: ```dotenv TIDB_DATABASE_URL="mysql+pymysql://<USER>:<PASSWORD>@<HOST>:<PORT>/<DATABASE>" -# 例如:TIDB_DATABASE_URL="mysql+pymysql://root@127.0.0.1:4000/test" +# For example: TIDB_DATABASE_URL="mysql+pymysql://root@127.0.0.1:4000/test" ``` You need to modify the values of the connection parameters according to your TiDB cluster. If you are running TiDB on your local machine, `HOST` is `127.0.0.1` by default. The initial `PASSWORD` is empty, so if you are starting the cluster for the first time, you can omit this field. The following are descriptions for each parameter: -- `HOST`: The host of the TiDB cluster. -- `PORT`: The port of the TiDB cluster. -- `USER`: The username to connect to the TiDB cluster. -- `PASSWORD`: The password to connect to the TiDB cluster. -- `DATABASE`: The name of the database you want to connect to. +- `<HOST>`: The host of the TiDB cluster. +- `<PORT>`: The port of the TiDB cluster. +- `<USER>`: The username to connect to the TiDB cluster. +- `<PASSWORD>`: The password to connect to the TiDB cluster. +- `<DATABASE>`: The name of the database you want to connect to. diff --git a/vector-search-integrate-with-llamaindex.md b/vector-search-integrate-with-llamaindex.md index b6f7e2c05ef10..513cbed749132 100644 --- a/vector-search-integrate-with-llamaindex.md +++ b/vector-search-integrate-with-llamaindex.md @@ -1,20 +1,24 @@ --- title: Integrate Vector Search with LlamaIndex -summary: Learn how to integrate Vector Search in TiDB Cloud with LlamaIndex. +summary: Learn how to integrate TiDB Vector Search with LlamaIndex. --- # Integrate Vector Search with LlamaIndex -This tutorial demonstrates how to integrate the [vector search](/vector-search-overview.md) feature in TiDB Cloud with [LlamaIndex](https://www.llamaindex.ai).
+This tutorial demonstrates how to integrate the [vector search](/vector-search-overview.md) feature of TiDB with [LlamaIndex](https://www.llamaindex.ai). - + -> **Note** +> **Warning:** > -> TiDB Vector Search is currently in beta and only available for [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters. +> The vector search feature is experimental. It is not recommended that you use it in the production environment. This feature might be changed or removed without prior notice. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub. +> **Note:** +> +> The vector search feature is only available for TiDB Self-Managed clusters and [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters. + > **Tip** > > You can view the complete [sample code](https://github.com/run-llama/llama_index/blob/main/docs/docs/examples/vector_stores/TiDBVector.ipynb) on Jupyter Notebook, or run the sample code directly in the [Colab](https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/docs/examples/vector_stores/TiDBVector.ipynb) online environment. @@ -23,20 +27,25 @@ This tutorial demonstrates how to integrate the [vector search](/vector-search-o To complete this tutorial, you need: - - - [Python 3.8 or higher](https://www.python.org/downloads/) installed. - [Jupyter Notebook](https://jupyter.org/install) installed. - [Git](https://git-scm.com/downloads) installed. -- A TiDB cluster. Follow [Deploy a local test TiDB cluster](/quick-start-with-tidb.md#deploy-a-local-test-cluster) or [Deploy a production TiDB cluster](/production-deployment-using-tiup.md) to create a local cluster. +- A TiDB cluster. 
+ + + +**If you don't have a TiDB cluster, you can create one as follows:** + +- Follow [Deploy a local test TiDB cluster](/quick-start-with-tidb.md#deploy-a-local-test-cluster) or [Deploy a production TiDB cluster](/production-deployment-using-tiup.md) to create a local cluster. +- Follow [Creating a TiDB Cloud Serverless cluster](/develop/dev-guide-build-cluster-in-cloud.md) to create your own TiDB Cloud cluster. -- [Python 3.8 or higher](https://www.python.org/downloads/) installed. -- [Jupyter Notebook](https://jupyter.org/install) installed. -- [Git](https://git-scm.com/downloads) installed. -- A TiDB Cloud Serverless cluster. Follow [creating a TiDB Cloud Serverless cluster](/tidb-cloud/create-tidb-cluster-serverless.md) to create your own TiDB Cloud cluster if you don't have one. +**If you don't have a TiDB cluster, you can create one as follows:** + +- (Recommended) Follow [Creating a TiDB Cloud Serverless cluster](/develop/dev-guide-build-cluster-in-cloud.md) to create your own TiDB Cloud cluster. +- Follow [Deploy a local test TiDB cluster](https://docs.pingcap.com/tidb/stable/quick-start-with-tidb#deploy-a-local-test-cluster) or [Deploy a production TiDB cluster](https://docs.pingcap.com/tidb/stable/production-deployment-using-tiup) to create a local cluster. @@ -46,7 +55,7 @@ This section provides step-by-step instructions for integrating TiDB Vector Sear ### Step 1. Create a new Jupyter Notebook file -In your preferred directory, create a new Jupyter Notebook file named `integrate_with_llamaindex.ipynb`: +In the root directory, create a new Jupyter Notebook file named `integrate_with_llamaindex.ipynb`: ```shell touch integrate_with_llamaindex.ipynb @@ -71,9 +80,9 @@ from llama_index.core import VectorStoreIndex from llama_index.vector_stores.tidbvector import TiDBVectorStore ``` -### Step 3. Set up your environment +### Step 3. Configure environment variables + -#### Step 3.1 Obtain the connection string to the TiDB cluster 1. 
Navigate to the [**Clusters**](https://tidbcloud.com/console/clusters) page, and then click the name of your target cluster to go to its overview page. @@ -92,24 +101,62 @@ from llama_index.vector_stores.tidbvector import TiDBVectorStore > > If you have not set a password yet, click **Generate Password** to generate a random password. -#### Step 3.2 Configure environment variables +5. Configure environment variables. + + This document uses [OpenAI](https://platform.openai.com/docs/introduction) as the embedding model provider. In this step, you need to provide the connection string obtained from the previous step and your [OpenAI API key](https://platform.openai.com/docs/quickstart/step-2-set-up-your-api-key). + + To configure the environment variables, run the following code. You will be prompted to enter your connection string and OpenAI API key: + + ```python + # Use getpass to securely prompt for environment variables in your terminal. + import getpass + import os -To establish a secure and efficient database connection, use the standard connection method provided by TiDB Cloud. + # Copy your connection string from the TiDB Cloud console. + # Connection string format: "mysql+pymysql://:@:4000/?ssl_ca=/etc/ssl/cert.pem&ssl_verify_cert=true&ssl_verify_identity=true" + tidb_connection_string = getpass.getpass("TiDB Connection String:") + os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:") + ``` -This document uses [OpenAI](https://platform.openai.com/docs/introduction) as the embedding model provider. In this step, you need to provide the connection string obtained from step 3.1 and your [OpenAI API key](https://platform.openai.com/docs/quickstart/step-2-set-up-your-api-key). + +
+ +This document uses [OpenAI](https://platform.openai.com/docs/introduction) as the embedding model provider. In this step, you need to provide the connection string of your TiDB cluster and your [OpenAI API key](https://platform.openai.com/docs/quickstart/step-2-set-up-your-api-key). To configure the environment variables, run the following code. You will be prompted to enter your connection string and OpenAI API key: ```python +# Use getpass to securely prompt for environment variables in your terminal. import getpass import os -tidb_connection_url = getpass.getpass( - "TiDB connection URL (format - mysql+pymysql://root@127.0.0.1:4000/test): " -) +# Copy your connection string from the TiDB Cloud console. +# Connection string format: "mysql+pymysql://:@:4000/?ssl_ca=/etc/ssl/cert.pem&ssl_verify_cert=true&ssl_verify_identity=true" +tidb_connection_string = getpass.getpass("TiDB Connection String:") os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:") ``` +Taking macOS as an example, the cluster connection string is as follows: + +```dotenv +TIDB_DATABASE_URL="mysql+pymysql://<USER>:<PASSWORD>@<HOST>:<PORT>/<DATABASE>" +# For example: TIDB_DATABASE_URL="mysql+pymysql://root@127.0.0.1:4000/test" +``` + +You need to modify the parameters in the connection string according to your TiDB cluster. If you are running TiDB on your local machine, `HOST` is `127.0.0.1` by default. The initial `PASSWORD` is empty, so if you are starting the cluster for the first time, you can omit this field. + +The following are descriptions for each parameter: + +- `<HOST>`: The host of the TiDB cluster. +- `<PORT>`: The port of the TiDB cluster. +- `<USER>`: The username to connect to the TiDB cluster. +- `<PASSWORD>`: The password to connect to the TiDB cluster. +- `<DATABASE>`: The name of the database you want to connect to. + +
+ + + ### Step 4. Load the sample document #### Step 4.1 Download the sample document diff --git a/vector-search-integration-overview.md b/vector-search-integration-overview.md index ff6497a33ec6a..7efccf2d8e648 100644 --- a/vector-search-integration-overview.md +++ b/vector-search-integration-overview.md @@ -7,14 +7,18 @@ summary: An overview of TiDB vector search integration, including supported AI f This document provides an overview of TiDB vector search integration, including supported AI frameworks, embedding models, and Object Relational Mapping (ORM) libraries. - + -> **Note** +> **Warning:** > -> TiDB Vector Search is currently in beta and is only available for [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters. +> The vector search feature is experimental. It is not recommended that you use it in the production environment. This feature might be changed or removed without prior notice. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub. +> **Note:** +> +> The vector search feature is only available for TiDB Self-Managed clusters and [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters. + ## AI frameworks TiDB provides official support for the following AI frameworks, enabling you to easily integrate AI applications developed based on these frameworks with TiDB Vector Search. diff --git a/vector-search-limitations.md b/vector-search-limitations.md index 2ff13f3df95fe..572c108a3095c 100644 --- a/vector-search-limitations.md +++ b/vector-search-limitations.md @@ -7,20 +7,36 @@ summary: Learn the limitations of the TiDB Vector Search. This document describes the known limitations of TiDB Vector Search. We are continuously working to enhance your experience by adding more features. -- TiDB Vector Search is only available for the following clusters. It is not available for TiDB Dedicated clusters. 
+ - - [TiDB Serverless](/tidb-cloud/select-cluster-tier.md#tidb-serverless) clusters - - TiDB Self-Hosted clusters with TiDB versions of 8.4.0 or later +> **Warning:** +> +> The vector search feature is experimental. It is not recommended that you use it in the production environment. This feature might be changed or removed without prior notice. If you find a bug, you can report an [issue](https://github.com/pingcap/tidb/issues) on GitHub. -- Each [vector](/vector-search-data-types.md) supports up to 16,000 dimensions. + -- Vector data supports only single-precision floating-point numbers (Float32). +> **Note:** +> +> The vector search feature is only available for TiDB Self-Managed clusters and [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters. -- Only cosine distance and L2 distance are supported when you create a [vector search index](/vector-search-index.md). +- Each [vector](/vector-search-data-types.md) supports up to 16,000 dimensions. +- You cannot store `NaN`, `Infinity`, or `-Infinity` values in the vector data type. +- Only cosine distance and L2 distance (Euclidean distance) are supported when you create a [vector search index](/vector-search-index.md). +- Vector data types cannot store double-precision floating-point numbers (this is planned to be supported in a future release). If you insert or store double-precision floating-point numbers in Vector columns, they are converted to single-precision floating-point numbers. 
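The last two limitations in the list above (no `NaN` or infinite values, and double-precision values being converted to single precision) can be illustrated with a small client-side sketch. This is not TiDB code: `to_float32` and `prepare_vector` are hypothetical helper names introduced only for illustration, and the round trip through Python's `struct` module merely mimics what single-precision storage does to a double-precision value.

```python
import math
import struct


def to_float32(value: float) -> float:
    """Round a Python float (double precision) to the nearest single-precision value."""
    # Packing as a 4-byte IEEE 754 float and unpacking it again discards
    # the extra precision that a double carries.
    return struct.unpack("<f", struct.pack("<f", value))[0]


def prepare_vector(values):
    """Hypothetical client-side check mirroring the limitations listed above."""
    for v in values:
        # NaN, Infinity, and -Infinity cannot be stored in the vector data type.
        if math.isnan(v) or math.isinf(v):
            raise ValueError(f"unsupported vector element: {v!r}")
    # Mimic the double-to-single conversion applied when the vector is stored.
    return [to_float32(v) for v in values]


stored = prepare_vector([0.1, 0.5, 1.0])
# 0.5 and 1.0 are exactly representable in single precision; 0.1 is not,
# so its stored value differs slightly from the original double.
print(stored[0] == 0.1, stored[1] == 0.5)
```

A check like this might be useful when comparing values read back from a vector column with the values originally inserted: an exact equality test against the original doubles can fail even though nothing went wrong.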
## Feedback We value your feedback and are always here to help: + + +- [Join our Discord](https://discord.gg/zcqexutz2R) + + + + + - [Join our Discord](https://discord.gg/zcqexutz2R) - [Visit our Support Portal](https://tidb.support.pingcap.com/) + + \ No newline at end of file From aff12a5dad16f1523547a7b034731663e0534223 Mon Sep 17 00:00:00 2001 From: Grace Cai Date: Tue, 24 Sep 2024 11:55:20 +0800 Subject: [PATCH 15/16] update the dimension limit to 16383 --- vector-search-integration-overview.md | 2 +- vector-search-limitations.md | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/vector-search-integration-overview.md b/vector-search-integration-overview.md index 7efccf2d8e648..da17d33ab73f4 100644 --- a/vector-search-integration-overview.md +++ b/vector-search-integration-overview.md @@ -32,7 +32,7 @@ Moreover, you can also use TiDB for various purposes, such as document storage a ## Embedding models and services -TiDB Vector Search supports storing vectors of up to 16,000 dimensions, which accommodates most embedding models. +TiDB Vector Search supports storing vectors of up to 16383 dimensions, which accommodates most embedding models. You can either use self-deployed open-source embedding models or third-party embedding APIs provided by third-party embedding providers to generate vectors. diff --git a/vector-search-limitations.md b/vector-search-limitations.md index 572c108a3095c..31e0195aa57e3 100644 --- a/vector-search-limitations.md +++ b/vector-search-limitations.md @@ -19,7 +19,7 @@ This document describes the known limitations of TiDB Vector Search. We are cont > > The vector search feature is only available for TiDB Self-Managed clusters and [TiDB Cloud Serverless](/tidb-cloud/select-cluster-tier.md#tidb-cloud-serverless) clusters. -- Each [vector](/vector-search-data-types.md) supports up to 16,000 dimensions. +- Each [vector](/vector-search-data-types.md) supports up to 16383 dimensions. 
- You cannot store `NaN`, `Infinity`, or `-Infinity` values in the vector data type. - Only cosine distance and L2 distance (Euclidean distance) are supported when you create a [vector search index](/vector-search-index.md). - Vector data types cannot store double-precision floating-point numbers (this is planned to be supported in a future release). If you insert or store double-precision floating-point numbers in Vector columns, they are converted to single-precision floating-point numbers. From 4aa3cafb101346f33aa656fad312621454d769f6 Mon Sep 17 00:00:00 2001 From: JaySon-Huang Date: Fri, 27 Sep 2024 17:31:46 +0800 Subject: [PATCH 16/16] Update desc about tiflash upgrade Signed-off-by: JaySon-Huang --- tiflash-upgrade-guide.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/tiflash-upgrade-guide.md b/tiflash-upgrade-guide.md index f36051c145166..94afab69c1876 100644 --- a/tiflash-upgrade-guide.md +++ b/tiflash-upgrade-guide.md @@ -124,6 +124,10 @@ After upgrading TiFlash to v7.3 and configuring TiFlash to use V3 DTFiles, if yo Starting from v7.4, to reduce the read and write amplification generated during data compaction, TiFlash optimizes the data compaction logic of PageStorage V3, which leads to changes to some of the underlying storage file names. Therefore, after the upgrade to v7.4 or a later version, in-place downgrading to the original version is not supported. +## From v7.x to v8.4 or a later version + +Starting from v8.4, the underlying storage format of TiFlash has been updated to support [vector search](/vector-search-overview.md). Therefore, after you upgrade TiFlash to v8.4 or a later version, in-place downgrading to the original version is not supported. + **Workaround for downgrading TiFlash in testing or other special scenarios** -To downgrade TiFlash in testing or other special scenarios, you can forcibly scale in the target TiFlash node and then replicate data from TiKV again.
For detailed steps, see [Scale in a TiFlash cluster](/scale-tidb-using-tiup.md#scale-in-a-tiflash-cluster). \ No newline at end of file +To downgrade TiFlash in testing or other special scenarios, you can forcibly scale in the target TiFlash node and then replicate data from TiKV again. For detailed steps, see [Scale in a TiFlash cluster](/scale-tidb-using-tiup.md#scale-in-a-tiflash-cluster).