[WIP]: cloud: add concept docs #19717

Open
wants to merge 21 commits into
base: release-8.1
Changes from 9 commits
975 changes: 504 additions & 471 deletions TOC-tidb-cloud.md

Large diffs are not rendered by default.

Binary file added media/tidb-cloud/blank_transparent_placeholder.png
46 changes: 46 additions & 0 deletions tidb-cloud/ai-feature-concepts.md
@@ -0,0 +1,46 @@
---
title: AI features
summary: Learn about AI features for TiDB Cloud.
---

# AI features

## Chat2Query

Chat2Query is an AI-powered feature integrated into SQL Editor that assists users in generating, debugging, or rewriting SQL queries using natural language instructions. For more information, see [Explore your data with AI-assisted SQL Editor](/tidb-cloud/explore-data-with-chat2query.md).

In addition, TiDB Cloud provides a Chat2Query API for TiDB Cloud Serverless clusters. After you enable it, TiDB Cloud automatically creates a system Data App called Chat2Query and a Chat2Data endpoint in Data Service. You can call this endpoint to let AI generate and execute SQL statements by providing instructions. For more information, see [Get started with Chat2Query API](/tidb-cloud/use-chat2query-api.md).

## Vector Search

Vector search is a search method that prioritizes the meaning of your data to deliver relevant results.

Unlike traditional full-text search, which relies on exact keyword matching and word frequency, vector search converts various data types (such as text, images, or audio) into high-dimensional vectors and queries based on the similarity between these vectors. This search method captures the semantic meaning and contextual information of the data, leading to a more precise understanding of user intent.

Even when the search terms do not exactly match the content in the database, vector search can still provide results that align with the user's intent by analyzing the semantics of the data. For example, a full-text search for "a swimming animal" only returns results containing these exact keywords. In contrast, vector search can return results for other swimming animals, such as fish or ducks, even if these results do not contain the exact keywords.
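
The similarity-based ranking described above can be illustrated with a minimal sketch. It assumes cosine similarity as the similarity measure and uses hand-picked 3-dimensional vectors as stand-ins for real embeddings. The vectors, the document names, and the helper functions are hypothetical and are not part of TiDB.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, hand-picked for illustration only.
# Real embedding models produce hundreds or thousands of dimensions.
documents = {
    "fish": [0.9, 0.1, 0.0],
    "duck": [0.7, 0.3, 0.1],
    "tree": [0.0, 0.1, 0.9],
}

# Stands in for the embedding of the query "a swimming animal".
query = [0.8, 0.2, 0.0]

def rank_by_similarity(query_vector, docs):
    """Return document names ordered from most to least similar."""
    return sorted(
        docs,
        key=lambda name: cosine_similarity(query_vector, docs[name]),
        reverse=True,
    )

ranking = rank_by_similarity(query, documents)
print(ranking)  # ['fish', 'duck', 'tree']
```

None of the document names contains the query's keywords, yet the two swimming animals rank first because their vectors point in nearly the same direction as the query vector.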

For more information, see [Vector Search (Beta) Overview](/tidb-cloud/vector-search-overview.md).

## AI integrations

### AI frameworks

TiDB provides official support for several popular AI frameworks, enabling you to easily integrate AI applications developed based on these frameworks with TiDB Vector Search.

For a list of supported AI frameworks, see [Vector Search Integration Overview](/tidb-cloud/vector-search-integration-overview.md#ai-frameworks).

### Embedding models and services

A vector embedding, also known as an embedding, is a sequence of numbers that represents real-world objects in a high-dimensional space. It captures the meaning and context of unstructured data, such as documents, images, audio, and videos.

Embedding models are algorithms that transform data into [vector embeddings](/tidb-cloud/vector-search-overview.md#vector-embedding). The choice of an appropriate embedding model is crucial for ensuring the accuracy and relevance of semantic search results.

TiDB Vector Search supports storing vectors of up to 16383 dimensions, which accommodates most embedding models. For unstructured text data, you can find top-performing text embedding models on the [Massive Text Embedding Benchmark (MTEB) Leaderboard](https://huggingface.co/spaces/mteb/leaderboard).

### Object Relational Mapping (ORM) libraries

Object Relational Mapping (ORM) libraries are tools that facilitate the interaction between applications and relational databases by allowing developers to work with database records as if they were objects in their programming language of choice.

TiDB lets you integrate vector search with ORM libraries to manage vector data alongside traditional relational data. This integration is particularly useful for applications that need to store and query vector embeddings generated by AI models. By using ORM libraries, developers can seamlessly interact with vector data stored in TiDB, leveraging the database's capabilities to perform complex vector operations like nearest neighbor search.

For a list of supported ORM libraries, see [Vector Search Integration Overview](/tidb-cloud/vector-search-integration-overview.md#object-relational-mapping-orm-libraries).
98 changes: 98 additions & 0 deletions tidb-cloud/architecture-cocepts.md
@@ -0,0 +1,98 @@
---
Review comment (Member):
tidb-cloud/architecture-cocepts.md -> tidb-cloud/architecture-concepts.md

title: Architecture
summary: Learn about architecture concepts for TiDB Cloud.
---

# Architecture

TiDB Cloud is a fully-managed Database-as-a-Service (DBaaS) that brings the flexibility and power of [TiDB](https://docs.pingcap.com/tidb/stable/overview), an open-source HTAP (Hybrid Transactional and Analytical Processing) database, to Google Cloud and AWS.

Check failure on line 8 in tidb-cloud/architecture-cocepts.md (GitHub Actions / vale, reported by reviewdog): [PingCAP.LyHyphens] ' fully-' doesn't need a hyphen. Severity: ERROR.

TiDB is MySQL-compatible, making it easy to migrate and work with existing applications, while offering seamless scalability to handle everything from small workloads to massive, high-performance clusters. It supports both transactional (OLTP) and analytical (OLAP) workloads in one system, simplifying operations and enabling real-time insights.

TiDB Cloud provides two deployment options: **TiDB Cloud Serverless**, for auto-scaling, cost-efficient workloads, and **TiDB Cloud Dedicated**, for enterprise-grade applications with dedicated resources and advanced capabilities. TiDB Cloud makes it easy to scale your database, handle complex management tasks, and stay focused on developing reliable, high-performing applications.

## TiDB Cloud Serverless

TiDB Cloud Serverless is a fully managed serverless solution that provides HTAP capabilities similar to traditional TiDB, while offering auto-scaling that relieves you of capacity planning and management complexity. It includes a free tier for basic usage, with consumption-based billing for any usage that exceeds the free limits. TiDB Cloud Serverless offers two types of high availability to address varying operational requirements.

By default, clusters use the Zonal High Availability option, which places all components within the same availability zone and results in lower network latency.

For applications that require maximum infrastructure isolation and redundancy, the Regional High Availability option distributes nodes across multiple availability zones.

## TiDB Cloud Dedicated

TiDB Cloud Dedicated is designed for mission-critical businesses, offering high availability across multiple availability zones, horizontal scaling, and full HTAP capabilities.

Built on isolated cloud resources such as VPCs, VMs, managed Kubernetes services, and cloud storage, it leverages the infrastructure of major cloud providers. TiDB Cloud Dedicated clusters support the complete TiDB feature set, enabling rapid scaling, reliable backups, deployment within specific VPCs, and geographic-level disaster recovery.

## TiDB Cloud console

The [TiDB Cloud console](https://tidbcloud.com/) is the web-based management interface for both TiDB Cloud Serverless and TiDB Cloud Dedicated. It provides tools to manage clusters, import or migrate data, monitor performance metrics, configure backups, set up security controls, and integrate with other cloud services, all from a single, user-friendly platform.

## TiDB Cloud CLI

The TiDB Cloud CLI, `ticloud`, allows you to manage TiDB Cloud Serverless and TiDB Cloud Dedicated directly from your terminal with simple commands. You can perform tasks such as:

- Creating, deleting, and listing clusters.
- Importing data into clusters.
- Exporting data from clusters.

For more information, see [TiDB Cloud CLI Reference](/tidb-cloud/cli-reference.md).

## TiDB Cloud API

The TiDB Cloud API is a REST-based interface that provides programmatic access to manage resources across TiDB Cloud Serverless and TiDB Cloud Dedicated. It enables automated and efficient handling of tasks such as managing projects, clusters, backups, restores, data imports, billing, and other resources in [TiDB Cloud Data Service](/tidb-cloud/data-service-overview.md).

For more information, see [TiDB Cloud API Overview](/tidb-cloud/api-overview.md).

## Nodes

In TiDB Cloud, each cluster consists of TiDB, TiKV, and TiFlash nodes.

- In a TiDB Cloud Dedicated cluster, you can fully manage the number and size of your dedicated TiDB, TiKV, and TiFlash nodes according to your performance requirements. For more information, see [Scalability](/tidb-cloud/scalability-concepts.md).
- In a TiDB Cloud Serverless cluster, the number and size of TiDB, TiKV, and TiFlash nodes are automatically managed. This ensures seamless scaling, eliminating the need for users to handle node configuration or management tasks.

### TiDB node

A [TiDB node](/tidb-computing.md) is a stateless SQL layer that connects to applications using a MySQL-compatible endpoint. It handles tasks like parsing, optimizing, and creating distributed execution plans for SQL queries.

You can deploy multiple TiDB nodes to scale horizontally and manage higher workloads. These nodes work with load balancers, such as TiProxy or HAProxy, to provide a seamless interface. TiDB nodes do not store data themselves. Instead, they forward data requests to TiKV nodes for row-based storage or TiFlash nodes for columnar storage.

### TiKV node

A [TiKV node](/tikv-overview.md) is the backbone of data storage in the TiDB architecture, serving as a distributed transactional key-value storage engine that delivers reliability, scalability, and high availability.

**Key features:**

- **Region-based data storage**

    - Data is divided into [Regions](https://docs.pingcap.com/tidb/dev/glossary#regionpeerraft-group), each covering a specific Key Range (left-closed, right-open interval: `StartKey` to `EndKey`).
    - Multiple Regions coexist within each TiKV node to ensure efficient data distribution.

- **Transactional support**

    - TiKV nodes provide native distributed transaction support at the key-value level, ensuring Snapshot Isolation as the default isolation level.
    - The TiDB node translates SQL execution plans into calls to the TiKV node API, enabling seamless SQL-level transaction support.

- **High availability**

    - All data in TiKV nodes is replicated (default: three replicas) for durability.
    - TiKV ensures native high availability and supports automatic failover, safeguarding against node failures.

- **Scalability and reliability**

    - TiKV nodes are designed to handle expanding datasets while maintaining distributed consistency and fault tolerance.
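
The left-closed, right-open Key Range of a Region can be illustrated with a short lookup sketch. The Region IDs, the keys, and the `find_region` helper below are hypothetical. The convention that an empty key means "unbounded" is an assumption of this sketch, and the code does not call any TiKV API.

```python
import bisect

# Hypothetical Regions, each covering [start_key, end_key).
# In this sketch, an empty start key means "no lower bound" and
# an empty end key means "no upper bound".
regions = [
    {"id": 1, "start_key": b"", "end_key": b"g"},
    {"id": 2, "start_key": b"g", "end_key": b"p"},
    {"id": 3, "start_key": b"p", "end_key": b""},
]

# Regions are sorted by start key, so a binary search finds the owner.
start_keys = [region["start_key"] for region in regions]

def find_region(key):
    """Return the Region whose [start_key, end_key) interval contains key."""
    # The rightmost Region whose start key is <= key owns the key.
    index = bisect.bisect_right(start_keys, key) - 1
    return regions[index]

print(find_region(b"f")["id"])  # falls in Region 1
print(find_region(b"g")["id"])  # the boundary key belongs to Region 2
```

Because the interval is left-closed and right-open, the boundary key `g` belongs to the Region that starts at `g`, not to the Region that ends at `g`. Every key therefore maps to exactly one Region.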

### TiFlash node

A [TiFlash node](/tiflash/tiflash-overview.md) is a specialized type of storage node within the TiDB architecture. Unlike ordinary TiKV nodes, TiFlash is designed for analytical acceleration with a columnar storage model.

**Key features:**

- **Columnar storage**

    TiFlash nodes store data in a columnar format, making them optimized for analytical queries and significantly improving performance for read-intensive workloads.

- **Vector search index support**

    The vector search index feature uses TiFlash replicas for tables, enabling advanced search capabilities and improving efficiency in complex analytical scenarios.
47 changes: 47 additions & 0 deletions tidb-cloud/backup-and-restore-concepts.md
@@ -0,0 +1,47 @@
---
title: Backup & Restore
summary: Learn about backup & restore concepts for TiDB Cloud.
---

# Backup & Restore

TiDB Cloud Backup & Restore features are designed to safeguard your data and ensure business continuity by enabling you to back up and recover cluster data.

## Snapshot backup

Snapshot backup backs up the entire cluster. It is based on [multi-version concurrency control (MVCC)](/tidb-storage.md#mvcc) and backs up all data in the specified snapshot to a target storage. The size of the backup data is approximately the size of the compressed single replica in the cluster.

## Automatic backup

For both TiDB Cloud Serverless and TiDB Cloud Dedicated clusters, snapshot backups are taken automatically by default and stored according to your backup retention policy.

## Log backup

Snapshot backup contains the full cluster data at a certain point, while TiDB log backup can back up data written by applications to a specified storage in a timely manner.

If you want to choose the restore point as required, that is, to perform point-in-time recovery (PITR), note the following:

- For TiDB Cloud Serverless clusters, PITR is available only for scalable clusters and not available for free clusters.
- For TiDB Cloud Dedicated clusters, you need to [enable PITR](/tidb-cloud/backup-and-restore.md#turn-on-point-in-time-restore) in advance.
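
Conceptually, restoring to a chosen point combines the two backup types: start from the latest snapshot taken at or before the target time, then apply the logged changes up to that time. The following sketch illustrates only this selection logic. The `plan_restore` function and the integer timestamps are hypothetical, and the sketch does not describe how TiDB Cloud implements PITR internally.

```python
import bisect

def plan_restore(snapshot_times, target_time):
    """Pick the base snapshot and the log range needed to reach target_time.

    snapshot_times must be sorted in ascending order.
    """
    # Find the latest snapshot taken at or before the target time.
    index = bisect.bisect_right(snapshot_times, target_time) - 1
    if index < 0:
        raise ValueError("no snapshot exists at or before the requested time")
    base = snapshot_times[index]
    # Logged changes between the snapshot and the target close the gap.
    return {
        "snapshot": base,
        "replay_logs_from": base,
        "replay_logs_to": target_time,
    }

# Snapshots taken at times 100, 200, and 300; restore to time 250.
plan = plan_restore([100, 200, 300], 250)
print(plan)
```

In this example, the plan starts from the snapshot taken at time 200 and replays the logged changes from 200 to 250.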

## Manual backup

Manual backup is a feature of TiDB Cloud Dedicated that enables you to back up your data to a known state as needed, and then restore to that state at any time.

For more information, see [Perform a manual backup](/tidb-cloud/backup-and-restore.md#perform-a-manual-backup).

## Dual region backup

Dual region backup is a feature of TiDB Cloud Dedicated that enables you to replicate backups from your cluster region to a different region. After it is enabled, all backups are automatically replicated to the specified region. This provides cross-region data protection and disaster recovery capabilities. It is estimated that approximately 99% of the data can be replicated to the secondary region within an hour.

For more information, see [Turn on dual region backup](/tidb-cloud/backup-and-restore.md#turn-on-dual-region-backup).

## Point-in-time Restore

Point-in-time Restore is a feature that enables you to restore data from any point in time to a new cluster. You can use it to:

- Reduce RPO in disaster recovery.
- Resolve data write errors by restoring to a point in time before the error event.
- Audit the historical data of the business.

For more information, see [Turn on Point-in-time Restore](/tidb-cloud/backup-and-restore.md#turn-on-point-in-time-restore).
46 changes: 46 additions & 0 deletions tidb-cloud/data-service-concepts.md
@@ -0,0 +1,46 @@
---
title: Data Service (Beta)
summary: Learn about Data Service concepts for TiDB Cloud.
---

# Data Service (Beta)

TiDB Cloud [Data Service (beta)](https://tidbcloud.com/console/data-service) is a fully managed low-code backend-as-a-service solution that simplifies backend application development, empowering developers to rapidly build highly scalable, secure, data-driven applications.

Data Service enables you to access TiDB Cloud data via an HTTPS request using a custom API endpoint. This feature uses a serverless architecture to handle computing resources and elastic scaling, so you can focus on the query logic in endpoints without worrying about infrastructure or maintenance costs.

For more information, see [TiDB Cloud Data Service (Beta) Overview](/tidb-cloud/data-service-overview.md).

## Data App

A Data App in [Data Service (beta)](https://tidbcloud.com/console/data-service) is a collection of endpoints that you can use to access data for a specific application. By creating a Data App, you can group your endpoints and configure authorization settings using API keys to restrict access to endpoints. In this way, you can ensure that only authorized users can access and manipulate your data, making your application more secure.

For more information, see [Manage a Data App](/tidb-cloud/data-service-manage-data-app.md).

## Data App endpoints

An endpoint in [Data Service (beta)](https://tidbcloud.com/console/data-service) is a web API that you can customize to execute SQL statements. You can specify parameters for your SQL statements, such as the value used in the `WHERE` clause. When a client calls an endpoint and provides values for the parameters in a request URL, the endpoint executes the corresponding SQL statement with the provided parameters and returns the results as part of the HTTP response.
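
As a rough illustration of passing parameter values in a request URL, the following sketch builds such a URL with the Python standard library. The base URL, the endpoint path, and the parameter names are hypothetical placeholders, not real TiDB Cloud addresses. The sketch only constructs the URL and does not send a request or handle API key authorization.

```python
from urllib.parse import urlencode

# Hypothetical placeholders, not real TiDB Cloud addresses.
BASE_URL = "https://data-service.example.com/app-example/endpoint"
ENDPOINT_PATH = "/orders"

def build_endpoint_url(base_url, path, params):
    """Append the endpoint path and URL-encoded parameters to the base URL."""
    return f"{base_url}{path}?{urlencode(params)}"

# The endpoint's SQL statement would use these values, for example
# "status" in its WHERE clause and "limit" to cap the number of rows.
url = build_endpoint_url(
    BASE_URL,
    ENDPOINT_PATH,
    {"status": "shipped", "limit": 10},
)
print(url)
```

Using `urlencode` rather than string concatenation ensures that parameter values containing spaces or reserved characters are escaped correctly.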

For more information, see [Manage an Endpoint](/tidb-cloud/data-service-manage-endpoint.md).

## Chat2Query API

In TiDB Cloud, Chat2Query API is a RESTful interface that enables you to generate and execute SQL statements using AI by providing instructions. Then, the API returns the query results for you.

For more information, see [Get Started with Chat2Query API](/tidb-cloud/use-chat2query-api.md).

## AI integrations

Integrating third-party tools with your Data App enhances your applications with the advanced natural language processing and artificial intelligence (AI) capabilities that these tools provide. This integration enables your applications to perform more complex tasks and deliver intelligent solutions.

Currently, you can integrate third-party tools, such as GPTs and Dify, in the TiDB Cloud console.
For more information, see [Integrate a Data App with Third-Party Tools](/tidb-cloud/data-service-integrations.md).

## Configuration as Code

TiDB Cloud provides a Configuration as Code (CaC) approach that represents your entire Data App configuration as code in JSON.

By connecting your Data App to GitHub, TiDB Cloud can use the CaC approach and push your Data App configurations as [configuration files](/tidb-cloud/data-service-app-config-files.md) to your preferred GitHub repository and branch.
If Auto Sync & Deployment is enabled for your GitHub connection, you can also modify your Data App by updating its configuration files on GitHub. After you push the configuration file changes to GitHub, the new configurations will be deployed in TiDB Cloud automatically.

For more information, see [Deploy Data App Automatically with GitHub](/tidb-cloud/data-service-manage-github-connection.md).
22 changes: 22 additions & 0 deletions tidb-cloud/data-streaming-concepts.md
@@ -0,0 +1,22 @@
---
title: Data Streaming
summary: Learn about data streaming concepts for TiDB Cloud.
---

# Data Streaming

TiDB Cloud lets you stream data changes from your TiDB cluster to other systems. Currently, TiDB Cloud supports streaming data to Apache Kafka, MySQL, TiDB Cloud, and cloud storage.

## Changefeed

TiDB Cloud changefeed is a continuous data stream that helps you replicate data changes from TiDB Cloud to other data services.

On the changefeed page in the TiDB Cloud console, you can create a changefeed, view a list of existing changefeeds, and manage existing changefeeds (such as scaling, pausing, resuming, editing, and deleting a changefeed).

Replication includes only incremental data changes by default. If existing data must be replicated, it must be exported and loaded into the target system manually before starting the changefeed.

In TiDB Cloud, replication can be tailored by defining table filters (to specify which tables to replicate) and event filters (to include or exclude specific types of events like `INSERT` or `DELETE`).
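
The combined effect of a table filter and an event filter can be sketched as follows. The event dictionaries, the glob-style table patterns, and the `should_replicate` function are hypothetical and only model the idea. They do not reflect the actual filter syntax or implementation of TiDB Cloud changefeeds.

```python
from fnmatch import fnmatchcase

def should_replicate(event, table_patterns, ignored_event_types):
    """Return True if a change event passes both the table and event filters."""
    qualified_name = f"{event['schema']}.{event['table']}"
    # Table filter: the table must match at least one pattern.
    table_matches = any(
        fnmatchcase(qualified_name, pattern) for pattern in table_patterns
    )
    # Event filter: the event type must not be excluded.
    event_allowed = event["type"] not in ignored_event_types
    return table_matches and event_allowed

events = [
    {"schema": "shop", "table": "orders", "type": "INSERT"},
    {"schema": "shop", "table": "orders", "type": "DELETE"},
    {"schema": "logs", "table": "audit", "type": "INSERT"},
]

# Replicate only tables in the "shop" schema, and skip DELETE events.
replicated = [
    event for event in events if should_replicate(event, ["shop.*"], {"DELETE"})
]
print(replicated)
```

Only the `INSERT` on `shop.orders` passes both filters: the `DELETE` is dropped by the event filter, and the `logs.audit` change is dropped by the table filter.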

For more information, see [Changefeed](/tidb-cloud/changefeed-overview.md).