# [Doc] Reduce links #51066 (Closed, 26 commits)
2 changes: 1 addition & 1 deletion docs/en/_assets/commonMarkdown/loadMethodIntro.md
@@ -7,6 +7,6 @@ Each of these options has its own advantages, which are detailed in the following…

In most cases, we recommend that you use the INSERT+`FILES()` method, which is much easier to use.

However, the INSERT+`FILES()` method currently supports only the Parquet, ORC, and CSV file formats. Therefore, if you need to load data of other file formats such as JSON, or [perform data changes such as DELETE during data loading](../../loading/Load_to_Primary_Key_tables.md), you can resort to Broker Load.
However, the INSERT+`FILES()` method currently supports only the Parquet, ORC, and CSV file formats. Therefore, if you need to load data of other file formats such as JSON, or perform data changes such as DELETE during data loading, you can resort to Broker Load.

If you need to load a large number of data files with a significant data volume in total (for example, more than 100 GB or even 1 TB), we recommend that you use the Pipe method. Pipe can split the files based on their number or size, breaking down the load job into smaller, sequential tasks. This approach ensures that errors in one file do not impact the entire load job and minimizes the need for retries due to data errors.
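To make the comparison above concrete, here is a minimal INSERT+`FILES()` sketch; the S3 path, credentials, and table name are placeholders, not values from this PR:

```SQL
-- Hedged sketch of INSERT + FILES(); the path, credentials, and table are hypothetical.
INSERT INTO user_behavior
SELECT * FROM FILES(
    "path" = "s3://mybucket/datasets/user_behavior/*.parquet",
    "format" = "parquet",
    "aws.s3.region" = "us-east-1",
    "aws.s3.access_key" = "<access_key>",
    "aws.s3.secret_key" = "<secret_key>"
);
```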
1 change: 1 addition & 0 deletions docs/en/_assets/commonMarkdown/multi-service-access.mdx
@@ -0,0 +1 @@
For the best practices of multi-service access control, see [Multi-service access control](../../administration/user_privs/User_privilege.md#multi-service-access-control).
5 changes: 5 additions & 0 deletions docs/en/_assets/commonMarkdown/quickstart-iceberg-tip.mdx
@@ -0,0 +1,5 @@

:::tip
This example uses the Local Climatological Data (LCD) dataset featured in the [StarRocks Basics](../../quick_start/shared-nothing.md) Quick Start. You can load the data and try the example yourself.
:::

3 changes: 3 additions & 0 deletions docs/en/_assets/commonMarkdown/quickstart-overview-tip.mdx
@@ -0,0 +1,3 @@
## Learn by doing

Try the [Quick Starts](../../quick_start/quick_start.mdx) to get an overview of using StarRocks with realistic scenarios.
@@ -0,0 +1,5 @@

:::tip
Try out Routine Load in this [Quick Start](../../quick_start/routine-load.md).
:::

5 changes: 5 additions & 0 deletions docs/en/_assets/commonMarkdown/quickstart-shared-data.mdx
@@ -0,0 +1,5 @@

:::tip
Give [shared-data](../../quick_start/shared-data.md) a try using MinIO for object storage.
:::

@@ -0,0 +1,5 @@

:::tip
This example uses the Local Climatological Data (LCD) dataset featured in the [StarRocks Basics](../../quick_start/shared-nothing.md) Quick Start. You can load the data and try the example yourself.
:::

@@ -60,15 +60,15 @@ You can specify CPU and memory resource quotas for a resource group on a BE by using…

> **NOTE**
>
> The amount of memory that can be used for queries is indicated by the `query_pool` parameter. For more information about the parameter, see [Memory management](Memory_management.md).
> The amount of memory that can be used for queries is indicated by the `query_pool` parameter.

- `concurrency_limit`

This parameter specifies the upper limit of concurrent queries in a resource group. It is used to avoid system overload caused by too many concurrent queries. This parameter takes effect only when it is set to greater than `0`. Default: `0`.

- `max_cpu_cores`

The CPU core threshold for triggering query queue in FE. For more details, refer to [Query queues - Specify resource thresholds for resource group-level query queues](./query_queues.md#specify-resource-thresholds-for-resource-group-level-query-queues). It takes effect only when it is set to greater than `0`. Range: [0, `avg_be_cpu_cores`], where `avg_be_cpu_cores` represents the average number of CPU cores across all BE nodes. Default: 0.
The CPU core threshold for triggering the query queue in the FE. This parameter takes effect only when it is set to greater than `0`. Range: [0, `avg_be_cpu_cores`], where `avg_be_cpu_cores` represents the average number of CPU cores across all BE nodes. Default: `0`. See the sketch after this parameter list.

- `spill_mem_limit_threshold`

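For context, these parameters come together in CREATE RESOURCE GROUP. A minimal sketch — the group name, classifier user, and the `cpu_core_limit`/`mem_limit` values are assumptions for illustration, not values from this diff:

```SQL
-- Hedged sketch of a resource group using the parameters described above.
-- The group name, classifier user, and quota values are hypothetical.
CREATE RESOURCE GROUP rg_example
TO (user = 'rg_user')              -- classifier: routes this user's queries to the group
WITH (
    "cpu_core_limit" = "10",       -- assumed CPU quota property
    "mem_limit" = "20%",
    "concurrency_limit" = "20",    -- cap on concurrent queries (documented above)
    "max_cpu_cores" = "8"          -- FE query-queue trigger threshold (documented above)
);
```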
@@ -360,9 +360,9 @@ The following FE metrics only provide statistics within the current FE node:
| starrocks_fe_query_resource_group | Count | Instantaneous | The number of queries historically run in this resource group (including those currently running). |
| starrocks_fe_query_resource_group_latency | ms | Instantaneous | The query latency percentile for this resource group. The label `type` indicates specific percentiles, including `mean`, `75_quantile`, `95_quantile`, `98_quantile`, `99_quantile`, `999_quantile`. |
| starrocks_fe_query_resource_group_err | Count | Instantaneous | The number of queries in this resource group that encountered an error. |
| starrocks_fe_resource_group_query_queue_total | Count | Instantaneous | The total number of queries historically queued in this resource group (including those currently running). This metric is supported from v3.1.4 onwards. It is valid only when query queues are enabled, see [Query Queues](query_queues.md) for details. |
| starrocks_fe_resource_group_query_queue_pending | Count | Instantaneous | The number of queries currently in the queue of this resource group. This metric is supported from v3.1.4 onwards. It is valid only when query queues are enabled, see [Query Queues](query_queues.md) for details. |
| starrocks_fe_resource_group_query_queue_timeout | Count | Instantaneous | The number of queries in this resource group that have timed out while in the queue. This metric is supported from v3.1.4 onwards. It is valid only when query queues are enabled, see [Query Queues](query_queues.md) for details. |
| starrocks_fe_resource_group_query_queue_total | Count | Instantaneous | The total number of queries historically queued in this resource group (including those currently running). This metric is supported from v3.1.4 onwards. It is valid only when query queues are enabled. |
| starrocks_fe_resource_group_query_queue_pending | Count | Instantaneous | The number of queries currently in the queue of this resource group. This metric is supported from v3.1.4 onwards. It is valid only when query queues are enabled. |
| starrocks_fe_resource_group_query_queue_timeout | Count | Instantaneous | The number of queries in this resource group that have timed out while in the queue. This metric is supported from v3.1.4 onwards. It is valid only when query queues are enabled. |

### BE metrics

@@ -412,11 +412,3 @@ MySQL [(none)]> SHOW USAGE RESOURCE GROUPS;
| wg2 | 0 | 127.0.0.1 | 0.400 | 4 | 8 |
+------------+----+-----------+-----------------+-----------------+------------------+
```

## What to do next

After you configure resource groups, you can manage memory resources and queries. For more information, see the following topics:

- [Memory management](./Memory_management.md)

- [Query management](./Query_management.md)
5 changes: 2 additions & 3 deletions docs/en/data_source/catalog/iceberg_catalog.md
@@ -4,12 +4,11 @@ toc_max_heading_level: 5
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
import QSTip from '../../_assets/commonMarkdown/quickstart-iceberg-tip.mdx'

# Iceberg catalog

:::tip
Try it in this [hands-on tutorial](../../quick_start/iceberg.md)
:::
<QSTip />

An Iceberg catalog is a type of external catalog that is supported by StarRocks from v2.4 onwards. With Iceberg catalogs, you can:

6 changes: 2 additions & 4 deletions docs/en/introduction/Architecture.md
@@ -1,6 +1,7 @@
---
displayed_sidebar: docs
---
import QSOverview from '../_assets/commonMarkdown/quickstart-overview-tip.mdx'

# Architecture

@@ -79,7 +80,4 @@ Queries against hot data scan the cache directly and then the local disk, while…

Caching can be enabled when creating tables. If caching is enabled, data will be written to both the local disk and backend object storage. During queries, the CN nodes first read data from the local disk. If the data is not found, it will be retrieved from the backend object storage and simultaneously cached on the local disk.

## Learn by doing

- Give [shared-data](../quick_start/shared-data.md) a try using MinIO for object storage.
- Kubernetes users can use the [Helm quick start](../quick_start/helm.md) and deploy three FEs and three BEs in a shared-nothing architecture using persistent volumes.
<QSOverview />
2 changes: 1 addition & 1 deletion docs/en/loading/InsertInto.md
@@ -25,7 +25,7 @@ StarRocks v2.4 further supports overwriting data into a table by using INSERT OVERWRITE…
- You can cancel a synchronous INSERT transaction only by pressing the **Ctrl** and **C** keys from your MySQL client.
- You can submit an asynchronous INSERT task using [SUBMIT TASK](../sql-reference/sql-statements/loading_unloading/ETL/SUBMIT_TASK.md).
- In the current version of StarRocks, the INSERT transaction fails by default if the data of any row does not comply with the table schema. For example, the INSERT transaction fails if the length of a field in any row exceeds the length limit of the mapping field in the table. You can set the session variable `enable_insert_strict` to `false` to allow the transaction to continue by filtering out the rows that do not match the table.
- If you execute the INSERT statement frequently to load small batches of data into StarRocks, excessive data versions are generated. It severely affects query performance. We recommend that, in production, you should not load data with the INSERT command too often or use it as a routine for data loading on a daily basis. If your application or analytic scenario demand solutions to loading streaming data or small data batches separately, we recommend you use Apache Kafka® as your data source and load the data via [Routine Load](../loading/RoutineLoad.md).
- If you execute the INSERT statement frequently to load small batches of data into StarRocks, excessive data versions are generated, which severely affects query performance. We recommend that, in production, you do not load data with the INSERT command too often or use it as a daily routine for data loading. If your application or analytic scenario demands separate solutions for loading streaming data or small data batches, we recommend you use Apache Kafka® as your data source and load the data via Routine Load.
- If you execute the INSERT OVERWRITE statement, StarRocks creates temporary partitions for the partitions which store the original data, inserts new data into the temporary partitions, and [swaps the original partitions with the temporary partitions](../sql-reference/sql-statements/table_bucket_part_index/ALTER_TABLE.md#use-a-temporary-partition-to-replace-the-current-partition). All these operations are executed in the FE Leader node. Hence, if the FE Leader node crashes while executing INSERT OVERWRITE command, the whole load transaction will fail, and the temporary partitions will be truncated.
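As a concrete companion to the notes above, a hedged sketch of relaxing strict mode and submitting an asynchronous INSERT; the table and task names are hypothetical:

```SQL
-- Hedged sketch; table and task names are hypothetical.
SET enable_insert_strict = false;  -- filter out mismatching rows instead of failing

SUBMIT TASK etl_demo AS
INSERT OVERWRITE target_tbl
SELECT * FROM source_tbl;          -- runs asynchronously on the FE
```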

## Preparation
11 changes: 3 additions & 8 deletions docs/en/loading/RoutineLoad.md
@@ -5,10 +5,9 @@ displayed_sidebar: docs
# Load data using Routine Load

import InsertPrivNote from '../_assets/commonMarkdown/insertPrivNote.md'
import QSTip from '../_assets/commonMarkdown/quickstart-routine-load-tip.mdx'

:::tip
Try Routine Load out in this [Quick Start](../quick_start/routine-load.md)
:::
<QSTip />

This topic introduces how to create a Routine Load job to stream Kafka messages (events) into StarRocks, and familiarizes you with some basic concepts about Routine Load.

@@ -196,7 +195,7 @@ After submitting the load job, you can execute the [SHOW ROUTINE LOAD](../sql-re…

In the example, the number of BE nodes that are alive is `5`, the number of the pre-specified Kafka topic partitions is `5`, and the value of `max_routine_load_task_concurrent_num` is `5`. To increase the actual load task concurrency, you can increase the `desired_concurrent_number` from the default value `3` to `5`.

For more about the properties, see [CREATE ROUTINE LOAD](../sql-reference/sql-statements/loading_unloading/routine_load/CREATE_ROUTINE_LOAD.md). For detailed instructions on accelerating the loading, see [Routine Load FAQ](../faq/loading/Routine_load_faq.md).
For more about the properties, see [CREATE ROUTINE LOAD](../sql-reference/sql-statements/loading_unloading/routine_load/CREATE_ROUTINE_LOAD.md).
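A hedged sketch of the adjustment described above; the job name is hypothetical, and the job must be paused before its properties can be altered:

```SQL
-- Hedged sketch; the job name is hypothetical.
PAUSE ROUTINE LOAD FOR example_tbl1_ordertest1;
ALTER ROUTINE LOAD FOR example_tbl1_ordertest1
PROPERTIES ("desired_concurrent_number" = "5");
RESUME ROUTINE LOAD FOR example_tbl1_ordertest1;
```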

### Load JSON-format data

@@ -551,7 +550,3 @@ The following example stops the load job `example_tbl2_ordertest2`:
```SQL
STOP ROUTINE LOAD FOR example_tbl2_ordertest2;
```

## FAQ

Please see [Routine Load FAQ](../faq/loading/Routine_load_faq.md).
2 changes: 1 addition & 1 deletion docs/en/sql-reference/data-types/semi_structured/Array.md
@@ -201,7 +201,7 @@ INSERT INTO t0 VALUES(1, [1,2,3]);

### Use Stream Load or Routine Load to load CSV-formatted arrays

Arrays in CSV files are separated with comma by default. You can use [Stream Load](../../../loading/StreamLoad.md#load-csv-data) or [Routine Load](../../../loading/RoutineLoad.md#load-csv-format-data) to load CSV text files or CSV data in Kafka.
Arrays in CSV files are separated with commas by default. You can use [Stream Load or Routine Load](../../../loading/loading_introduction/Loading_intro.md) to load CSV text files or CSV data in Kafka.
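For instance, a hedged Routine Load sketch for CSV data that contains arrays; the job name, Kafka endpoint, topic, and column list are hypothetical, while `t0` is the table from the INSERT example above:

```SQL
-- Hedged sketch; Kafka endpoint, topic, and column names are hypothetical.
CREATE ROUTINE LOAD test_db.t0_array_load ON t0
COLUMNS TERMINATED BY "|",
COLUMNS (c0, c1)
FROM KAFKA (
    "kafka_broker_list" = "kafka-host:9092",
    "kafka_topic" = "array_topic"
);
```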

## Query ARRAY data

2 changes: 1 addition & 1 deletion docs/en/sql-reference/data-types/semi_structured/JSON.md
@@ -69,7 +69,7 @@ StarRocks supports the following data type conversions at Parquet file loading.
| LIST | ARRAY |
| Other data types such as UNION and TIMESTAMP | Not supported |

- Method 4: Use [Routine](../../../loading/RoutineLoad.md) load to continuously load JSON data from Kafka into StarRocks.
- Method 4: Use [Routine Load](../../../loading/loading_introduction/Loading_intro.md) to continuously load JSON data from Kafka into StarRocks.
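A hedged sketch of Method 4; the job name, target table, topic, endpoint, and JSON paths are hypothetical:

```SQL
-- Hedged sketch; all names, endpoints, and JSON paths are hypothetical.
CREATE ROUTINE LOAD test_db.tj_json_load ON tj
PROPERTIES (
    "format" = "json",
    "jsonpaths" = "[\"$.id\", \"$.payload\"]"
)
FROM KAFKA (
    "kafka_broker_list" = "kafka-host:9092",
    "kafka_topic" = "json_topic"
);
```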

### Query and process JSON data

2 changes: 1 addition & 1 deletion docs/en/sql-reference/information_schema/loads.md
@@ -13,7 +13,7 @@ The following fields are provided in `loads`:
| JOB_ID | The unique ID assigned by StarRocks to identify the load job. |
| LABEL | The label of the load job. |
| DATABASE_NAME | The name of the database to which the destination StarRocks tables belong. |
| STATE | The state of the load job. Valid values:<ul><li>`PENDING`: The load job is created.</li><li>`QUEUEING`: The load job is in the queue waiting to be scheduled.</li><li>`LOADING`: The load job is running.</li><li>`PREPARED`: The transaction has been committed.</li><li>`FINISHED`: The load job succeeded.</li><li>`CANCELLED`: The load job failed.</li></ul>For more information, see the "Asynchronous loading" section in [Loading concepts](../../loading/loading_introduction/loading_concepts.md#asynchronous-loading). |
| STATE | The state of the load job. Valid values:<ul><li>`PENDING`: The load job is created.</li><li>`QUEUEING`: The load job is in the queue waiting to be scheduled.</li><li>`LOADING`: The load job is running.</li><li>`PREPARED`: The transaction has been committed.</li><li>`FINISHED`: The load job succeeded.</li><li>`CANCELLED`: The load job failed.</li></ul> |
| PROGRESS | The progress of the ETL stage and LOADING stage of the load job. |
| TYPE | The type of the load job. For Broker Load, the return value is `BROKER`. For INSERT, the return value is `INSERT`. |
| PRIORITY | The priority of the load job. Valid values: `HIGHEST`, `HIGH`, `NORMAL`, `LOW`, and `LOWEST`. |
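A hedged sketch of querying this view; the database name is hypothetical:

```SQL
-- Hedged sketch; the database name is hypothetical.
SELECT LABEL, STATE, PROGRESS, TYPE, PRIORITY
FROM information_schema.loads
WHERE DATABASE_NAME = 'test_db'
ORDER BY JOB_ID DESC;
```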
@@ -2,6 +2,8 @@
displayed_sidebar: docs
---

import Tip from '../../../_assets/commonMarkdown/quickstart-shared-nothing-tip.mdx';

# regexp_extract


@@ -16,9 +18,7 @@ VARCHAR regexp_extract(VARCHAR str, VARCHAR pattern, int pos)
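A quick standalone call ahead of the dataset-based examples below; the expected result is hedged from the documented signature:

```SQL
-- Extracts the capture group selected by pos; here group 1 returns 'b'.
SELECT regexp_extract('AbCdE', '([[:lower:]]+)C([[:lower:]]+)', 1);
```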

## Examples

:::tip
This example uses the Local Climatological Data(LCD) dataset featured in the [StarRocks Basics](../../../quick_start/shared-nothing.md) Quick Start. You can load the data and try the example yourself.
:::
<Tip />

Given this data:

4 changes: 2 additions & 2 deletions docs/en/sql-reference/sql-functions/table-functions/files.md
@@ -10,7 +10,7 @@ Defines data files in remote storage.

From v3.1.0 onwards, StarRocks supports defining read-only files in remote storage using the table function FILES(). It can access remote storage with the path-related properties of the files, infers the table schema of the data in the files, and returns the data rows. You can directly query the data rows using [SELECT](../../sql-statements/table_bucket_part_index/SELECT.md), load the data rows into an existing table using [INSERT](../../sql-statements/loading_unloading/INSERT.md), or create a new table and load the data rows into it using [CREATE TABLE AS SELECT](../../sql-statements/table_bucket_part_index/CREATE_TABLE_AS_SELECT.md).

From v3.2.0 onwards, FILES() supports writing data into files in remote storage. You can [use INSERT INTO FILES() to unload data from StarRocks to remote storage](../../../unloading/unload_using_insert_into_files.md).
From v3.2.0 onwards, FILES() supports writing data into files in remote storage. You can use INSERT INTO FILES() to unload data from StarRocks to remote storage.
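A hedged sketch of unloading with INSERT INTO FILES(); the S3 path, credentials, and source table are placeholders:

```SQL
-- Hedged sketch; path, credentials, and source table are hypothetical.
INSERT INTO FILES(
    "path" = "s3://mybucket/unload/",
    "format" = "parquet",
    "compression" = "zstd",
    "aws.s3.region" = "us-west-2",
    "aws.s3.access_key" = "<access_key>",
    "aws.s3.secret_key" = "<secret_key>"
)
SELECT * FROM sales_records;
```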

Currently, the FILES() function supports the following data sources and file formats:

@@ -246,7 +246,7 @@ Suppose the data file **file1** is stored under a path in the format of `/geo/co…

### unload_data_param

From v3.2 onwards, FILES() supports defining writable files in remote storage for data unloading. For detailed instructions, see [Unload data using INSERT INTO FILES](../../../unloading/unload_using_insert_into_files.md).
From v3.2 onwards, FILES() supports defining writable files in remote storage for data unloading.

```sql
-- Supported from v3.2 onwards.
@@ -6,7 +6,7 @@ displayed_sidebar: docs

## Description

Creates resources. The following types of resources can be created: Apache Spark™, Apache Hive™, Apache Iceberg, Apache Hudi, and JDBC. Spark resources are used in [Spark Load](../../../loading/SparkLoad.md) to manage loading information, such as YARN configurations, storage path of intermediate data, and Broker configurations. Hive, Iceberg, Hudi, and JDBC resources are used for managing data source access information involved in querying [External tables](../../../data_source/External_table.md).
Creates resources. The following types of resources can be created: Apache Spark™, Apache Hive™, Apache Iceberg, Apache Hudi, and JDBC. Spark resources are used in Spark Load to manage loading information, such as YARN configurations, storage path of intermediate data, and Broker configurations. Hive, Iceberg, Hudi, and JDBC resources are used for managing data source access information involved in querying [External tables](../../../data_source/External_table.md).
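A hedged sketch of a Spark resource of the kind described above; the YARN/HDFS endpoints, working directory, and broker name are hypothetical:

```SQL
-- Hedged sketch; endpoints, paths, and the broker name are hypothetical.
CREATE EXTERNAL RESOURCE "spark0"
PROPERTIES (
    "type" = "spark",
    "spark.master" = "yarn",
    "spark.submit.deployMode" = "cluster",
    "spark.hadoop.yarn.resourcemanager.address" = "rm-host:8032",
    "spark.hadoop.fs.defaultFS" = "hdfs://nn-host:9000",
    "working_dir" = "hdfs://nn-host:9000/tmp/starrocks",
    "broker" = "broker0"
);
```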

:::tip

@@ -6,14 +6,15 @@ toc_max_heading_level: 4
# GRANT

import UserPrivilegeCase from '../../../_assets/commonMarkdown/userPrivilegeCase.md'
import MultiServiceAccess from '../../../_assets/commonMarkdown/multi-service-access.mdx'

## Description

Grants one or more privileges on specific objects to a user or a role.

Grants roles to users or other roles.

For more information about the privileges that can be granted, see [Privilege items](../../../administration/user_privs/privilege_item.md).
For more information about the privileges that can be granted, see [Privilege items](../../../administration/user_privs/privilege_overview.md).

After a GRANT operation is performed, you can run [SHOW GRANTS](./SHOW_GRANTS.md) to view detailed privilege information or run [REVOKE](REVOKE.md) to revoke a privilege or role.
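A minimal grant-then-verify sketch; the database, table, and user names are hypothetical:

```SQL
-- Hedged sketch; object and user names are hypothetical.
GRANT SELECT ON TABLE test_db.orders TO USER 'jack'@'%';
SHOW GRANTS FOR 'jack'@'%';
```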

Expand Down Expand Up @@ -259,4 +260,5 @@ GRANT IMPERSONATE ON USER 'rose'@'%' TO USER 'jack'@'%';

<UserPrivilegeCase />

For the best practices of multi-service access control, see [Multi-service access control](../../../administration/user_privs/User_privilege.md#multi-service-access-control).
<MultiServiceAccess />

@@ -6,11 +6,11 @@ displayed_sidebar: docs

## Description

Revokes specific privileges or roles from a user or a role. For the privileges supported by StarRocks, see [Privileges supported by StarRocks](../../../administration/user_privs/privilege_item.md).
Revokes specific privileges or roles from a user or a role. For the privileges supported by StarRocks, see [Privileges supported by StarRocks](../../../administration/user_privs/privilege_overview.md).

:::tip

- Common users can only revoke their privileges that have the `WITH GRANT OPTION` keyword from other users and roles. For information about `WITH GRANT OPTION`, see [GRANT](GRANT.md).
- Common users can only revoke their privileges that have the `WITH GRANT OPTION` keyword from other users and roles. For information about `WITH GRANT OPTION`, see [GRANT](./GRANT.md).
- Only users with the `user_admin` role have the privilege to revoke privileges from other users.

:::
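A minimal sketch mirroring the GRANT syntax; the names are hypothetical:

```SQL
-- Hedged sketch; names mirror the hypothetical GRANT example.
REVOKE SELECT ON TABLE test_db.orders FROM USER 'jack'@'%';
```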