diff --git a/docs/en/_assets/commonMarkdown/loadMethodIntro.md b/docs/en/_assets/commonMarkdown/loadMethodIntro.md index d5cc7a640afa5..8f87c904680f7 100644 --- a/docs/en/_assets/commonMarkdown/loadMethodIntro.md +++ b/docs/en/_assets/commonMarkdown/loadMethodIntro.md @@ -7,6 +7,6 @@ Each of these options has its own advantages, which are detailed in the followin In most cases, we recommend that you use the INSERT+`FILES()` method, which is much easier to use. -However, the INSERT+`FILES()` method currently supports only the Parquet, ORC, and CSV file formats. Therefore, if you need to load data of other file formats such as JSON, or [perform data changes such as DELETE during data loading](../../loading/Load_to_Primary_Key_tables.md), you can resort to Broker Load. +However, the INSERT+`FILES()` method currently supports only the Parquet, ORC, and CSV file formats. Therefore, if you need to load data of other file formats such as JSON, or perform data changes such as DELETE during data loading, you can resort to Broker Load. If you need to load a large number of data files with a significant data volume in total (for example, more than 100 GB or even 1 TB), we recommend that you use the Pipe method. Pipe can split the files based on their number or size, breaking down the load job into smaller, sequential tasks. This approach ensures that errors in one file do not impact the entire load job and minimizes the need for retries due to data errors. diff --git a/docs/en/_assets/commonMarkdown/multi-service-access.mdx b/docs/en/_assets/commonMarkdown/multi-service-access.mdx new file mode 100644 index 0000000000000..a76bc56932fdf --- /dev/null +++ b/docs/en/_assets/commonMarkdown/multi-service-access.mdx @@ -0,0 +1 @@ +For the best practices of multi-service access control, see [Multi-service access control](../../administration/user_privs/User_privilege.md#multi-service-access-control). diff --git a/docs/en/_assets/commonMarkdown/quickstart-iceberg-tip.mdx b/docs/en/_assets/commonMarkdown/quickstart-iceberg-tip.mdx new file mode 100644 index 0000000000000..e32a3d871a85f --- /dev/null +++ b/docs/en/_assets/commonMarkdown/quickstart-iceberg-tip.mdx @@ -0,0 +1,5 @@ + +:::tip +This example uses the Local Climatological Data(LCD) dataset featured in the [StarRocks Basics](../../quick_start/shared-nothing.md) Quick Start. You can load the data and try the example yourself. +::: + diff --git a/docs/en/_assets/commonMarkdown/quickstart-overview-tip.mdx b/docs/en/_assets/commonMarkdown/quickstart-overview-tip.mdx new file mode 100644 index 0000000000000..a2618790f9d25 --- /dev/null +++ b/docs/en/_assets/commonMarkdown/quickstart-overview-tip.mdx @@ -0,0 +1,3 @@ +## Learn by doing + +Try the [Quick Starts](../../quick_start/quick_start.mdx) to get an overview of using StarRocks with realistic scenarios. 
diff --git a/docs/en/_assets/commonMarkdown/quickstart-routine-load-tip.mdx b/docs/en/_assets/commonMarkdown/quickstart-routine-load-tip.mdx new file mode 100644 index 0000000000000..8abc4a00b6e49 --- /dev/null +++ b/docs/en/_assets/commonMarkdown/quickstart-routine-load-tip.mdx @@ -0,0 +1,5 @@ + +:::tip +Try Routine Load out in this [Quick Start](../../quick_start/routine-load.md) +::: + diff --git a/docs/en/_assets/commonMarkdown/quickstart-shared-data.mdx b/docs/en/_assets/commonMarkdown/quickstart-shared-data.mdx new file mode 100644 index 0000000000000..dc5a34a47c1c4 --- /dev/null +++ b/docs/en/_assets/commonMarkdown/quickstart-shared-data.mdx @@ -0,0 +1,5 @@ + +:::tip +Give [shared-data](../../quick_start/shared-data.md) a try using MinIO for object storage. +::: + diff --git a/docs/en/_assets/commonMarkdown/quickstart-shared-nothing-tip.mdx b/docs/en/_assets/commonMarkdown/quickstart-shared-nothing-tip.mdx new file mode 100644 index 0000000000000..e32a3d871a85f --- /dev/null +++ b/docs/en/_assets/commonMarkdown/quickstart-shared-nothing-tip.mdx @@ -0,0 +1,5 @@ + +:::tip +This example uses the Local Climatological Data(LCD) dataset featured in the [StarRocks Basics](../../quick_start/shared-nothing.md) Quick Start. You can load the data and try the example yourself. +::: + diff --git a/docs/en/administration/management/resource_management/resource_group.md b/docs/en/administration/management/resource_management/resource_group.md index 17f2606d07893..bdfbcea0c79dd 100644 --- a/docs/en/administration/management/resource_management/resource_group.md +++ b/docs/en/administration/management/resource_management/resource_group.md @@ -60,7 +60,7 @@ You can specify CPU and memory resource quotas for a resource group on a BE by u > **NOTE** > - > The amount of memory that can be used for queries is indicated by the `query_pool` parameter. For more information about the parameter, see [Memory management](Memory_management.md). + > The amount of memory that can be used for queries is indicated by the `query_pool` parameter. - `concurrency_limit` @@ -68,7 +68,7 @@ You can specify CPU and memory resource quotas for a resource group on a BE by u - `max_cpu_cores` - The CPU core threshold for triggering query queue in FE. For more details, refer to [Query queues - Specify resource thresholds for resource group-level query queues](./query_queues.md#specify-resource-thresholds-for-resource-group-level-query-queues). It takes effect only when it is set to greater than `0`. Range: [0, `avg_be_cpu_cores`], where `avg_be_cpu_cores` represents the average number of CPU cores across all BE nodes. Default: 0. + The CPU core threshold for triggering query queue in FE. This only takes effect when it is set to greater than `0`. Range: [0, `avg_be_cpu_cores`], where `avg_be_cpu_cores` represents the average number of CPU cores across all BE nodes. Default: 0. - `spill_mem_limit_threshold` @@ -360,9 +360,9 @@ The following FE metrics only provide statistics within the current FE node: | starrocks_fe_query_resource_group | Count | Instantaneous | The number of queries historically run in this resource group (including those currently running). | | starrocks_fe_query_resource_group_latency | ms | Instantaneous | The query latency percentile for this resource group. The label `type` indicates specific percentiles, including `mean`, `75_quantile`, `95_quantile`, `98_quantile`, `99_quantile`, `999_quantile`. 
| | starrocks_fe_query_resource_group_err | Count | Instantaneous | The number of queries in this resource group that encountered an error. | -| starrocks_fe_resource_group_query_queue_total | Count | Instantaneous | The total number of queries historically queued in this resource group (including those currently running). This metric is supported from v3.1.4 onwards. It is valid only when query queues are enabled, see [Query Queues](query_queues.md) for details. | -| starrocks_fe_resource_group_query_queue_pending | Count | Instantaneous | The number of queries currently in the queue of this resource group. This metric is supported from v3.1.4 onwards. It is valid only when query queues are enabled, see [Query Queues](query_queues.md) for details. | -| starrocks_fe_resource_group_query_queue_timeout | Count | Instantaneous | The number of queries in this resource group that have timed out while in the queue. This metric is supported from v3.1.4 onwards. It is valid only when query queues are enabled, see [Query Queues](query_queues.md) for details. | +| starrocks_fe_resource_group_query_queue_total | Count | Instantaneous | The total number of queries historically queued in this resource group (including those currently running). This metric is supported from v3.1.4 onwards. It is valid only when query queues are enabled. | +| starrocks_fe_resource_group_query_queue_pending | Count | Instantaneous | The number of queries currently in the queue of this resource group. This metric is supported from v3.1.4 onwards. It is valid only when query queues are enabled. | +| starrocks_fe_resource_group_query_queue_timeout | Count | Instantaneous | The number of queries in this resource group that have timed out while in the queue. This metric is supported from v3.1.4 onwards. It is valid only when query queues are enabled. | ### BE metrics @@ -412,11 +412,3 @@ MySQL [(none)]> SHOW USAGE RESOURCE GROUPS; | wg2 | 0 | 127.0.0.1 | 0.400 | 4 | 8 | +------------+----+-----------+-----------------+-----------------+------------------+ ``` - -## What to do next - -After you configure resource groups, you can manage memory resources and queries. For more information, see the following topics: - -- [Memory management](./Memory_management.md) - -- [Query management](./Query_management.md) diff --git a/docs/en/data_source/catalog/iceberg_catalog.md b/docs/en/data_source/catalog/iceberg_catalog.md index 1521bea65aa44..be06d261c5d6c 100644 --- a/docs/en/data_source/catalog/iceberg_catalog.md +++ b/docs/en/data_source/catalog/iceberg_catalog.md @@ -4,12 +4,11 @@ toc_max_heading_level: 5 --- import Tabs from '@theme/Tabs'; import TabItem from '@theme/TabItem'; +import QSTip from '../../_assets/commonMarkdown/quickstart-iceberg-tip.mdx' # Iceberg catalog -:::tip -Try it in this [hands-on tutorial](../../quick_start/iceberg.md) -::: + An Iceberg catalog is a type of external catalog that is supported by StarRocks from v2.4 onwards. With Iceberg catalogs, you can: diff --git a/docs/en/introduction/Architecture.md b/docs/en/introduction/Architecture.md index 6283fc4e31523..6dcc74512e807 100644 --- a/docs/en/introduction/Architecture.md +++ b/docs/en/introduction/Architecture.md @@ -1,6 +1,7 @@ --- displayed_sidebar: docs --- +import QSOverview from '../_assets/commonMarkdown/quickstart-overview-tip.mdx' # Architecture @@ -79,7 +80,4 @@ Queries against hot data scan the cache directly and then the local disk, while Caching can be enabled when creating tables. 
If caching is enabled, data will be written to both the local disk and backend object storage. During queries, the CN nodes first read data from the local disk. If the data is not found, it will be retrieved from the backend object storage and simultaneously cached on the local disk. -## Learn by doing - -- Give [shared-data](../quick_start/shared-data.md) a try using MinIO for object storage. -- Kubernetes users can use the [Helm quick start](../quick_start/helm.md) and deploy three FEs and three BEs in a shared-nothing architecture using persistent volumes. + diff --git a/docs/en/loading/InsertInto.md b/docs/en/loading/InsertInto.md index e13d1359f3b26..d21a147bf0e40 100644 --- a/docs/en/loading/InsertInto.md +++ b/docs/en/loading/InsertInto.md @@ -25,7 +25,7 @@ StarRocks v2.4 further supports overwriting data into a table by using INSERT OV - You can cancel a synchronous INSERT transaction only by pressing the **Ctrl** and **C** keys from your MySQL client. - You can submit an asynchronous INSERT task using [SUBMIT TASK](../sql-reference/sql-statements/loading_unloading/ETL/SUBMIT_TASK.md). - As for the current version of StarRocks, the INSERT transaction fails by default if the data of any rows does not comply with the schema of the table. For example, the INSERT transaction fails if the length of a field in any row exceeds the length limit for the mapping field in the table. You can set the session variable `enable_insert_strict` to `false` to allow the transaction to continue by filtering out the rows that mismatch the table. -- If you execute the INSERT statement frequently to load small batches of data into StarRocks, excessive data versions are generated. It severely affects query performance. We recommend that, in production, you should not load data with the INSERT command too often or use it as a routine for data loading on a daily basis. If your application or analytic scenario demand solutions to loading streaming data or small data batches separately, we recommend you use Apache Kafka® as your data source and load the data via [Routine Load](../loading/RoutineLoad.md). +- If you execute the INSERT statement frequently to load small batches of data into StarRocks, excessive data versions are generated. It severely affects query performance. We recommend that, in production, you should not load data with the INSERT command too often or use it as a routine for data loading on a daily basis. If your application or analytic scenario demand solutions to loading streaming data or small data batches separately, we recommend you use Apache Kafka® as your data source and load the data via Routine Load. - If you execute the INSERT OVERWRITE statement, StarRocks creates temporary partitions for the partitions which store the original data, inserts new data into the temporary partitions, and [swaps the original partitions with the temporary partitions](../sql-reference/sql-statements/table_bucket_part_index/ALTER_TABLE.md#use-a-temporary-partition-to-replace-the-current-partition). All these operations are executed in the FE Leader node. Hence, if the FE Leader node crashes while executing INSERT OVERWRITE command, the whole load transaction will fail, and the temporary partitions will be truncated. 
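To make the partition-swap behavior described in the note above concrete, here is a minimal sketch of an INSERT OVERWRITE statement; the table, partition, and source names are hypothetical and only illustrate the pattern.

```SQL
-- Rewrite one partition of a hypothetical table `sales` from a staging table.
-- StarRocks stages the new rows in a temporary partition and then swaps it
-- with the original partition; the swap is coordinated by the FE Leader node.
INSERT OVERWRITE sales PARTITION (p20240101)
SELECT * FROM sales_staging WHERE dt = '2024-01-01';
```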
## Preparation diff --git a/docs/en/loading/RoutineLoad.md b/docs/en/loading/RoutineLoad.md index e86f81f09f02d..c9ae242ebda8f 100644 --- a/docs/en/loading/RoutineLoad.md +++ b/docs/en/loading/RoutineLoad.md @@ -5,10 +5,9 @@ displayed_sidebar: docs # Load data using Routine Load import InsertPrivNote from '../_assets/commonMarkdown/insertPrivNote.md' +import QSTip from '../_assets/commonMarkdown/quickstart-routine-load-tip.mdx' -:::tip -Try Routine Load out in this [Quick Start](../quick_start/routine-load.md) -::: + This topic introduces how to create a Routine Load job to stream Kafka messages (events) into StarRocks, and familiarizes you with some basic concepts about Routine Load. @@ -196,7 +195,7 @@ After submitting the load job, you can execute the [SHOW ROUTINE LOAD](../sql-re In the example, the number of BE nodes that are alive is `5`, the number of the pre-specified Kafka topic partitions is `5`, and the value of `max_routine_load_task_concurrent_num` is `5`. To increase the actual load task concurrency, you can increase the `desired_concurrent_number` from the default value `3` to `5`. - For more about the properties, see [CREATE ROUTINE LOAD](../sql-reference/sql-statements/loading_unloading/routine_load/CREATE_ROUTINE_LOAD.md). For detailed instructions on accelerating the loading, see [Routine Load FAQ](../faq/loading/Routine_load_faq.md). + For more about the properties, see [CREATE ROUTINE LOAD](../sql-reference/sql-statements/loading_unloading/routine_load/CREATE_ROUTINE_LOAD.md). ### Load JSON-format data @@ -551,7 +550,3 @@ The following example stops the load job `example_tbl2_ordertest2`: ```SQL STOP ROUTINE LOAD FOR example_tbl2_ordertest2; ``` - -## FAQ - -Please see [Routine Load FAQ](../faq/loading/Routine_load_faq.md). diff --git a/docs/en/sql-reference/data-types/semi_structured/Array.md b/docs/en/sql-reference/data-types/semi_structured/Array.md index e44378b3a6646..b66e64e617d97 100644 --- a/docs/en/sql-reference/data-types/semi_structured/Array.md +++ b/docs/en/sql-reference/data-types/semi_structured/Array.md @@ -201,7 +201,7 @@ INSERT INTO t0 VALUES(1, [1,2,3]); ### Use Stream Load or Routine Load to load CSV-formatted arrays - Arrays in CSV files are separated with comma by default. You can use [Stream Load](../../../loading/StreamLoad.md#load-csv-data) or [Routine Load](../../../loading/RoutineLoad.md#load-csv-format-data) to load CSV text files or CSV data in Kafka. + Arrays in CSV files are separated with comma by default. You can use [Stream Load or Routine Load](../../../loading/loading_introduction/Loading_intro.md) to load CSV text files or CSV data in Kafka. ## Query ARRAY data diff --git a/docs/en/sql-reference/data-types/semi_structured/JSON.md b/docs/en/sql-reference/data-types/semi_structured/JSON.md index defb82033918a..84ee1b72707ba 100644 --- a/docs/en/sql-reference/data-types/semi_structured/JSON.md +++ b/docs/en/sql-reference/data-types/semi_structured/JSON.md @@ -69,7 +69,7 @@ StarRocks supports the following data type conversions at Parquet file loading. | LIST | ARRAY | | Other data types such as UNION and TIMESTAMP | Not supported | -- Method 4: Use [Routine](../../../loading/RoutineLoad.md) load to continuously load JSON data from Kafka into StarRocks. +- Method 4: Use [Routine](../../../loading/loading_introduction/Loading_intro.md) load to continuously load JSON data from Kafka into StarRocks. 
### Query and process JSON data diff --git a/docs/en/sql-reference/information_schema/loads.md b/docs/en/sql-reference/information_schema/loads.md index d2886387d9604..a92d833e9f711 100644 --- a/docs/en/sql-reference/information_schema/loads.md +++ b/docs/en/sql-reference/information_schema/loads.md @@ -13,7 +13,7 @@ The following fields are provided in `loads`: | JOB_ID | The unique ID assigned by StarRocks to identify the load job. | | LABEL | The label of the load job. | | DATABASE_NAME | The name of the database to which the destination StarRocks tables belong. | -| STATE | The state of the load job. Valid values:
  • `PENDING`: The load job is created.
  • `QUEUEING`: The load job is in the queue waiting to be scheduled.
  • `LOADING`: The load job is running.
  • `PREPARED`: The transaction has been committed.
  • `FINISHED`: The load job succeeded.
  • `CANCELLED`: The load job failed.
For more information, see the "Asynchronous loading" section in [Loading concepts](../../loading/loading_introduction/loading_concepts.md#asynchronous-loading). | +| STATE | The state of the load job. Valid values:
  • `PENDING`: The load job is created.
  • `QUEUEING`: The load job is in the queue waiting to be scheduled.
  • `LOADING`: The load job is running.
  • `PREPARED`: The transaction has been committed.
  • `FINISHED`: The load job succeeded.
  • `CANCELLED`: The load job failed.
| | PROGRESS | The progress of the ETL stage and LOADING stage of the load job. | | TYPE | The type of the load job. For Broker Load, the return value is `BROKER`. For INSERT, the return value is `INSERT`. | | PRIORITY | The priority of the load job. Valid values: `HIGHEST`, `HIGH`, `NORMAL`, `LOW`, and `LOWEST`. | diff --git a/docs/en/sql-reference/sql-functions/like-predicate-functions/regexp_extract.md b/docs/en/sql-reference/sql-functions/like-predicate-functions/regexp_extract.md index 45b6f32338914..02617aea5f8bd 100644 --- a/docs/en/sql-reference/sql-functions/like-predicate-functions/regexp_extract.md +++ b/docs/en/sql-reference/sql-functions/like-predicate-functions/regexp_extract.md @@ -2,6 +2,8 @@ displayed_sidebar: docs --- +import Tip from '../../../_assets/commonMarkdown/quickstart-shared-nothing-tip.mdx'; + # regexp_extract @@ -16,9 +18,7 @@ VARCHAR regexp_extract(VARCHAR str, VARCHAR pattern, int pos) ## Examples -:::tip -This example uses the Local Climatological Data(LCD) dataset featured in the [StarRocks Basics](../../../quick_start/shared-nothing.md) Quick Start. You can load the data and try the example yourself. -::: + Given this data: diff --git a/docs/en/sql-reference/sql-functions/table-functions/files.md b/docs/en/sql-reference/sql-functions/table-functions/files.md index 7bad70b2e95a3..49f5d555d0030 100644 --- a/docs/en/sql-reference/sql-functions/table-functions/files.md +++ b/docs/en/sql-reference/sql-functions/table-functions/files.md @@ -10,7 +10,7 @@ Defines data files in remote storage. From v3.1.0 onwards, StarRocks supports defining read-only files in remote storage using the table function FILES(). It can access remote storage with the path-related properties of the files, infers the table schema of the data in the files, and returns the data rows. You can directly query the data rows using [SELECT](../../sql-statements/table_bucket_part_index/SELECT.md), load the data rows into an existing table using [INSERT](../../sql-statements/loading_unloading/INSERT.md), or create a new table and load the data rows into it using [CREATE TABLE AS SELECT](../../sql-statements/table_bucket_part_index/CREATE_TABLE_AS_SELECT.md). -From v3.2.0 onwards, FILES() supports writing data into files in remote storage. You can [use INSERT INTO FILES() to unload data from StarRocks to remote storage](../../../unloading/unload_using_insert_into_files.md). +From v3.2.0 onwards, FILES() supports writing data into files in remote storage. You can use INSERT INTO FILES() to unload data from StarRocks to remote storage. Currently, the FILES() function supports the following data sources and file formats: @@ -246,7 +246,7 @@ Suppose the data file **file1** is stored under a path in the format of `/geo/co ### unload_data_param -From v3.2 onwards, FILES() supports defining writable files in remote storage for data unloading. For detailed instructions, see [Unload data using INSERT INTO FILES](../../../unloading/unload_using_insert_into_files.md). +From v3.2 onwards, FILES() supports defining writable files in remote storage for data unloading. ```sql -- Supported from v3.2 onwards. diff --git a/docs/en/sql-reference/sql-statements/Resource/CREATE_RESOURCE.md b/docs/en/sql-reference/sql-statements/Resource/CREATE_RESOURCE.md index c028ec80e6a82..0eb7698ee9db9 100644 --- a/docs/en/sql-reference/sql-statements/Resource/CREATE_RESOURCE.md +++ b/docs/en/sql-reference/sql-statements/Resource/CREATE_RESOURCE.md @@ -6,7 +6,7 @@ displayed_sidebar: docs ## Description -Creates resources. 
The following types of resources can be created: Apache Spark™, Apache Hive™, Apache Iceberg, Apache Hudi, and JDBC. Spark resources are used in [Spark Load](../../../loading/SparkLoad.md) to manage loading information, such as YARN configurations, storage path of intermediate data, and Broker configurations. Hive, Iceberg, Hudi, and JDBC resources are used for managing data source access information involved in querying [External tables](../../../data_source/External_table.md). +Creates resources. The following types of resources can be created: Apache Spark™, Apache Hive™, Apache Iceberg, Apache Hudi, and JDBC. Spark resources are used in Spark Load to manage loading information, such as YARN configurations, storage path of intermediate data, and Broker configurations. Hive, Iceberg, Hudi, and JDBC resources are used for managing data source access information involved in querying [External tables](../../../data_source/External_table.md). :::tip diff --git a/docs/en/sql-reference/sql-statements/account-management/GRANT.md b/docs/en/sql-reference/sql-statements/account-management/GRANT.md index c702bac84f807..36aa10c199681 100644 --- a/docs/en/sql-reference/sql-statements/account-management/GRANT.md +++ b/docs/en/sql-reference/sql-statements/account-management/GRANT.md @@ -6,6 +6,7 @@ toc_max_heading_level: 4 # GRANT import UserPrivilegeCase from '../../../_assets/commonMarkdown/userPrivilegeCase.md' +import MultiServiceAccess from '../../../_assets/commonMarkdown/multi-service-access.mdx' ## Description @@ -13,7 +14,7 @@ Grants one or more privileges on specific objects to a user or a role. Grants roles to users or other roles. -For more information about the privileges that can be granted, see [Privilege items](../../../administration/user_privs/privilege_item.md). +For more information about the privileges that can be granted, see [Privilege items](../../../administration/user_privs/privilege_overview.md). After a GRANT operation is performed, you can run [SHOW GRANTS](./SHOW_GRANTS.md) to view detailed privilege information or run [REVOKE](REVOKE.md) to revoke a privilege or role. @@ -259,4 +260,5 @@ GRANT IMPERSONATE ON USER 'rose'@'%' TO USER 'jack'@'%'; -For the best practices of multi-service access control, see [Multi-service access control](../../../administration/user_privs/User_privilege.md#multi-service-access-control). + + diff --git a/docs/en/sql-reference/sql-statements/account-management/REVOKE.md b/docs/en/sql-reference/sql-statements/account-management/REVOKE.md index e093bfe0da585..6b2309311f1d7 100644 --- a/docs/en/sql-reference/sql-statements/account-management/REVOKE.md +++ b/docs/en/sql-reference/sql-statements/account-management/REVOKE.md @@ -6,11 +6,11 @@ displayed_sidebar: docs ## Description -Revokes specific privileges or roles from a user or a role. For the privileges supported by StarRocks, see [Privileges supported by StarRocks](../../../administration/user_privs/privilege_item.md). +Revokes specific privileges or roles from a user or a role. For the privileges supported by StarRocks, see [Privileges supported by StarRocks](../../../administration/user_privs/privilege_overview.md). :::tip -- Common users can only revoke their privileges that have the `WITH GRANT OPTION` keyword from other users and roles. For information about `WITH GRANT OPTION`, see [GRANT](GRANT.md). +- Common users can only revoke their privileges that have the `WITH GRANT OPTION` keyword from other users and roles. For information about `WITH GRANT OPTION`, see [GRANT](./GRANT.md). 
- Only users with the `user_admin` role has the privilege to revoke privileges from other users. ::: ## Syntax diff --git a/docs/en/sql-reference/sql-statements/account-management/SET_PASSWORD.md index 25878c7d8126c..5248f99a820c0 100644 --- a/docs/en/sql-reference/sql-statements/account-management/SET_PASSWORD.md +++ b/docs/en/sql-reference/sql-statements/account-management/SET_PASSWORD.md @@ -12,7 +12,7 @@ Changes login password for users. The [ALTER USER](ALTER_USER.md) command can al - All users can reset their own password. - Only users with the `user_admin` role can change the password of other users. -- Only the `root` user itself can change its password. For more information, see [Reset root password](../../../administration/user_privs/User_privilege.md#reset-lost-root-password). +- Only the `root` user itself can change its password. For more information, see [the privilege overview](../../../administration/user_privs/privilege_overview.md). ::: diff --git a/docs/en/sql-reference/sql-statements/cluster-management/nodes_processes/ALTER_SYSTEM.md index f65f8a593fd35..1eddc8c8ce194 100644 --- a/docs/en/sql-reference/sql-statements/cluster-management/nodes_processes/ALTER_SYSTEM.md +++ b/docs/en/sql-reference/sql-statements/cluster-management/nodes_processes/ALTER_SYSTEM.md @@ -109,7 +109,7 @@ Manages FE, BE, CN, Broker nodes, and metadata snapshots in a cluster. ### Broker -- Add Broker nodes. You can use Broker nodes to load data from HDFS or cloud storage into StarRocks. For more information, see [Load data from HDFS](../../../../loading/hdfs_load.md) or [Load data from cloud storage](../../../../loading/cloud_storage_load.md). +- Add Broker nodes. You can use Broker nodes to load data from HDFS or cloud storage into StarRocks. For more information, see [Loading](../../../../loading/loading_introduction/Loading_intro.md). ```SQL ALTER SYSTEM ADD BROKER ":"[, ...] diff --git a/docs/en/sql-reference/sql-statements/loading_unloading/BROKER_LOAD.md index 92ebc04eb3be0..4320faff7ccfb 100644 --- a/docs/en/sql-reference/sql-statements/loading_unloading/BROKER_LOAD.md +++ b/docs/en/sql-reference/sql-statements/loading_unloading/BROKER_LOAD.md @@ -9,7 +9,7 @@ import InsertPrivNote from '../../../_assets/commonMarkdown/insertPrivNote.md' ## Description -StarRocks provides the MySQL-based loading method Broker Load. After you submit a load job, StarRocks asynchronously runs the job. You can use `SELECT * FROM information_schema.loads` to query the job result. This feature is supported from v3.1 onwards. For more information about the background information, principles, supported data file formats, how to perform single-table loads and multi-table loads, and how to view job results, see [Load data from HDFS](../../../loading/hdfs_load.md) and [Load data from cloud storage](../../../loading/cloud_storage_load.md). +StarRocks provides the MySQL-based loading method Broker Load. After you submit a load job, StarRocks asynchronously runs the job. You can use `SELECT * FROM information_schema.loads` to query the job result. This feature is supported from v3.1 onwards.
For more information about the background information, principles, supported data file formats, how to perform single-table loads and multi-table loads, and how to view job results, see [loading overview](../../../loading/loading_introduction/Loading_intro.md). @@ -177,7 +177,7 @@ INTO TABLE > > If the columns of the data file are mapped in sequence onto the columns of the StarRocks table, you do not need to specify `column_list`. - If you want to skip a specific column of the data file, you only need to temporarily name that column as different from any of the StarRocks table columns. For more information, see [Transform data at loading](../../../loading/Etl_in_loading.md). + If you want to skip a specific column of the data file, you only need to temporarily name that column as different from any of the StarRocks table columns. For more information, see [loading overview](../../../loading/loading_introduction/Loading_intro.md). - `COLUMNS FROM PATH AS` @@ -605,7 +605,7 @@ The following parameters are supported: - `priority` - Specifies the priority of the load job. Valid values: `LOWEST`, `LOW`, `NORMAL`, `HIGH`, and `HIGHEST`. Default value: `NORMAL`. Broker Load provides the [FE parameter](../../../administration/management/FE_configuration.md) `max_broker_load_job_concurrency`, determines the maximum number of Broker Load jobs that can be concurrently run within your StarRocks cluster. If the number of Broker Load jobs that are submitted within the specified time period exceeds the maximum number, excessive jobs will be waiting to be scheduled based on their priorities. + Specifies the priority of the load job. Valid values: `LOWEST`, `LOW`, `NORMAL`, `HIGH`, and `HIGHEST`. Default value: `NORMAL`. Broker Load provides the FE parameter `max_broker_load_job_concurrency`, determines the maximum number of Broker Load jobs that can be concurrently run within your StarRocks cluster. If the number of Broker Load jobs that are submitted within the specified time period exceeds the maximum number, excessive jobs will be waiting to be scheduled based on their priorities. You can use the [ALTER LOAD](ALTER_LOAD.md) statement to change the priority of an existing load job that is in the `QUEUEING` or `LOADING` state. @@ -623,8 +623,6 @@ The following parameters are supported: Specifies the name of the column you want to use as the condition to determine whether updates can take effect. The update from a source record to a destination record takes effect only when the source data record has a greater or equal value than the destination data record in the specified column. - Broker Load supports conditional updates since v3.1. For more information, see [Change data through loading](../../../loading/Load_to_Primary_Key_tables.md#conditional-updates). - > **NOTE** > > The column that you specify cannot be a primary key column. Additionally, only tables that use the Primary Key table support conditional updates. @@ -693,7 +691,7 @@ For examples about loading JSON-formatted data by using the matched mode, see [L ## Related configuration items -The [FE configuration item](../../../administration/management/FE_configuration.md) `max_broker_load_job_concurrency` specifies the maximum number of Broker Load jobs that can be concurrently run within your StarRocks cluster. +The FE configuration item `max_broker_load_job_concurrency` specifies the maximum number of Broker Load jobs that can be concurrently run within your StarRocks cluster. 
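As a hedged illustration of tuning this limit (the value `10` is arbitrary, and this assumes `max_broker_load_job_concurrency` is dynamically configurable in your version):

```SQL
-- Raise the cap on concurrently running Broker Load jobs at runtime.
ADMIN SET FRONTEND CONFIG ("max_broker_load_job_concurrency" = "10");
```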
In StarRocks v2.4 and earlier, if the total number of Broker Load jobs that are submitted within a specific period of time exceeds the maximum number, excessive jobs will be queued and scheduled based on their submission time. @@ -707,7 +705,7 @@ A Broker Load job can be split into one or more tasks that concurrently run. The - If you declare multiple `data_desc` parameters, each of which specifies a distinct partition for the same table, a task is generated to load the data of each partition. -Additionally, each task can be further split into one or more instances, which are evenly distributed to and concurrently run on the BEs or CNs of your StarRocks cluster. StarRocks splits each task based on the FE parameter [`min_bytes_per_broker_scanner`](../../../administration/management/FE_configuration.md) and the number of BE or CN nodes. You can use the following formula to calculate the number of instances in an individual task: +Additionally, each task can be further split into one or more instances, which are evenly distributed to and concurrently run on the BEs or CNs of your StarRocks cluster. StarRocks splits each task based on the FE parameter `min_bytes_per_broker_scanner` and the number of BE or CN nodes. You can use the following formula to calculate the number of instances in an individual task: **Number of instances in an individual task = min(Amount of data to be loaded by an individual task/`min_bytes_per_broker_scanner`, Number of BE/CN nodes)** diff --git a/docs/en/sql-reference/sql-statements/loading_unloading/INSERT.md b/docs/en/sql-reference/sql-statements/loading_unloading/INSERT.md index 1133b054ecfcf..4bc379489dff2 100644 --- a/docs/en/sql-reference/sql-statements/loading_unloading/INSERT.md +++ b/docs/en/sql-reference/sql-statements/loading_unloading/INSERT.md @@ -6,7 +6,7 @@ displayed_sidebar: docs ## Description -Inserts data into a specific table or overwrites a specific table with data. For detailed information about the application scenarios, see [Load data with INSERT](../../../loading/InsertInto.md). From v3.2.0 onwards, INSERT supports writing data into files in remote storage. You can [use INSERT INTO FILES() to unload data from StarRocks to remote storage](../../../unloading/unload_using_insert_into_files.md). +Inserts data into a specific table or overwrites a specific table with data. From v3.2.0 onwards, INSERT supports writing data into files in remote storage. You can use INSERT INTO FILES() to unload data from StarRocks to remote storage. You can submit an asynchronous INSERT task using [SUBMIT TASK](ETL/SUBMIT_TASK.md). @@ -45,7 +45,7 @@ You can submit an asynchronous INSERT task using [SUBMIT TASK](ETL/SUBMIT_TASK.m | expression | Expression that assigns values to the column. | | DEFAULT | Assigns default value to the column. | | query | Query statement whose result will be loaded into the destination table. It can be any SQL statement supported by StarRocks. | -| FILES() | Table function [FILES()](../../sql-functions/table-functions/files.md). You can use this function to unload data into remote storage. For more information, see [Use INSERT INTO FILES() to unload data to remote storage](../../../unloading/unload_using_insert_into_files.md). | +| FILES() | Table function [FILES()](../../sql-functions/table-functions/files.md). You can use this function to unload data into remote storage. 
| ## Return diff --git a/docs/en/sql-reference/sql-statements/loading_unloading/SHOW_LOAD.md b/docs/en/sql-reference/sql-statements/loading_unloading/SHOW_LOAD.md index f7f7eeb2fa307..800ca3e3d7d06 100644 --- a/docs/en/sql-reference/sql-statements/loading_unloading/SHOW_LOAD.md +++ b/docs/en/sql-reference/sql-statements/loading_unloading/SHOW_LOAD.md @@ -6,7 +6,7 @@ displayed_sidebar: docs ## Description -Displays information of all load jobs or given load jobs in a database. This statement can only display load jobs that are created by using [Broker Load](BROKER_LOAD.md), [INSERT](../data-mSPARK_LOAD.mdLoad](../loading_unloading/SPARK_LOAD.md). You can also view information of load jobs via the `curl` command. From v3.1 onwards, we recommend that you use the [SELECT](../table_bucket_part_index/SELECT.md) statement to query the results of Broker Load or Insert jobs from the [`loads`](../../information_schema/loads.md) table in the `information_schema` database. For more information, see [Load data from HDFS](../../../loading/hdfs_load.md), [Load data from cloud storage](../../../loading/cloud_storage_load.md), [Load data using INSERT](../../../loading/InsertInto.md), and [Bulk load using Apache Spark™](../../../loading/SparkLoad.md). +Displays information of all load jobs or given load jobs in a database. This statement can only display load jobs that are created by using Broker Load, INSERT, and SPARK_LOAD. You can also view load job information via the `curl` command. From v3.1 onwards, we recommend that you use the SELECT statement to query the results of Broker Load or Insert jobs from the `loads` table in the `information_schema` database. For more information, see [Loading](../../../loading/loading_introduction/Loading_intro.md). In addition to the preceding loading methods, StarRocks supports using Stream Load and Routine Load to load data. Stream Load is a synchronous operation and will directly return information of Stream Load jobs. Routine Load is an asynchronous operation where you can use the [SHOW ROUTINE LOAD](routine_load/SHOW_ROUTINE_LOAD.md) statement to display information of Routine Load jobs. diff --git a/docs/en/sql-reference/sql-statements/loading_unloading/STREAM_LOAD.md b/docs/en/sql-reference/sql-statements/loading_unloading/STREAM_LOAD.md index 9982519dc00e0..aa4f9ee5af158 100644 --- a/docs/en/sql-reference/sql-statements/loading_unloading/STREAM_LOAD.md +++ b/docs/en/sql-reference/sql-statements/loading_unloading/STREAM_LOAD.md @@ -2,12 +2,11 @@ displayed_sidebar: docs toc_max_heading_level: 4 --- +import Tip from '../../../_assets/commonMarkdown/quickstart-shared-nothing-tip.mdx'; # STREAM LOAD -:::tip -The [StarRocks in Docker](../../../quick_start/shared-nothing.md) quick start features Stream Load. Give it a try for hands-on experience and a detailed explanation of a realistic ETL flow. -::: + ## Description @@ -164,7 +163,7 @@ The following table describes the optional parameters. | load_mem_limit | No | The maximum amount of memory that can be provisioned to the load job. Unit: bytes. By default, the maximum memory size for a load job is 2 GB. The value of this parameter cannot exceed the maximum amount of memory that can be provisioned to each BE or CN. | | partial_update | No | Whether to use partial updates. Valid values: `TRUE` and `FALSE`. Default value: `FALSE`, indicating to disable this feature. | | partial_update_mode | No | Specifies the mode for partial updates. Valid values: `row` and `column`.
  • The value `row` (default) means partial updates in row mode, which is more suitable for real-time updates with many columns and small batches.
  • The value `column` means partial updates in column mode, which is more suitable for batch updates with few columns and many rows. In such scenarios, enabling the column mode offers faster update speeds. For example, in a table with 100 columns, if only 10 columns (10% of the total) are updated for all rows, the update speed of the column mode is 10 times faster.
| -| merge_condition | No | Specifies the name of the column you want to use as the condition to determine whether updates can take effect. The update from a source record to a destination record takes effect only when the source data record has a greater or equal value than the destination data record in the specified column. StarRocks supports conditional updates since v2.5. For more information, see [Change data through loading](../../../loading/Load_to_Primary_Key_tables.md).
**NOTE**
The column that you specify cannot be a primary key column. Additionally, only tables that use the Primary Key table support conditional updates. | +| merge_condition | No | Specifies the name of the column you want to use as the condition to determine whether updates can take effect. The update from a source record to a destination record takes effect only when the source data record has a greater or equal value than the destination data record in the specified column. StarRocks supports conditional updates since v2.5.
**NOTE**
The column that you specify cannot be a primary key column. Additionally, only tables that use the Primary Key table support conditional updates. | ## Column mapping diff --git a/docs/en/sql-reference/sql-statements/loading_unloading/routine_load/ALTER_ROUTINE_LOAD.md b/docs/en/sql-reference/sql-statements/loading_unloading/routine_load/ALTER_ROUTINE_LOAD.md index 959aa1f39176a..fc740dad50ba7 100644 --- a/docs/en/sql-reference/sql-statements/loading_unloading/routine_load/ALTER_ROUTINE_LOAD.md +++ b/docs/en/sql-reference/sql-statements/loading_unloading/routine_load/ALTER_ROUTINE_LOAD.md @@ -93,7 +93,7 @@ FROM data_source ## Examples -1. The following example increases the value of the property `desired_concurrent_number` of the load job to `5` in order to increase the parallelism of load tasks. For details on task parallelism, see [how to improve load performance](../../../../faq/loading/Routine_load_faq.md#how-can-i-improve-loading-performance). +1. The following example increases the value of the property `desired_concurrent_number` of the load job to `5` in order to increase the parallelism of load tasks. ```SQL ALTER ROUTINE LOAD FOR example_tbl_ordertest diff --git a/docs/en/sql-reference/sql-statements/loading_unloading/routine_load/CREATE_ROUTINE_LOAD.md b/docs/en/sql-reference/sql-statements/loading_unloading/routine_load/CREATE_ROUTINE_LOAD.md index 75eea2d86f8e0..9dd6b7b52cd32 100644 --- a/docs/en/sql-reference/sql-statements/loading_unloading/routine_load/CREATE_ROUTINE_LOAD.md +++ b/docs/en/sql-reference/sql-statements/loading_unloading/routine_load/CREATE_ROUTINE_LOAD.md @@ -2,11 +2,11 @@ displayed_sidebar: docs --- +import Tip from '../../../../_assets/commonMarkdown/quickstart-routine-load-tip.mdx'; + # CREATE ROUTINE LOAD -:::tip -Try Routine Load out in this [Quick Start](../../../../quick_start/routine-load.md) -::: + Routine Load can continuously consume messages from Apache Kafka® and load data into StarRocks. Routine Load can consume CSV, JSON, and Avro (supported since v3.0.1) data from a Kafka cluster and access Kafka via multiple security protocols, including `plaintext`, `ssl`, `sasl_plaintext`, and `sasl_ssl`. @@ -14,7 +14,7 @@ This topic describes the syntax, parameters, and examples of the CREATE ROUTINE > **NOTE** > -> - For information about the application scenarios, principles, and basic operations of Routine Load, see [Load data using Routine Load](../../../../loading/RoutineLoad.md). +> - For information about the application scenarios, principles, and basic operations of Routine Load, see [Load data using Routine Load](../../../../loading/loading_introduction/Loading_intro.md). > - You can load data into StarRocks tables only as a user who has the INSERT privilege on those StarRocks tables. If you do not have the INSERT privilege, follow the instructions provided in [GRANT](../../account-management/GRANT.md) to grant the INSERT privilege to the user that you use to connect to your StarRocks cluster. ## Syntax @@ -117,7 +117,7 @@ PROPERTIES ("" = ""[, "" = "" ...]) | log_rejected_record_num | No | Specifies the maximum number of unqualified data rows that can be logged. This parameter is supported from v3.1 onwards. Valid values: `0`, `-1`, and any non-zero positive integer. Default value: `0`.
  • The value `0` specifies that data rows that are filtered out will not be logged.
  • The value `-1` specifies that all data rows that are filtered out will be logged.
  • A non-zero positive integer such as `n` specifies that up to `n` data rows that are filtered out can be logged on each BE.
| | timezone | No | The time zone used by the load job. Default value: `Asia/Shanghai`. The value of this parameter affects the results returned by functions such as strftime(), alignment_timestamp(), and from_unixtime(). The time zone specified by this parameter is a session-level time zone. For more information, see [Configure a time zone](../../../../administration/management/timezone.md). | | partial_update | No | Whether to use partial updates. Valid values: `TRUE` and `FALSE`. Default value: `FALSE`, indicating to disable this feature. | -| merge_condition | No | Specifies the name of the column you want to use as the condition to determine whether to update data. Data will be updated only when the value of the data to be loaded into this column is greater than or equal to the current value of this column. For more information, see [Change data through loading](../../../../loading/Load_to_Primary_Key_tables.md).
**NOTE**
Only Primary Key tables support conditional updates. The column that you specify cannot be a primary key column. | +| merge_condition | No | Specifies the name of the column you want to use as the condition to determine whether to update data. Data will be updated only when the value of the data to be loaded into this column is greater than or equal to the current value of this column. **NOTE**
Only Primary Key tables support conditional updates. The column that you specify cannot be a primary key column. | | format | No | The format of the data to be loaded. Valid values: `CSV`, `JSON`, and `Avro` (supported since v3.0.1). Default value: `CSV`. | | trim_space | No | Specifies whether to remove spaces preceding and following column separators from the data file when the data file is in CSV format. Type: BOOLEAN. Default value: `false`.
For some databases, spaces are added to column separators when you export data as a CSV-formatted data file. Such spaces are called leading spaces or trailing spaces depending on their locations. By setting the `trim_space` parameter, you can enable StarRocks to remove such unnecessary spaces during data loading.
Note that StarRocks does not remove the spaces (including leading spaces and trailing spaces) within a field wrapped in a pair of `enclose`-specified characters. For example, the following field values use pipe (|) as the column separator and double quotation marks (`"`) as the `enclose`-specified character: | "Love StarRocks" |. If you set `trim_space` to `true`, StarRocks processes the preceding field values as |"Love StarRocks"|. | | enclose | No | Specifies the character that is used to wrap the field values in the data file according to [RFC4180](https://www.rfc-editor.org/rfc/rfc4180) when the data file is in CSV format. Type: single-byte character. Default value: `NONE`. The most prevalent characters are single quotation mark (`'`) and double quotation mark (`"`).
All special characters (including row separators and column separators) wrapped by using the `enclose`-specified character are considered normal symbols. StarRocks can do more than RFC4180 as it allows you to specify any single-byte character as the `enclose`-specified character.
If a field value contains an `enclose`-specified character, you can use the same character to escape that `enclose`-specified character. For example, you set `enclose` to `"`, and a field value is `a "quoted" c`. In this case, you can enter the field value as `"a ""quoted"" c"` into the data file. | @@ -333,10 +333,6 @@ FROM KAFKA To improve loading performance and avoid accumulative consumption, you can increase task parallelism by increasing the `desired_concurrent_number` value when you create the Routine Load job. Task parallelism allows splitting one Routine Load job into as many parallel tasks as possible. -> **Note** -> -> For more ways to improve loading performance, see [Routine Load FAQ](../../../../faq/loading/Routine_load_faq.md). - Note that the actual task parallelism is determined by the minimum value among the following multiple parameters: ```SQL diff --git a/docs/en/sql-reference/sql-statements/loading_unloading/routine_load/SHOW_ROUTINE_LOAD_TASK.md b/docs/en/sql-reference/sql-statements/loading_unloading/routine_load/SHOW_ROUTINE_LOAD_TASK.md index c462a644ab264..46befd6172435 100644 --- a/docs/en/sql-reference/sql-statements/loading_unloading/routine_load/SHOW_ROUTINE_LOAD_TASK.md +++ b/docs/en/sql-reference/sql-statements/loading_unloading/routine_load/SHOW_ROUTINE_LOAD_TASK.md @@ -14,8 +14,6 @@ Shows the execution information of load tasks within a Routine Load job. - -- For the relationship between a Routine Load job and the load tasks in it, see [Load data using Routine Load](../../../../loading/RoutineLoad.md#basic-concepts) - ::: ## Syntax diff --git a/docs/en/sql-reference/sql-statements/loading_unloading/unloading/EXPORT.md b/docs/en/sql-reference/sql-statements/loading_unloading/unloading/EXPORT.md index 10c4774556e52..7791753bc3884 100644 --- a/docs/en/sql-reference/sql-statements/loading_unloading/unloading/EXPORT.md +++ b/docs/en/sql-reference/sql-statements/loading_unloading/unloading/EXPORT.md @@ -64,7 +64,7 @@ WITH BROKER - `WITH BROKER` - In v2.4 and earlier, input `WITH BROKER ""` to specify the broker you want to use. From v2.5 onwards, you no longer need to specify a broker, but you still need to retain the `WITH BROKER` keyword. For more information, see [Export data using EXPORT > Background information](../../../../unloading/Export.md#background-information). + In v2.4 and earlier, input `WITH BROKER ""` to specify the broker you want to use. From v2.5 onwards, you no longer need to specify a broker, but you still need to retain the `WITH BROKER` keyword. - `broker_properties` diff --git a/docs/en/sql-reference/sql-statements/loading_unloading/unloading/SHOW_EXPORT.md b/docs/en/sql-reference/sql-statements/loading_unloading/unloading/SHOW_EXPORT.md index 6dccbe804bbb3..fbf0cc5ddf2ea 100644 --- a/docs/en/sql-reference/sql-statements/loading_unloading/unloading/SHOW_EXPORT.md +++ b/docs/en/sql-reference/sql-statements/loading_unloading/unloading/SHOW_EXPORT.md @@ -84,7 +84,7 @@ The parameters in the return result are described as follows: - `FINISHED`: The export job has been successfully completed. - `CANCELLED`: The export job has failed. -- `Progress`: the progress of the export job. The progress is measured in the unit of query plans. Suppose that the export job is divided into 10 query plans and three of them have finished. In this case, the progress of the export job is 30%. For more information, see ["Export data using EXPORT > Workflow"](../../../../unloading/Export.md#workflow). +- `Progress`: the progress of the export job. 
The progress is measured in the unit of query plans. Suppose that the export job is divided into 10 query plans and three of them have finished. In this case, the progress of the export job is 30%. - `TaskInfo`: the information of the export job. The information is a JSON object that consists of the following keys: @@ -93,7 +93,7 @@ The parameters in the return result are described as follows: - `column separator`: the column separator used in the exported data file. - `columns`: the names of the columns whose data is exported. - `tablet num`: the total number of tablets that are exported. - - `broker`: In v2.4 and earlier, this field is used to return the name of the broker that is used by the export job. From v2.5 onwards, this field returns an empty string. For more information, see ["Export data using EXPORT > Background information"](../../../../unloading/Export.md#background-information). + - `broker`: In v2.4 and earlier, this field is used to return the name of the broker that is used by the export job. From v2.5 onwards, this field returns an empty string. - `coord num`: the number of query plans into which the export job is divided. - `db`: the name of the database to which the exported data belongs. - `tbl`: the name of the table to which the exported data belongs diff --git a/docs/en/sql-reference/sql-statements/table_bucket_part_index/CREATE_TABLE.md b/docs/en/sql-reference/sql-statements/table_bucket_part_index/CREATE_TABLE.md index a32d5bb753d7d..9fcc2b9030853 100644 --- a/docs/en/sql-reference/sql-statements/table_bucket_part_index/CREATE_TABLE.md +++ b/docs/en/sql-reference/sql-statements/table_bucket_part_index/CREATE_TABLE.md @@ -114,11 +114,11 @@ Default value: `olap`. If this parameter is not specified, an OLAP table (StarRo Optional value: `mysql`, `elasticsearch`, `hive`, `jdbc` (2.3 and later), `iceberg`, and `hudi` (2.2 and later). If you want to create an external table to query external data sources, specify `CREATE EXTERNAL TABLE` and set `ENGINE` to any of these values. You can refer to [External table](../../../data_source/External_table.md) for more information. -**From v3.0 onwards, we recommend that you use catalogs to query data from Hive, Iceberg, Hudi, and JDBC data sources. External tables are deprecated. For more information, see [Hive catalog](../../../data_source/catalog/hive_catalog.md), [Iceberg catalog](../../../data_source/catalog/iceberg_catalog.md), [Hudi catalog](../../../data_source/catalog/hudi_catalog.md), and [JDBC catalog](../../../data_source/catalog/jdbc_catalog.md).** +**We recommend that you use catalogs to query data from Hive, Iceberg, Hudi, and JDBC data sources. External tables are deprecated.** -**From v3.1 onwards, StarRocks supports creating Parquet-formatted tables in Iceberg catalogs, and you can insert data to these Parquet-formatted Iceberg tables by using [INSERT INTO](../loading_unloading/INSERT.md). See [Create an Iceberg table](../../../data_source/catalog/iceberg_catalog.md#create-an-iceberg-table).** +**From v3.1 onwards, StarRocks supports creating Parquet-formatted tables in Iceberg catalogs, and you can insert data to these Parquet-formatted Iceberg tables by using INSERT INTO.** -**From v3.2 onwards, StarRocks supports creating Parquet-formatted tables in Hive catalogs, and supports sinking data to these Parquet-formatted Hive tables by using [INSERT INTO](../loading_unloading/INSERT.md). 
From v3.3 onwards, StarRocks supports creating ORC- and Textfile-formatted tables in Hive catalogs, and supports sinking data to these ORC- and Textfile-formatted Hive tables by using [INSERT INTO](../loading_unloading/INSERT.md). For more information, see [Create a Hive table](../../../data_source/catalog/hive_catalog.md#create-a-hive-table) and [Sink data to a Hive table](../../../data_source/catalog/hive_catalog.md#sink-data-to-a-hive-table).** +**From v3.2 onwards, StarRocks supports creating Parquet-formatted tables in Hive catalogs, and supports sinking data to these Parquet-formatted Hive tables by using INSERT INTO. From v3.3 onwards, StarRocks supports creating ORC- and Textfile-formatted tables in Hive catalogs, and supports sinking data to these ORC- and Textfile-formatted Hive tables by using INSERT INTO** - For MySQL, specify the following properties: @@ -349,7 +349,7 @@ StarRocks supports hash bucketing and random bucketing. If you do not configure **Precautions** - You can only use random bucketing to create Duplicate Key tables. - You can not specify a [Colocation Group](../../../using_starrocks/Colocate_join.md) for a table bucketed randomly. - - [Spark Load](../../../loading/SparkLoad.md) cannot be used to load data into tables bucketed randomly. + - Spark Load cannot be used to load data into tables bucketed randomly. - Since StarRocks v2.5.7, you do not need to set the number of buckets when you create a table. StarRocks automatically sets the number of buckets. If you want to set this parameter, see [Set the number of buckets](../../../table_design/Data_distribution.md#set-the-number-of-buckets). For more information, see [Random bucketing](../../../table_design/Data_distribution.md#random-bucketing-since-v31). @@ -656,7 +656,7 @@ PROPERTIES ( > **NOTE** > - > To enable the local disk cache, you must specify the directory of the disk in the BE configuration item `storage_root_path`. For more information, see [BE Configuration items](../../../administration/management/BE_configuration.md). + > To enable the local disk cache, you must specify the directory of the disk in the BE configuration item `storage_root_path`. - `datacache.partition_duration`: The validity duration of the hot data. When the local disk cache is enabled, all data is loaded into the cache. When the cache is full, StarRocks deletes the less recently used data from the cache. When a query needs to scan the deleted data, StarRocks checks if the data is within the duration of validity. If the data is within the duration, StarRocks loads the data into the cache again. If the data is not within the duration, StarRocks does not load it into the cache. This property is a string value that can be specified with the following units: `YEAR`, `MONTH`, `DAY`, and `HOUR`, for example, `7 DAY` and `12 HOUR`. If it is not specified, all data is cached as the hot data. @@ -675,7 +675,7 @@ PROPERTIES ( > **NOTE** > > - This parameter is supported for shared-nothing clusters since v3.2.0, and shared-data clusters since v3.3.0. - > - If you need to configure fast schema evolution at the cluster level, such as disabling fast schema evolution within the StarRocks cluster, you can set the FE dynamic parameter [`enable_fast_schema_evolution`](../../../administration/management/FE_configuration.md#enable_fast_schema_evolution). 
+ > - If you need to configure fast schema evolution at the cluster level, such as disabling fast schema evolution within the StarRocks cluster, you can set the FE dynamic parameter `enable_fast_schema_evolution`.
 ## Examples
diff --git a/docs/en/table_design/Data_distribution.md b/docs/en/table_design/Data_distribution.md
index 93e800ec08f7d..10f98e7d4e118 100644
--- a/docs/en/table_design/Data_distribution.md
+++ b/docs/en/table_design/Data_distribution.md
@@ -2,6 +2,7 @@
 displayed_sidebar: docs
 toc_max_heading_level: 4
 description: Partition and bucket data
+sidebar_position: 30
 ---
 # Data distribution
@@ -584,7 +585,7 @@ However, note that if you query massive amounts of data and frequently use certa
 - You can only use random bucketing to create a Duplicate Key table.
 - You cannot specify a table bucketed randomly to belong to a [Colocation Group](../using_starrocks/Colocate_join.md).
-[Spark Load](../loading/SparkLoad.md) cannot be used to load data into tables bucketed randomly.
+Spark Load cannot be used to load data into tables bucketed randomly.
 In the following CREATE TABLE example, the `DISTRIBUTED BY xxx` statement is not used, so StarRocks uses random bucketing by default, and automatically sets the number of buckets.
diff --git a/docs/en/table_design/StarRocks_table_design.md b/docs/en/table_design/StarRocks_table_design.md
index 057d86bed9586..d3695e58a0159 100644
--- a/docs/en/table_design/StarRocks_table_design.md
+++ b/docs/en/table_design/StarRocks_table_design.md
@@ -1,5 +1,6 @@
 ---
 displayed_sidebar: docs
+sidebar_position: 10
 ---
 # Table overview
diff --git a/docs/en/table_design/Temporary_partition.md b/docs/en/table_design/Temporary_partition.md
index 57c3a6084eb19..b909807ea2e51 100644
--- a/docs/en/table_design/Temporary_partition.md
+++ b/docs/en/table_design/Temporary_partition.md
@@ -1,5 +1,6 @@
 ---
 displayed_sidebar: docs
+sidebar_position: 40
 ---
 # Temporary partition
diff --git a/docs/en/table_design/catalog_db_tbl.md b/docs/en/table_design/catalog_db_tbl.md
index d83c93e40c5e9..48250b60fbb87 100644
--- a/docs/en/table_design/catalog_db_tbl.md
+++ b/docs/en/table_design/catalog_db_tbl.md
@@ -52,6 +52,6 @@ Views, or logical views, are virtual tables that do not hold any data. Views onl
 Privileges determine which users can perform which operations on which objects. StarRocks adopts two types of privilege models: identity-based access control and role-based access control. You can first assign privileges to roles, and then assign roles to users. In this case, privileges are passed to users through roles. Or, you can directly assign privileges to user identities.
-## [Data storage in storage-compute separation architecture](../introduction/Architecture.md#storage-compute-separation)
+## Data storage in storage-compute separation architecture
 Since v3.0, StarRocks introduces the new storage-compute separation (shared-data) architecture. Data storage is separated from BEs. Data is persistently stored in remote object storage or HDFS, while local disks are used for caching hot data to accelerate queries.
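The CREATE TABLE hunks above describe creating Parquet-formatted tables in Iceberg catalogs and writing to them with INSERT INTO. A minimal sketch of that workflow is shown below; the catalog, database, table, and column names are placeholders invented for illustration, and the example assumes an Iceberg catalog named `iceberg_catalog` has already been created in the cluster.

```SQL
-- Assumption: an Iceberg catalog named `iceberg_catalog` already exists in this cluster.
SET CATALOG iceberg_catalog;
USE iceberg_db;

-- Per the note above, tables created in Iceberg catalogs are Parquet-formatted.
CREATE TABLE sales_orders (
    order_id BIGINT,
    order_date DATE,
    amount DECIMAL(10, 2)
);

-- Sink data into the Parquet-formatted Iceberg table with INSERT INTO.
INSERT INTO sales_orders VALUES (1, '2024-06-01', 99.90);
```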
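The random bucketing precautions touched in the hunks above are easier to follow next to a concrete statement. The sketch below uses an invented table: the DISTRIBUTED BY clause is omitted, so StarRocks falls back to random bucketing and sets the number of buckets automatically, which, per the precautions, is only allowed for a Duplicate Key table.

```SQL
-- Hypothetical table; no DISTRIBUTED BY clause, so random bucketing is used
-- and StarRocks sets the number of buckets automatically (v2.5.7 and later).
CREATE TABLE site_access (
    event_day DATE,
    site_id INT,
    city_code VARCHAR(100),
    pv BIGINT
)
DUPLICATE KEY (event_day, site_id);
```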
diff --git a/docs/en/table_design/data_compression.md b/docs/en/table_design/data_compression.md
index 4331d4824a3c6..9af57147ad021 100644
--- a/docs/en/table_design/data_compression.md
+++ b/docs/en/table_design/data_compression.md
@@ -1,5 +1,6 @@
 ---
 displayed_sidebar: docs
+sidebar_position: 50
 ---
 # Data compression
diff --git a/docs/en/table_design/dynamic_partitioning.md b/docs/en/table_design/dynamic_partitioning.md
index f8fb2b6e33896..4a8a7c80593c2 100644
--- a/docs/en/table_design/dynamic_partitioning.md
+++ b/docs/en/table_design/dynamic_partitioning.md
@@ -1,5 +1,6 @@
 ---
 displayed_sidebar: docs
+sidebar_position: 30
 ---
 # Dynamic partitioning
diff --git a/docs/en/table_design/expression_partitioning.md b/docs/en/table_design/expression_partitioning.md
index 2a9add05bc820..fcc07a76c7aa4 100644
--- a/docs/en/table_design/expression_partitioning.md
+++ b/docs/en/table_design/expression_partitioning.md
@@ -1,6 +1,7 @@
 ---
 displayed_sidebar: docs
 description: Partition data in StarRocks
+sidebar_position: 10
 ---
 # Expression partitioning (recommended)
diff --git a/docs/en/table_design/feature-support-data-distribution.md b/docs/en/table_design/feature-support-data-distribution.md
index a4101ae724cb3..b5cde96541a45 100644
--- a/docs/en/table_design/feature-support-data-distribution.md
+++ b/docs/en/table_design/feature-support-data-distribution.md
@@ -1,6 +1,7 @@
 ---
 displayed_sidebar: docs
 sidebar_label: "Feature Support"
+sidebar_position: 50
 ---
 # Feature Support: Data Distribution
diff --git a/docs/en/table_design/hybrid_table.md b/docs/en/table_design/hybrid_table.md
index 54d7d07231b8f..df1675cb0b0d5 100644
--- a/docs/en/table_design/hybrid_table.md
+++ b/docs/en/table_design/hybrid_table.md
@@ -1,5 +1,6 @@
 ---
 displayed_sidebar: docs
+sidebar_position: 60
 ---
 # [Preview] Hybrid row-column storage
diff --git a/docs/en/table_design/list_partitioning.md b/docs/en/table_design/list_partitioning.md
index c4d3f87aa276b..09263c0a5392f 100644
--- a/docs/en/table_design/list_partitioning.md
+++ b/docs/en/table_design/list_partitioning.md
@@ -1,5 +1,6 @@
 ---
 displayed_sidebar: docs
+sidebar_position: 20
 ---
 # List partitioning
@@ -111,5 +112,5 @@ DISTRIBUTED BY HASH(`id`);
 - List partitioning does support dynamic partitioning and creating multiple partitions at a time.
 - Currently, StarRocks's shared-data mode does not support this feature.
 - When the `ALTER TABLE DROP PARTITION ;` statement is used to delete a partition created by using list partitioning, data in the partition is directly removed and cannot be recovered.
-- Currently you cannot [backup and restore](../administration/management/management.mdx) partitions created by the list partitioning.
+- Currently, you cannot back up and restore partitions created by using list partitioning.
 - Currently, StarRocks does not support creating [asynchronous materialized views](../using_starrocks/Materialized_view.md) with base tables created with the list partitioning strategy.
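To ground the list partitioning limits above, the following sketch creates a hypothetical list-partitioned table (all names and partition values are invented) and then drops one of its partitions, the operation that the limits warn cannot be undone.

```SQL
-- Hypothetical list-partitioned table for illustration only.
CREATE TABLE t_recharge_detail (
    id BIGINT,
    user_id BIGINT,
    recharge_money DECIMAL(32, 2),
    city VARCHAR(20)
)
DUPLICATE KEY (id)
PARTITION BY LIST (city) (
    PARTITION pCalifornia VALUES IN ("Los Angeles", "San Francisco", "San Diego"),
    PARTITION pTexas VALUES IN ("Houston", "Dallas", "Austin")
)
DISTRIBUTED BY HASH(`id`);

-- Dropping a list partition removes its data permanently (see the limit above).
ALTER TABLE t_recharge_detail DROP PARTITION pTexas;
```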
diff --git a/docs/en/using_starrocks/block_cache.md b/docs/en/using_starrocks/block_cache.md
index bd0a698d6e3e5..90d8fbc4bbd4d 100644
--- a/docs/en/using_starrocks/block_cache.md
+++ b/docs/en/using_starrocks/block_cache.md
@@ -83,8 +83,6 @@ You can download the following Grafana Dashboard templates based on your StarRoc
 - [Dashboard template for StarRocks shared-data cluster on virtual machines](http://starrocks-thirdparty.oss-cn-zhangjiakou.aliyuncs.com/StarRocks-Shared_data-for-vm.json)
 - [Dashboard template for StarRocks shared-data cluster on Kubernetes](http://starrocks-thirdparty.oss-cn-zhangjiakou.aliyuncs.com/StarRocks-Shared_data-for-k8s.json)
-For more instructions on deploying monitoring and alert services for StarRocks, see [Monitor and alert](../administration/management/monitoring/Monitor_and_Alert.md).
-
 ### Important metrics
 #### fslib read io_latency
diff --git a/docs/en/using_starrocks/data_lake_query_acceleration_with_materialized_views.md b/docs/en/using_starrocks/data_lake_query_acceleration_with_materialized_views.md
index fc748865fad7e..97b87075139bb 100644
--- a/docs/en/using_starrocks/data_lake_query_acceleration_with_materialized_views.md
+++ b/docs/en/using_starrocks/data_lake_query_acceleration_with_materialized_views.md
@@ -180,7 +180,7 @@ In scenarios involving query rewriting, if you use a very complex query statemen
 ## Best practices
-In real-world business scenarios, you can identify queries with high execution latency and resource consumption by analyzing audit logs or [big query logs](../administration/management/monitor_manage_big_queries.md#analyze-big-query-logs). You can further use [query profiles](../administration/query_profile_overview.md) to pinpoint the specific stages where the query is slow. The following sections provide instructions and examples on how to boost data lake query performance with materialized views.
+In real-world business scenarios, you can identify queries with high execution latency and resource consumption by analyzing audit logs or big query logs. You can further use [query profiles](../administration/query_profile_overview.md) to pinpoint the specific stages where the query is slow. The following sections provide instructions and examples on how to boost data lake query performance with materialized views.
 ### Case One: Accelerate join calculation in data lake
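The best practices paragraph above pairs slow-query diagnosis with materialized views for acceleration. Below is a minimal, hypothetical sketch of that pattern: an asynchronous materialized view that pre-joins two external tables so that eligible queries can be transparently rewritten to it. The Hive catalog `hive_catalog`, the `ssb` database, and the table and column names are assumptions for illustration only.

```SQL
-- Hypothetical asynchronous materialized view that pre-joins two external tables
-- so that eligible join queries on hive_catalog.ssb can be rewritten to use it.
CREATE MATERIALIZED VIEW lineorder_customer_mv
DISTRIBUTED BY HASH(lo_orderkey)
REFRESH ASYNC EVERY (INTERVAL 1 DAY)
AS
SELECT l.lo_orderkey, l.lo_revenue, c.c_name, c.c_region
FROM hive_catalog.ssb.lineorder AS l
JOIN hive_catalog.ssb.customer AS c ON l.lo_custkey = c.c_custkey;
```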