add doc of stage and datalink #1143

Merged
merged 1 commit into from
Oct 28, 2024
36 changes: 30 additions & 6 deletions docs/MatrixOne/Develop/export-data/select-into-outfile.md
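@@ -7,15 +7,15 @@ MatrixOne supports the following two ways to export data:

This document describes how to export data with `SELECT INTO...OUTFILE`.

The `SELECT...INTO OUTFILE` syntax exports table data to a text file on the host.
The `SELECT...INTO OUTFILE` syntax exports table data to a text file on the host or to a stage.

## Syntax

The `SELECT...INTO OUTFILE` syntax combines the `SELECT` syntax with `INTO OUTFILE filename`. The default output format is the same as that of the `LOAD DATA` command. The following statement therefore exports the table named **test** to the *.csv* file at the path **/root/test**.
The `SELECT...INTO OUTFILE` syntax combines the `SELECT` syntax with `INTO OUTFILE filename`. The default output format is the same as that of the `LOAD DATA` command.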

```
mysql> SELECT * FROM TEST
-> INTO OUTFILE '/root/test.csv';
mysql> SELECT * FROM <table_name>
-> INTO OUTFILE '<filepath>|<stage://stage_name>';
```

You can change the output format with a variety of forms and options that specify how columns and records are quoted and delimited.
@@ -70,7 +70,11 @@ sudo docker run --name <name> --privileged -d -p 6001:6001 -v ${local_data_path}
+------+-----------+------+
```

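2. For MatrixOne installed from source or from binary files, export the table to a local directory, for example *~/tmp/export_demo/export_datatable.txt*. A sample command:
2. Export the data

- Export to the local file system

For MatrixOne installed from source or from binary files, export the table to a local directory, for example *~/tmp/export_demo/export_datatable.txt*. A sample command: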

```
select * from user into outfile '~/tmp/export_demo/export_datatable.txt'
@@ -82,9 +86,29 @@ sudo docker run --name <name> --privileged -d -p 6001:6001 -v ${local_data_path}
select * from user into outfile 'mo-data/export_datatable.txt';
```

3. Check the export result in your local *export_datatable.txt* file:
- Export to a stage

```sql
create stage stage_fs url = 'file:///Users/admin/test';
select * from user into outfile 'stage://stage_fs/user.csv';
```

3. Check the export result:

- Exported to the local file system

```
(base) admin@192 test % cat export_datatable.txt
id,user_name,sex
1,"weder","man"
2,"tom","man"
3,"wederTom","man"
```

- Exported to a stage

```bash
(base) admin@192 test % cat user.csv
id,user_name,sex
1,"weder","man"
2,"tom","man"
291 changes: 0 additions & 291 deletions docs/MatrixOne/Develop/import-data/bulk-load/1.1-load-s3.md

This file was deleted.

60 changes: 34 additions & 26 deletions docs/MatrixOne/Develop/import-data/bulk-load/load-csv.md
@@ -7,37 +7,45 @@
- Scenario 1: the data file is on the same machine as the MatrixOne server:

```
LOAD DATA
INFILE 'file_name'
INTO TABLE tbl_name
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
[IGNORE number {LINES | ROWS}]
[PARALLEL {'TRUE' | 'FALSE'}]
> LOAD DATA
INFILE '<file_name>|<stage://stage_name/filepath>'
INTO TABLE tbl_name
[CHARACTER SET charset_name]
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
[ESCAPED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
[IGNORE number {LINES | ROWS}]
[SET column_name_1=nullif(column_name_1, expr1), column_name_2=nullif(column_name_2, expr2)...]
[PARALLEL {'TRUE' | 'FALSE'}]
[STRICT {'TRUE' | 'FALSE'}]
```

- Scenario 2: the data file is on a different machine from the MatrixOne server:

```
LOAD DATA LOCAL
INFILE 'file_name'
INTO TABLE tbl_name
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
[IGNORE number {LINES | ROWS}]
[PARALLEL {'TRUE' | 'FALSE'}]
> LOAD DATA LOCAL
INFILE '<file_name>|<stage://stage_name/filepath>'
INTO TABLE tbl_name
[CHARACTER SET charset_name]
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
[ESCAPED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
[IGNORE number {LINES | ROWS}]
[SET column_name_1=nullif(column_name_1, expr1), column_name_2=nullif(column_name_2, expr2)...]
[PARALLEL {'TRUE' | 'FALSE'}]
[STRICT {'TRUE' | 'FALSE'}]
```
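
The optional `SET column_name = nullif(column_name, expr)` clause in the syntax above turns a chosen sentinel value into NULL while loading. As a rough illustration of that rule outside the database, the Python sketch below applies the standard SQL `NULLIF` semantics to CSV rows. The sample data, the `nullif` helper, and the sentinel values are assumptions made for this example; it does not reproduce MatrixOne's loader.

```python
import csv
import io


def nullif(value, sentinel):
    """Mimic SQL NULLIF: return None (NULL) when value equals sentinel."""
    return None if value == sentinel else value


# Hypothetical CSV content: an empty string and "unknown" mark missing values.
raw = "id,user_name,sex\n1,weder,man\n2,,man\n3,tom,unknown\n"

rows = []
for record in csv.DictReader(io.StringIO(raw)):
    # Equivalent in spirit to:
    #   SET user_name=nullif(user_name, ''), sex=nullif(sex, 'unknown')
    record["user_name"] = nullif(record["user_name"], "")
    record["sex"] = nullif(record["sex"], "unknown")
    rows.append(record)

for row in rows:
    print(row)
```

Values that do not match the sentinel pass through unchanged, so the clause only affects the rows that carry the placeholder.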

## Before you start
37 changes: 37 additions & 0 deletions docs/MatrixOne/Reference/Data-Types/data-types.md
@@ -573,4 +573,41 @@ mysql> select * from t1;
| [1, 2, 3] | [4, 5] |
+-----------+--------+
1 row in set (0.00 sec)
```

## Datalink data type

| Type | Description |
|------------|--------------------- |
| datalink | A special data type for storing links that point to documents (for example, in a stage) or files |

### Example

```sql
drop table if exists test01;
create table test01 (col1 int, col2 datalink);
create stage stage01 url='file:///Users/admin/case/';
insert into test01 values (1, 'file:///Users/admin/case/t1.csv');
insert into test01 values (2, 'file:///Users/admin/case/t1.csv?size=2');
insert into test01 values (3, 'file:///Users/admin/case/t1.csv?offset=4');
insert into test01 values (4, 'file:///Users/admin/case/t1.csv?offset=4&size=2');
insert into test01 values (5, 'stage://stage01/t1.csv');
insert into test01 values (6, 'stage://stage01/t1.csv?size=2');
insert into test01 values (7, 'stage://stage01/t1.csv?offset=4');
insert into test01 values (8, 'stage://stage01/t1.csv?offset=4&size=2');

mysql> select * from test01;
+------+-------------------------------------------------+
| col1 | col2 |
+------+-------------------------------------------------+
| 1 | file:///Users/admin/case/t1.csv |
| 2 | file:///Users/admin/case/t1.csv?size=2 |
| 3 | file:///Users/admin/case/t1.csv?offset=4 |
| 4 | file:///Users/admin/case/t1.csv?offset=4&size=2 |
| 5 | stage://stage01/t1.csv |
| 6 | stage://stage01/t1.csv?size=2 |
| 7 | stage://stage01/t1.csv?offset=4 |
| 8 | stage://stage01/t1.csv?offset=4&size=2 |
+------+-------------------------------------------------+
8 rows in set (0.01 sec)
```
87 changes: 87 additions & 0 deletions docs/MatrixOne/Reference/Data-Types/datalink-type.md
@@ -0,0 +1,87 @@
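# DATALINK type

`DATALINK` is a special data type for storing links that point to documents (for example, in a stage) or files. Its main purpose is to store a document's link address in the database rather than the document itself. The type fits a range of scenarios, particularly large-scale document management, where it gives quick access to documents without storing them in the database.

With the `DATALINK` data type you can:

- Save storage space: documents live in external storage (for example, an object storage system), and the database keeps only the links.
- Access documents conveniently: with the link stored, the system can reach a document quickly without extra storage or processing.
- Improve data operation efficiency: large files are not processed directly in the database, which speeds up data operations.

## Inserting DATALINK data

**Syntax**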

```
INSERT INTO TABLE_NAME VALUES ('<file://<path>/<filename>>|<stage://<stage_name>/<path>/<file_name>>?<offset=xx>&<size=xxx>')
```

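**Parameters**

| Parameter | Description |
| ---- | ---- |
| file | Points to a file location in the local file system. |
| stage | Points to a file location under a stage. |
| offset | Optional. The offset at which reading starts. |
| size | Optional. The size of the content to read, in bytes. |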
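
To make the parts of a datalink value concrete, here is a minimal Python sketch that splits a link into its scheme, stage name, path, and the optional `offset` and `size` parameters. The `parse_datalink` helper and the shape of the returned dictionary are assumptions made for illustration; this is not how MatrixOne parses the value internally.

```python
from urllib.parse import parse_qs, urlsplit


def parse_datalink(link):
    """Split a file:// or stage:// link into its components (illustrative)."""
    parts = urlsplit(link)
    if parts.scheme not in ("file", "stage"):
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    query = parse_qs(parts.query)
    return {
        "scheme": parts.scheme,
        # For stage:// links the stage name sits where a host would be.
        "stage": parts.netloc or None,
        "path": parts.path,
        # Both parameters are optional, so missing ones become None.
        "offset": int(query["offset"][0]) if "offset" in query else None,
        "size": int(query["size"][0]) if "size" in query else None,
    }


local = parse_datalink("file:///Users/admin/case/t1.csv?offset=4&size=2")
staged = parse_datalink("stage://stage01/t1.csv?size=2")
print(local)
print(staged)
```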

## Reading DATALINK data

To read the data that a `DATALINK` link points to, use the [load_file](../../Reference/Functions-and-Operators/Other/load_file.md) function.

## Example

The file `t1.csv` is located under `/Users/admin/case`:

```bash
(base) admin@192 case % cat t1.csv
this is a test message
```

```sql
drop table if exists test01;
create table test01 (col1 int, col2 datalink);
create stage stage01 url='file:///Users/admin/case/';
insert into test01 values (1, 'file:///Users/admin/case/t1.csv');
insert into test01 values (2, 'file:///Users/admin/case/t1.csv?size=2');
insert into test01 values (3, 'file:///Users/admin/case/t1.csv?offset=4');
insert into test01 values (4, 'file:///Users/admin/case/t1.csv?offset=4&size=2');
insert into test01 values (5, 'stage://stage01/t1.csv');
insert into test01 values (6, 'stage://stage01/t1.csv?size=2');
insert into test01 values (7, 'stage://stage01/t1.csv?offset=4');
insert into test01 values (8, 'stage://stage01/t1.csv?offset=4&size=2');

mysql> select * from test01;
+------+-------------------------------------------------+
| col1 | col2 |
+------+-------------------------------------------------+
| 1 | file:///Users/admin/case/t1.csv |
| 2 | file:///Users/admin/case/t1.csv?size=2 |
| 3 | file:///Users/admin/case/t1.csv?offset=4 |
| 4 | file:///Users/admin/case/t1.csv?offset=4&size=2 |
| 5 | stage://stage01/t1.csv |
| 6 | stage://stage01/t1.csv?size=2 |
| 7 | stage://stage01/t1.csv?offset=4 |
| 8 | stage://stage01/t1.csv?offset=4&size=2 |
+------+-------------------------------------------------+
8 rows in set (0.01 sec)

mysql> select col1, load_file(col2) from test01;
+------+-------------------------+
| col1 | load_file(col2) |
+------+-------------------------+
| 1 | this is a test message
|
| 2 | th |
| 3 | is a test message
|
| 4 | i |
| 5 | this is a test message
|
| 6 | th |
| 7 | is a test message
|
| 8 | i |
+------+-------------------------+
8 rows in set (0.01 sec)
```
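
The effect of `offset` and `size` in the example above can be approximated with a plain byte-range read. The Python sketch below assumes a zero-based byte offset, which is consistent with the example output but is not confirmed by the source; `load_range` and the temporary file location are hypothetical and only stand in for `load_file`.

```python
import os
import tempfile


def load_range(path, offset=0, size=-1):
    """Read `size` bytes starting at byte `offset`; size=-1 reads to the end."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)


# Recreate the sample file in a temporary directory (hypothetical location).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "t1.csv")
with open(path, "wb") as f:
    f.write(b"this is a test message\n")

whole = load_range(path)                      # no parameters
head = load_range(path, size=2)               # ?size=2
tail = load_range(path, offset=4)             # ?offset=4
window = load_range(path, offset=4, size=2)   # ?offset=4&size=2

# Clean up the temporary file and directory.
os.remove(path)
os.rmdir(tmp_dir)

print(whole, head, tail, window)
```

Under this assumption, byte 4 of the sample file is the space after `this`, so the `offset=4` reads begin with a space.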
@@ -0,0 +1,72 @@
# **LOAD_FILE()**

## **Description**

The `LOAD_FILE()` function reads the content of the file that a datalink value points to.

## **Syntax**

```
>LOAD_FILE(datalink_type_data) ;
```

## **Parameters**

| Parameter | Description |
| ---- | ---- |
| datalink_type_data | A datalink value; a string can be converted with the [cast()](../../../Reference/Operators/operators/cast-functions-and-operators/cast/) function |

## Example

The file `t1.csv` is located under `/Users/admin/case`:

```bash
(base) admin@192 case % cat t1.csv
this is a test message
```

```sql
create table t1 (col1 int, col2 datalink);
create stage stage1 url='file:///Users/admin/case/';
insert into t1 values (1, 'file:///Users/admin/case/t1.csv');
insert into t1 values (2, 'stage://stage1//t1.csv');

mysql> select * from t1;
+------+---------------------------------+
| col1 | col2 |
+------+---------------------------------+
| 1 | file:///Users/admin/case/t1.csv |
| 2 | stage://stage1//t1.csv |
+------+---------------------------------+
2 rows in set (0.00 sec)

mysql> select col1, load_file(col2) from t1;
+------+-------------------------+
| col1 | load_file(col2) |
+------+-------------------------+
| 1 | this is a test message
|
| 2 | this is a test message
|
+------+-------------------------+
2 rows in set (0.01 sec)


mysql> select load_file(cast('file:///Users/admin/case/t1.csv' as datalink));
+--------------------------------------------------------------+
| load_file(cast(file:///Users/admin/case/t1.csv as datalink)) |
+--------------------------------------------------------------+
| this is a test message
|
+--------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> select load_file(cast('stage://stage1//t1.csv' as datalink));
+-----------------------------------------------------+
| load_file(cast(stage://stage1//t1.csv as datalink)) |
+-----------------------------------------------------+
| this is a test message
|
+-----------------------------------------------------+
1 row in set (0.00 sec)
```
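
In the example above, the `file://` link and the `stage://` link return the same content because the stage URL is the directory that holds `t1.csv`. The Python sketch below shows one way such a link could be mapped onto the stage's base URL. The `STAGES` registry, the `resolve_stage_url` helper, and the slash handling are all assumptions made for illustration, not MatrixOne's actual resolution logic.

```python
from urllib.parse import urlsplit

# Hypothetical registry: stage name -> URL given in CREATE STAGE.
STAGES = {"stage1": "file:///Users/admin/case/"}


def resolve_stage_url(link, stages=STAGES):
    """Rewrite stage://<name>/<path> against the stage's base URL."""
    parts = urlsplit(link)
    if parts.scheme != "stage":
        return link  # file:// links are already absolute
    base = stages[parts.netloc]
    # Drop leading slashes so "stage://stage1//t1.csv" and
    # "stage://stage1/t1.csv" resolve to the same file.
    relative = parts.path.lstrip("/")
    resolved = base.rstrip("/") + "/" + relative
    if parts.query:
        resolved += "?" + parts.query
    return resolved


print(resolve_stage_url("stage://stage1//t1.csv"))
print(resolve_stage_url("stage://stage1/t1.csv?offset=4&size=2"))
```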
@@ -0,0 +1,49 @@
# **SAVE_FILE()**

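## **Description**

The `SAVE_FILE()` function writes content to the file that a datalink value points to and, when executed, returns the length of the written content in bytes.

## **Syntax**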

```
>SAVE_FILE(datalink_type_data,content) ;
```

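## **Parameters**

| Parameter | Description |
| ---- | ---- |
| datalink_type_data | A datalink value; a string can be converted with the [cast()](../../../Reference/Operators/operators/cast-functions-and-operators/cast/) function |
| content | The content to write to the file that the datalink points to |

## Example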

```
drop stage if exists stage01;
create stage stage01 url='file:///Users/admin/case/';
mysql> select save_file(cast('stage://stage01/test.csv' as datalink), 'this is a test message');
+-------------------------------------------------------------------------------+
| save_file(cast(stage://stage01/test.csv as datalink), this is a test message) |
+-------------------------------------------------------------------------------+
| 22 |
+-------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> select save_file(cast('file:///Users/admin/case/test1.csv' as datalink), 'this is another test message');
+-----------------------------------------------------------------------------------------------+
| save_file(cast(file:///Users/admin/case/test1.csv as datalink), this is another test message) |
+-----------------------------------------------------------------------------------------------+
| 28 |
+-----------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

```

```bash
(base) admin@192 case % cat test.csv
this is a test message

(base) admin@192 case % cat test1.csv
this is another test message
```
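
The return values in the example (22 and 28) are the byte lengths of the two messages. As a sanity check, the Python sketch below writes the same text to a temporary file and reports the number of bytes written. The `save_text` helper, the UTF-8 encoding, and the overwrite behavior are assumptions made for this sketch, not a description of how `SAVE_FILE()` is implemented.

```python
import os
import tempfile


def save_text(path, content):
    """Write content to path and return the number of bytes written."""
    data = content.encode("utf-8")  # assumption: content is stored as UTF-8
    with open(path, "wb") as f:     # assumption: an existing file is replaced
        return f.write(data)


# Hypothetical target inside a temporary directory.
tmp_dir = tempfile.mkdtemp()
target = os.path.join(tmp_dir, "test.csv")

written = save_text(target, "this is a test message")
with open(target, "rb") as f:
    stored = f.read()

# Clean up the temporary file and directory.
os.remove(target)
os.rmdir(tmp_dir)

print(written)
```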