Merge pull request #1143 from yangj1211/stage_datalink
add doc of stage and datalink
yangj1211 authored Oct 28, 2024
2 parents ea227ad + 5d83a2f commit 6335646
Showing 15 changed files with 533 additions and 479 deletions.
36 changes: 30 additions & 6 deletions docs/MatrixOne/Develop/export-data/select-into-outfile.md
@@ -7,15 +7,15 @@ MatrixOne supports the following two ways to export data:

This document describes how to export data with `SELECT...INTO OUTFILE`.

The `SELECT...INTO OUTFILE` syntax exports table data to a text file on the host.
The `SELECT...INTO OUTFILE` syntax exports table data to a text file on the host or to a stage.

## Syntax

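The `SELECT...INTO OUTFILE` syntax combines the `SELECT` syntax with `INTO OUTFILE filename`. The default output format is the same as for the `LOAD DATA` command. For example, the following statement exports the table named **test** to the file */root/test.csv*.
The `SELECT...INTO OUTFILE` syntax combines the `SELECT` syntax with `INTO OUTFILE filename`. The default output format is the same as for the `LOAD DATA` command.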

```
mysql> SELECT * FROM TEST
-> INTO OUTFILE '/root/test.csv';
mysql> SELECT * FROM <table_name>
-> INTO OUTFILE '<filepath>|<stage://stage_name>';
```
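When the target is a stage created with a `file://` URL, the exported file appears to land in the directory named by that URL: the stage example below creates `stage_fs` with `url = 'file:///Users/admin/test'`, writes `stage://stage_fs/user.csv`, and then reads `user.csv` from the `test` directory. The shell sketch below mimics that mapping with a hypothetical helper, `stage_to_local_path`. It assumes a plain `file://` stage URL and a path without a query string, and it is only an illustration, not a MatrixOne tool.

```bash
# Hypothetical helper (not part of MatrixOne): map a stage:// path to the
# local path it resolves to, ASSUMING the stage was created with a file:// URL.
stage_to_local_path() {
  local stage_url="$1"   # e.g. file:///Users/admin/test
  local stage_ref="$2"   # e.g. stage://stage_fs/user.csv
  local dir="${stage_url#file://}"   # strip the file:// scheme
  dir="${dir%/}"                     # drop a trailing slash, if any
  local rest="${stage_ref#stage://}" # stage_fs/user.csv
  local file="${rest#*/}"            # user.csv (drop the stage name)
  printf '%s/%s\n' "$dir" "$file"
}

# Prints /Users/admin/test/user.csv
stage_to_local_path 'file:///Users/admin/test' 'stage://stage_fs/user.csv'
```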

You can change the output format with various forms and options that specify how columns and records are quoted and delimited.
@@ -70,7 +70,11 @@ sudo docker run --name <name> --privileged -d -p 6001:6001 -v ${local_data_path}
+------+-----------+------+
```

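2. If you installed and built MatrixOne from source code or binaries, export the table to a local directory, for example *~/tmp/export_demo/export_datatable.txt*:
2. Export the data

- Export to a local directory

If you installed and built MatrixOne from source code or binaries, export the table to a local directory, for example *~/tmp/export_demo/export_datatable.txt*: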

```
select * from user into outfile '~/tmp/export_demo/export_datatable.txt';
@@ -82,9 +86,29 @@ sudo docker run --name <name> --privileged -d -p 6001:6001 -v ${local_data_path}
select * from user into outfile 'mo-data/export_datatable.txt';
```

3. Check the exported data in your local *export_datatable.txt* file:
- Export to a stage

```sql
create stage stage_fs url = 'file:///Users/admin/test';
select * from user into outfile 'stage://stage_fs/user.csv';
```

3. Check the exported data:

- Exported to a local directory

```
(base) admin@192 test % cat export_datatable.txt
id,user_name,sex
1,"weder","man"
2,"tom","man"
3,"wederTom","man"
```

- Exported to a stage

```bash
(base) admin@192 test % cat user.csv
id,user_name,sex
1,"weder","man"
2,"tom","man"
```
291 changes: 0 additions & 291 deletions docs/MatrixOne/Develop/import-data/bulk-load/1.1-load-s3.md

This file was deleted.

60 changes: 34 additions & 26 deletions docs/MatrixOne/Develop/import-data/bulk-load/load-csv.md
@@ -7,37 +7,45 @@
- Scenario 1: the data file is on the same machine as the MatrixOne server:

```
LOAD DATA
INFILE 'file_name'
INTO TABLE tbl_name
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
[IGNORE number {LINES | ROWS}]
[PARALLEL {'TRUE' | 'FALSE'}]
> LOAD DATA
INFILE '<file_name>|<stage://stage_name/filepath>'
INTO TABLE tbl_name
[CHARACTER SET charset_name]
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
[ESCAPED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
[IGNORE number {LINES | ROWS}]
[SET column_name_1=nullif(column_name_1, expr1), column_name_2=nullif(column_name_2, expr2)...]
[PARALLEL {'TRUE' | 'FALSE'}]
[STRICT {'TRUE' | 'FALSE'}]
```

- Scenario 2: the data file is on a different machine from the MatrixOne server:

```
LOAD DATA LOCAL
INFILE 'file_name'
INTO TABLE tbl_name
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
[IGNORE number {LINES | ROWS}]
[PARALLEL {'TRUE' | 'FALSE'}]
> LOAD DATA LOCAL
INFILE '<file_name>|<stage://stage_name/filepath>'
INTO TABLE tbl_name
[CHARACTER SET charset_name]
[{FIELDS | COLUMNS}
[TERMINATED BY 'string']
[[OPTIONALLY] ENCLOSED BY 'char']
[ESCAPED BY 'char']
]
[LINES
[STARTING BY 'string']
[TERMINATED BY 'string']
]
[IGNORE number {LINES | ROWS}]
[SET column_name_1=nullif(column_name_1, expr1), column_name_2=nullif(column_name_2, expr2)...]
[PARALLEL {'TRUE' | 'FALSE'}]
[STRICT {'TRUE' | 'FALSE'}]
```
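As a rough illustration of what some of these clauses do, the shell sketch below approximates how `FIELDS TERMINATED BY ','`, `ENCLOSED BY '"'` and `IGNORE 1 LINES` would split a simple CSV with a header row, such as the `user.csv` export shown earlier in this commit. The `preview_load_rows` helper is hypothetical and deliberately naive: it assumes no commas, quotes, or line breaks inside field values, and it only previews the fields; it does not load anything.

```bash
# Hypothetical, naive preview of the fields LOAD DATA would see with
#   FIELDS TERMINATED BY ',' ENCLOSED BY '"' IGNORE 1 LINES
# Assumes values contain no commas, double quotes, or newlines.
preview_load_rows() {
  # tail -n +2  : skip the first line        (IGNORE 1 LINES)
  # tr -d '"'   : drop the enclosing quotes  (ENCLOSED BY '"')
  # tr ',' '\t' : show field boundaries      (FIELDS TERMINATED BY ',')
  tail -n +2 "$1" | tr -d '"' | tr ',' '\t'
}

csv=$(mktemp)
trap 'rm -f "$csv"' EXIT
printf 'id,user_name,sex\n1,"weder","man"\n2,"tom","man"\n' > "$csv"
preview_load_rows "$csv"
```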

## Before you start
37 changes: 37 additions & 0 deletions docs/MatrixOne/Reference/Data-Types/data-types.md
@@ -573,4 +573,41 @@ mysql> select * from t1;
| [1, 2, 3] | [4, 5] |
+-----------+--------+
1 row in set (0.00 sec)
```

## Datalink data type

| Type | Description |
|------------|--------------------- |
| datalink | A special data type that stores a link to a document (for example, in a stage) or to a file |

### Example

```sql
drop table if exists test01;
create table test01 (col1 int, col2 datalink);
create stage stage01 url='file:///Users/admin/case/';
insert into test01 values (1, 'file:///Users/admin/case/t1.csv');
insert into test01 values (2, 'file:///Users/admin/case/t1.csv?size=2');
insert into test01 values (3, 'file:///Users/admin/case/t1.csv?offset=4');
insert into test01 values (4, 'file:///Users/admin/case/t1.csv?offset=4&size=2');
insert into test01 values (5, 'stage://stage01/t1.csv');
insert into test01 values (6, 'stage://stage01/t1.csv?size=2');
insert into test01 values (7, 'stage://stage01/t1.csv?offset=4');
insert into test01 values (8, 'stage://stage01/t1.csv?offset=4&size=2');

mysql> select * from test01;
+------+-------------------------------------------------+
| col1 | col2 |
+------+-------------------------------------------------+
| 1 | file:///Users/admin/case/t1.csv |
| 2 | file:///Users/admin/case/t1.csv?size=2 |
| 3 | file:///Users/admin/case/t1.csv?offset=4 |
| 4 | file:///Users/admin/case/t1.csv?offset=4&size=2 |
| 5 | stage://stage01/t1.csv |
| 6 | stage://stage01/t1.csv?size=2 |
| 7 | stage://stage01/t1.csv?offset=4 |
| 8 | stage://stage01/t1.csv?offset=4&size=2 |
+------+-------------------------------------------------+
8 rows in set (0.01 sec)
```
87 changes: 87 additions & 0 deletions docs/MatrixOne/Reference/Data-Types/datalink-type.md
@@ -0,0 +1,87 @@
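# DATALINK type

The `DATALINK` type is a special data type that stores a link to a document (for example, a file in a stage) or to a file. Its main purpose is to keep the document's link address in the database instead of the document itself. This suits many scenarios, especially large-scale document management, where it gives quick access to documents without actually storing them in the database.

Using the `DATALINK` data type lets you:

- Save storage space: the documents live in external storage (for example, an object storage system), and the database keeps only the links.
- Access documents conveniently: because the link is stored, the system can reach the document quickly without extra storage or processing.
- Work with data more efficiently: large files are not processed directly inside the database, which speeds up data operations.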

## Inserting DATALINK data

**Syntax**

```
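INSERT INTO table_name VALUES ('<datalink_url>');

-- <datalink_url> is one of:
--   file://<path>/<file_name>[?<options>]
--   stage://<stage_name>/<path>/<file_name>[?<options>]
-- <options> is offset=<n>, size=<n>, or offset=<n>&size=<n>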
```

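**Parameters**

| Parameter | Description |
| ---- | ---- |
| file | Points to a file location in the local file system. |
| stage | Points to a file location inside a stage. |
| offset | Optional. The offset at which reading starts. |
| size | Optional. The size of the content to read, in bytes. |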
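Judging from the `load_file` output in the example below, where `t1.csv` contains `this is a test message`, `size=2` returns `th` and `offset=4&size=2` returns a space followed by `i`. That is consistent with `offset` being a 0-based byte offset and `size` a byte count. The shell sketch below reproduces that byte-range read with `tail -c` and `head -c`; the `read_range` helper is hypothetical and only emulates the behavior under that assumption, it is not how MatrixOne reads the file.

```bash
# Emulates a datalink byte-range read with standard tools, ASSUMING offset is
# a 0-based byte offset and size is a byte count (no size = read to the end).
# Illustration only; this is not how MatrixOne reads the file.
read_range() {
  local file="$1" offset="${2:-0}" size="${3:-}"
  if [ -n "$size" ]; then
    # tail -c +N starts output at byte N (1-based), hence offset + 1
    tail -c +"$((offset + 1))" "$file" | head -c "$size"
  else
    tail -c +"$((offset + 1))" "$file"
  fi
}

f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf 'this is a test message\n' > "$f"

printf '[%s]\n' "$(read_range "$f" 0 2)"   # like ?size=2, prints [th]
printf '[%s]\n' "$(read_range "$f" 4)"     # like ?offset=4, prints [ is a test message]
printf '[%s]\n' "$(read_range "$f" 4 2)"   # like ?offset=4&size=2, prints [ i]
```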

## Reading DATALINK data

To read the data in the file that a `DATALINK` value points to, use the [load_file](../../Reference/Functions-and-Operators/Other/load_file.md) function.

## Examples

The file `t1.csv` is in `/Users/admin/case`:

```bash
(base) admin@192 case % cat t1.csv
this is a test message
```
```sql
drop table if exists test01;
create table test01 (col1 int, col2 datalink);
create stage stage01 url='file:///Users/admin/case/';
insert into test01 values (1, 'file:///Users/admin/case/t1.csv');
insert into test01 values (2, 'file:///Users/admin/case/t1.csv?size=2');
insert into test01 values (3, 'file:///Users/admin/case/t1.csv?offset=4');
insert into test01 values (4, 'file:///Users/admin/case/t1.csv?offset=4&size=2');
insert into test01 values (5, 'stage://stage01/t1.csv');
insert into test01 values (6, 'stage://stage01/t1.csv?size=2');
insert into test01 values (7, 'stage://stage01/t1.csv?offset=4');
insert into test01 values (8, 'stage://stage01/t1.csv?offset=4&size=2');

mysql> select * from test01;
+------+-------------------------------------------------+
| col1 | col2 |
+------+-------------------------------------------------+
| 1 | file:///Users/admin/case/t1.csv |
| 2 | file:///Users/admin/case/t1.csv?size=2 |
| 3 | file:///Users/admin/case/t1.csv?offset=4 |
| 4 | file:///Users/admin/case/t1.csv?offset=4&size=2 |
| 5 | stage://stage01/t1.csv |
| 6 | stage://stage01/t1.csv?size=2 |
| 7 | stage://stage01/t1.csv?offset=4 |
| 8 | stage://stage01/t1.csv?offset=4&size=2 |
+------+-------------------------------------------------+
8 rows in set (0.01 sec)

mysql> select col1, load_file(col2) from test01;
+------+-------------------------+
| col1 | load_file(col2) |
+------+-------------------------+
| 1 | this is a test message
|
| 2 | th |
| 3 | is a test message
|
| 4 | i |
| 5 | this is a test message
|
| 6 | th |
| 7 | is a test message
|
| 8 | i |
+------+-------------------------+
8 rows in set (0.01 sec)
```
@@ -0,0 +1,72 @@
# **LOAD_FILE()**

## **Description**

The `LOAD_FILE()` function reads the content of the file that a datalink value points to.

## **Syntax**

```
>LOAD_FILE(datalink_type_data) ;
```

## **Arguments**

| Argument | Description |
| ---- | ---- |
| datalink_type_data | A datalink value. A string can be converted to datalink with the [cast()](../../../Reference/Operators/operators/cast-functions-and-operators/cast/) function. |

## Examples

The file `t1.csv` is in `/Users/admin/case`:

```bash
(base) admin@192 case % cat t1.csv
this is a test message
```
```sql
create table t1 (col1 int, col2 datalink);
create stage stage1 url='file:///Users/admin/case/';
insert into t1 values (1, 'file:///Users/admin/case/t1.csv');
insert into t1 values (2, 'stage://stage1//t1.csv');

mysql> select * from t1;
+------+---------------------------------+
| col1 | col2 |
+------+---------------------------------+
| 1 | file:///Users/admin/case/t1.csv |
| 2 | stage://stage1//t1.csv |
+------+---------------------------------+
2 rows in set (0.00 sec)

mysql> select col1, load_file(col2) from t1;
+------+-------------------------+
| col1 | load_file(col2) |
+------+-------------------------+
| 1 | this is a test message
|
| 2 | this is a test message
|
+------+-------------------------+
2 rows in set (0.01 sec)


mysql> select load_file(cast('file:///Users/admin/case/t1.csv' as datalink));
+--------------------------------------------------------------+
| load_file(cast(file:///Users/admin/case/t1.csv as datalink)) |
+--------------------------------------------------------------+
| this is a test message
|
+--------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> select load_file(cast('stage://stage1//t1.csv' as datalink));
+-----------------------------------------------------+
| load_file(cast(stage://stage1//t1.csv as datalink)) |
+-----------------------------------------------------+
| this is a test message
|
+-----------------------------------------------------+
1 row in set (0.00 sec)
```
@@ -0,0 +1,49 @@
# **SAVE_FILE()**

## **Description**

The `SAVE_FILE()` function writes content to the file that a datalink value points to and returns the length, in bytes, of the content written.

## **Syntax**

```
>SAVE_FILE(datalink_type_data,content) ;
```

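## **Arguments**

| Argument | Description |
| ---- | ---- |
| datalink_type_data | A datalink value. A string can be converted to datalink with the [cast()](../../../Reference/Operators/operators/cast-functions-and-operators/cast/) function. |
| content | The content to write to the file that the datalink points to. |

## Examples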

```
drop stage if exists stage01;
create stage stage01 url='file:///Users/admin/case/';
mysql> select save_file(cast('stage://stage01/test.csv' as datalink), 'this is a test message');
+-------------------------------------------------------------------------------+
| save_file(cast(stage://stage01/test.csv as datalink), this is a test message) |
+-------------------------------------------------------------------------------+
| 22 |
+-------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> select save_file(cast('file:///Users/admin/case/test1.csv' as datalink), 'this is another test message');
+-----------------------------------------------------------------------------------------------+
| save_file(cast(file:///Users/admin/case/test1.csv as datalink), this is another test message) |
+-----------------------------------------------------------------------------------------------+
| 28 |
+-----------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
```

```bash
(base) admin@192 case % cat test.csv
this is a test message

(base) admin@192 case % cat test1.csv
this is another test message
```
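`SAVE_FILE()` returns the number of bytes written. As a quick cross-check, the hypothetical `byte_len` shell helper below counts the bytes in the two strings from the example; they come to 22 and 28, matching the returned values. `printf '%s'` is used so that no trailing newline is counted, and `wc -c` counts bytes, so for non-ASCII text the result can be larger than the character count.

```bash
# Hypothetical helper: count the bytes in a string.
# printf '%s' adds no trailing newline; tr strips the padding some wc versions add.
byte_len() {
  printf '%s' "$1" | wc -c | tr -d ' '
}

byte_len 'this is a test message'        # 22
byte_len 'this is another test message'  # 28
```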