Quantile series scalar function #2055

Open
wants to merge 4 commits into
base: master
@@ -22,4 +22,52 @@ specific language governing permissions and limitations
under the License.
-->

## Description
The `QUANTILE_PERCENT` function returns the quantile value at a given percentage. It takes two parameters: a `quantile_state` column and a constant floating-point number representing the percentage. It returns a floating-point number: the quantile value at that percentage position.
Comment on lines +25 to +26 (Contributor):

Add a blank line between sections to prevent rendering errors.

## Syntax
```
QUANTILE_PERCENT(<quantile_state>, <percent>)
```
Comment on lines +28 to +30 (Contributor):

Suggested change: replace

```
QUANTILE_PERCENT(<quantile_state>, <percent>)
```

with

```sql
QUANTILE_PERCENT(<quantile_state>, <percent>)
```

## Parameters

| Parameter | Description |
| -- | -- |
| `<quantile_state>` | The target column.|
| `<percent>` | Target percent.|

## Return value

A `Double` value representing the quantile.
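
As a rough illustration of what "the quantile value at a given percentage" means, the Python sketch below computes an exact quantile over a small list. It is not how the function is implemented: `QUANTILE_PERCENT` operates on an approximate `quantile_state`, and the helper name `exact_quantile`, the [0, 1] range for `percent`, and the linear interpolation rule are all assumptions made for illustration.

```python
def exact_quantile(values, percent):
    """Exact quantile with linear interpolation (illustrative only).

    Assumes `percent` is a fraction in [0, 1]: 0 yields the minimum
    and 1 yields the maximum.
    """
    if not values:
        return None  # an empty input has no quantile
    if not 0.0 <= percent <= 1.0:
        raise ValueError("percent must be in [0, 1]")
    ordered = sorted(values)
    position = percent * (len(ordered) - 1)  # fractional index into the sorted data
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    weight = position - lower
    return ordered[lower] * (1.0 - weight) + ordered[upper] * weight


prices = [-1, 0, 1, 2, 3]
print(exact_quantile(prices, 0))    # smallest value
print(exact_quantile(prices, 0.5))  # median
print(exact_quantile(prices, 1))    # largest value
```

With `percent` set to 0 the sketch returns the smallest input, which lines up with the `-1` row in the example output for `id = 1`.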

## Example

```sql
CREATE TABLE IF NOT EXISTS quantile_state_agg_test (
`dt` int(11) NULL COMMENT "",
`id` int(11) NULL COMMENT "",
`price` quantile_state QUANTILE_UNION NOT NULL COMMENT ""
) ENGINE=OLAP
AGGREGATE KEY(`dt`, `id`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`dt`) BUCKETS 1
PROPERTIES ("replication_num" = "1");

INSERT INTO quantile_state_agg_test VALUES(20220201,0, to_quantile_state(1, 2048));

INSERT INTO quantile_state_agg_test VALUES(20220201,1, to_quantile_state(-1, 2048)),
(20220201,1, to_quantile_state(0, 2048)),(20220201,1, to_quantile_state(1, 2048)),
(20220201,1, to_quantile_state(2, 2048)),(20220201,1, to_quantile_state(3, 2048));

SELECT dt, id, quantile_percent(quantile_union(price), 0) FROM quantile_state_agg_test GROUP BY dt, id ORDER BY dt, id
--------------

+----------+------+--------------------------------------------+
| dt       | id   | quantile_percent(quantile_union(price), 0) |
+----------+------+--------------------------------------------+
| 20220201 |    0 |                                          1 |
| 20220201 |    1 |                                         -1 |
+----------+------+--------------------------------------------+
2 rows in set (0.42 sec)
```
Comment on lines +61 to +71 (Contributor):

Suggested change: close the `sql` block right after the query and show the result in a separate `text` block, dropping the `--------------` separator and the trailing "2 rows in set" line.

```sql
SELECT dt, id, quantile_percent(quantile_union(price), 0) FROM quantile_state_agg_test GROUP BY dt, id ORDER BY dt, id
```

```text
+----------+------+--------------------------------------------+
| dt       | id   | quantile_percent(quantile_union(price), 0) |
+----------+------+--------------------------------------------+
| 20220201 |    0 |                                          1 |
| 20220201 |    1 |                                         -1 |
+----------+------+--------------------------------------------+
```



@@ -1,6 +1,6 @@
---
{
- "title": "quantile_percent",
+ "title": "quantile_state_empty",
"language": "en"
}
---
@@ -22,4 +22,30 @@ specific language governing permissions and limitations
under the License.
-->

## Description

Returns an empty column of type `quantile_state`.

## Syntax
```
QUANTILE_STATE_EMPTY()
```

## Return value

An empty `quantile_state` type column.

## Example

```sql
--------------
select quantile_percent(quantile_union(quantile_state_empty()), 0)
--------------

+-------------------------------------------------------------+
| quantile_percent(quantile_union(quantile_state_empty()), 0) |
+-------------------------------------------------------------+
| NULL                                                        |
+-------------------------------------------------------------+
1 row in set (0.12 sec)
```
Comment on lines +40 to +51 (Contributor):

Please update the example format to match the comments on quantile-percent:

  • Keep the query and the result separate
  • Use `text` format for the result
  • Drop the final "x rows..." line

@@ -22,3 +22,52 @@ specific language governing permissions and limitations
under the License.
-->

## Description

This function converts a numeric value to the `QUANTILE_STATE` type. The `compression` parameter is optional and accepts values in the range [2048, 10000]. A larger value gives higher accuracy in subsequent approximate quantile calculations, at the cost of more memory and longer computation time. If `compression` is omitted or falls outside [2048, 10000], the function uses the default value of 2048.
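
The fallback rule for the compression parameter can be sketched in a few lines of Python. This is only a model of the behavior described above, not the actual implementation; the helper name `effective_compression` and the constant names are hypothetical.

```python
# Bounds and default taken from the description above.
MIN_COMPRESSION = 2048
MAX_COMPRESSION = 10000
DEFAULT_COMPRESSION = 2048


def effective_compression(compression=None):
    """Return the compression value that would be used (illustrative only)."""
    if compression is None:
        return DEFAULT_COMPRESSION  # parameter not specified
    if MIN_COMPRESSION <= compression <= MAX_COMPRESSION:
        return compression  # in range: used as given
    return DEFAULT_COMPRESSION  # out of range: falls back to the default


print(effective_compression())       # 2048
print(effective_compression(5000))   # 5000
print(effective_compression(20000))  # 2048
```

Under this reading, an out-of-range value is silently replaced by the default rather than rejected.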

## Syntax
```sql
TO_QUANTILE_STATE(<raw_data>, <compression>)
```
## Parameters

| Parameter | Description |
| -- | -- |
| `<raw_data>` | The target column.|
| `<compression>` | Compression threshold.|

## Return value

The converted column of `QUANTILE_STATE` type.

## Example

```sql
CREATE TABLE IF NOT EXISTS quantile_state_agg_test (
`dt` int(11) NULL COMMENT "",
`id` int(11) NULL COMMENT "",
`price` quantile_state QUANTILE_UNION NOT NULL COMMENT ""
) ENGINE=OLAP
AGGREGATE KEY(`dt`, `id`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`dt`) BUCKETS 1
PROPERTIES ("replication_num" = "1");

INSERT INTO quantile_state_agg_test VALUES(20220201,0, to_quantile_state(1, 2048));

INSERT INTO quantile_state_agg_test VALUES(20220201,1, to_quantile_state(-1, 2048)),
(20220201,1, to_quantile_state(0, 2048)),(20220201,1, to_quantile_state(1, 2048)),
(20220201,1, to_quantile_state(2, 2048)),(20220201,1, to_quantile_state(3, 2048));

SELECT dt, id, quantile_percent(quantile_union(price), 0) FROM quantile_state_agg_test GROUP BY dt, id ORDER BY dt, id
--------------

+----------+------+--------------------------------------------+
| dt       | id   | quantile_percent(quantile_union(price), 0) |
+----------+------+--------------------------------------------+
| 20220201 |    0 |                                          1 |
| 20220201 |    1 |                                         -1 |
+----------+------+--------------------------------------------+
2 rows in set (0.42 sec)
```
@@ -22,4 +22,52 @@ specific language governing permissions and limitations
under the License.
-->

## Description
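The `QUANTILE_PERCENT` function returns the quantile value at a given percentage. It takes two parameters: a `quantile_state` column and a constant floating-point number representing the percentage. It returns a floating-point number: the quantile value at that percentage position.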
## Syntax
```
QUANTILE_PERCENT(<quantile_state>, <percent>)
```
## Parameters

| Parameter | Description |
| -- | -- |
| `<quantile_state>` | The target column. |
| `<percent>` | The target percentage. |

## Return value

Returns a quantile value of type `Double`.

## Example

```sql
CREATE TABLE IF NOT EXISTS quantile_state_agg_test (
`dt` int(11) NULL COMMENT "",
`id` int(11) NULL COMMENT "",
`price` quantile_state QUANTILE_UNION NOT NULL COMMENT ""
) ENGINE=OLAP
AGGREGATE KEY(`dt`, `id`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`dt`) BUCKETS 1
PROPERTIES ("replication_num" = "1");

INSERT INTO quantile_state_agg_test VALUES(20220201,0, to_quantile_state(1, 2048));

INSERT INTO quantile_state_agg_test VALUES(20220201,1, to_quantile_state(-1, 2048)),
(20220201,1, to_quantile_state(0, 2048)),(20220201,1, to_quantile_state(1, 2048)),
(20220201,1, to_quantile_state(2, 2048)),(20220201,1, to_quantile_state(3, 2048));

SELECT dt, id, quantile_percent(quantile_union(price), 0) FROM quantile_state_agg_test GROUP BY dt, id ORDER BY dt, id
--------------

+----------+------+--------------------------------------------+
| dt       | id   | quantile_percent(quantile_union(price), 0) |
+----------+------+--------------------------------------------+
| 20220201 |    0 |                                          1 |
| 20220201 |    1 |                                         -1 |
+----------+------+--------------------------------------------+
2 rows in set (0.42 sec)
```


@@ -1,6 +1,6 @@
---
{
- "title": "quantile_percent",
+ "title": "quantile_state_empty",
"language": "zh-CN"
}
---
@@ -22,4 +22,30 @@ specific language governing permissions and limitations
under the License.
-->

## Description

Returns an empty column of type `quantile_state`.

## Syntax
```
QUANTILE_STATE_EMPTY()
```

## Return value

An empty column of type `quantile_state`.

## Example

```sql
--------------
select quantile_percent(quantile_union(quantile_state_empty()), 0)
--------------

+-------------------------------------------------------------+
| quantile_percent(quantile_union(quantile_state_empty()), 0) |
+-------------------------------------------------------------+
| NULL                                                        |
+-------------------------------------------------------------+
1 row in set (0.12 sec)
```
@@ -22,3 +22,54 @@ specific language governing permissions and limitations
under the License.
-->

## Description

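This function converts a numeric value to the `QUANTILE_STATE` type. The `compression` parameter is optional and accepts values in the range [2048, 10000]. A larger value gives higher accuracy in subsequent approximate quantile calculations, at the cost of more memory and longer computation time. If `compression` is omitted or falls outside [2048, 10000], the function uses the default value of 2048.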

## Syntax
```sql
TO_QUANTILE_STATE(<raw_data>, <compression>)
```
## Parameters

| Parameter | Description |
| -- | -- |
| `<raw_data>` | The target column. |
| `<compression>` | The compression threshold. |

## Return value

The converted column of type `QUANTILE_STATE`.

## Example

```sql
CREATE TABLE IF NOT EXISTS quantile_state_agg_test (
`dt` int(11) NULL COMMENT "",
`id` int(11) NULL COMMENT "",
`price` quantile_state QUANTILE_UNION NOT NULL COMMENT ""
) ENGINE=OLAP
AGGREGATE KEY(`dt`, `id`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`dt`) BUCKETS 1
PROPERTIES ("replication_num" = "1");

INSERT INTO quantile_state_agg_test VALUES(20220201,0, to_quantile_state(1, 2048));

INSERT INTO quantile_state_agg_test VALUES(20220201,1, to_quantile_state(-1, 2048)),
(20220201,1, to_quantile_state(0, 2048)),(20220201,1, to_quantile_state(1, 2048)),
(20220201,1, to_quantile_state(2, 2048)),(20220201,1, to_quantile_state(3, 2048));

SELECT dt, id, quantile_percent(quantile_union(price), 0) FROM quantile_state_agg_test GROUP BY dt, id ORDER BY dt, id
--------------

+----------+------+--------------------------------------------+
| dt       | id   | quantile_percent(quantile_union(price), 0) |
+----------+------+--------------------------------------------+
| 20220201 |    0 |                                          1 |
| 20220201 |    1 |                                         -1 |
+----------+------+--------------------------------------------+
2 rows in set (0.42 sec)
```

