docs: add doc for rule metrics (#2757)
Signed-off-by: yisaer <[email protected]>
Yisaer committed Apr 9, 2024
1 parent bab777a commit 4d3ae9f
Showing 6 changed files with 46 additions and 2 deletions.
24 changes: 23 additions & 1 deletion docs/en_US/operation/usage/monitor_with_prometheus.md

eKuiper rules are continuously running streaming tasks. A rule processes an unbounded stream of data. Under normal circumstances, a rule runs continuously once it is started and keeps producing operational status data until it is stopped manually or fails with an unrecoverable error. eKuiper provides a status API for retrieving a rule's running metrics. eKuiper also integrates with Prometheus, which makes it easy to monitor these status metrics through Prometheus. This tutorial is intended for users who are already familiar with eKuiper. It introduces the rule status metrics and shows how to monitor specific metrics via Prometheus.

## Prometheus Metrics

eKuiper exposes the following metrics to Prometheus to reflect the current cluster status:

```text
kuiper_rule_status: The status of each rule in eKuiper. 1 means running, 0 means paused, and -1 means exited abnormally.
kuiper_rule_count: The number of running rules and the number of paused rules in eKuiper.
```
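To make the value encoding concrete, the following minimal Python sketch maps the numeric `kuiper_rule_status` values listed above to readable states. It then tallies them in the way `kuiper_rule_count` summarizes running and paused rules. The helper names (`describe_status`, `count_rules`) and the input shape (a dict from rule ID to sampled metric value) are hypothetical and are not part of eKuiper.

```python
from collections import Counter
from typing import Dict

# Meaning of each kuiper_rule_status value, as documented above.
STATUS_NAMES = {1: "running", 0: "paused", -1: "abnormal exit"}


def describe_status(value: float) -> str:
    """Translate a sampled kuiper_rule_status value into a readable state."""
    # Prometheus sample values are floats, so normalise to int before the lookup.
    return STATUS_NAMES.get(int(value), "unknown")


def count_rules(samples: Dict[str, float]) -> Counter:
    """Tally rules per state, similar in spirit to kuiper_rule_count.

    `samples` is a hypothetical mapping from rule ID to its status value.
    """
    return Counter(describe_status(v) for v in samples.values())


if __name__ == "__main__":
    print(count_rules({"rule1": 1, "rule2": 0, "rule3": 1, "rule4": -1}))
```

Unknown values fall back to `"unknown"` rather than raising an exception, so a future status code would not break a dashboard script built on this sketch.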

## Rule Status Metrics

Once a rule has been created and runs successfully in eKuiper, the user can view the rule's operational status metrics via the CLI, the REST API, or the management console. For example, for an existing rule `rule1`, you can get its running metrics in JSON format via `curl -X GET "http://127.0.0.1:9081/rules/rule1/status"`:

```json
{
  "status": "running",
  "lastStartTimestamp": "1712126817659",
  "lastStopTimestamp": "0",
  "nextStopTimestamp": "0",
  "source_demo_0_records_in_total": 265,
  "source_demo_0_records_out_total": 265,
  "source_demo_0_process_latency_us": 0,
  ...
}
```
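The same status can be fetched programmatically. The sketch below uses only the Python standard library. It requests `/rules/{id}/status` from the REST endpoint shown above and separates the rule-level fields from the per-operator metrics. It assumes the default address `http://127.0.0.1:9081` from the example. It also assumes that the rule-level keys are exactly the ones in the sample (`status` and the three timestamps) and that every other key is an operator metric. The function names are illustrative.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, Tuple

# Rule-level fields seen in the sample response above. Every other key is
# assumed to be a per-operator metric such as "source_demo_0_records_in_total".
RULE_LEVEL_KEYS = {"status", "lastStartTimestamp", "lastStopTimestamp", "nextStopTimestamp"}


def fetch_rule_status(rule_id: str, base_url: str = "http://127.0.0.1:9081") -> dict:
    """GET /rules/{rule_id}/status from the eKuiper REST API and decode the JSON body."""
    url = f"{base_url}/rules/{urllib.parse.quote(rule_id)}/status"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def split_status(status: dict) -> Tuple[Dict, Dict]:
    """Split a status document into (rule-level fields, operator metrics)."""
    rule_fields = {k: v for k, v in status.items() if k in RULE_LEVEL_KEYS}
    operator_metrics = {k: v for k, v in status.items() if k not in RULE_LEVEL_KEYS}
    return rule_fields, operator_metrics
```

Against the rule above, `split_status(fetch_rule_status("rule1"))` would put `status` and the three timestamps in the first dictionary and keys such as `source_demo_0_records_in_total` in the second.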

The rule status consists of two main parts. The first is the status, which indicates whether the rule is running properly. Its value may be `running`, `stopped manually`, and so on. This part also contains the Unix timestamps, in milliseconds, of when the rule was started and when it was stopped.
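The timestamps are Unix epoch milliseconds encoded as strings (for example `"1712126817659"` in the sample above), so they need to be converted before display. A minimal sketch follows. Treating `"0"` as "not set" is an assumption based on the sample, where a running rule reports `"lastStopTimestamp": "0"`. The function name is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional, Union


def parse_ms_timestamp(value: Union[str, int]) -> Optional[datetime]:
    """Convert a millisecond Unix timestamp (string or int) to an aware UTC datetime.

    Returns None for 0, which this sketch assumes means "not set".
    """
    ms = int(value)
    if ms == 0:
        return None
    # divmod keeps the millisecond part exact instead of going through float division.
    seconds, millis = divmod(ms, 1000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc).replace(microsecond=millis * 1000)
```

For the sample above, `parse_ms_timestamp("1712126817659")` yields 2024-04-03 06:46:57.659 UTC.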

The second part is the metrics of each operator in the rule. The operators are generated from the rule's SQL, so they may differ from rule to rule. In this example, the rule SQL is the simplest `SELECT * FROM demo` and the action is MQTT. Three operators are generated: `source_demo`, `op_project`, and `sink_mqtt`. Each operator exposes the same set of metrics. Each metric name combines the operator name (with a numeric suffix, `_0` here) and the metric type. For example, the `records_in_total` metric of the operator `source_demo_0` is named `source_demo_0_records_in_total`.
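Because operator metrics are flattened into single keys, it can help to regroup them by operator. The sketch below splits a key such as `source_demo_0_records_in_total` into its operator part and metric type. It assumes, based only on the example above, that the operator part always ends with `_<digits>` and that metric types contain only lowercase letters and underscores. The regex and helper names are hypothetical and are not part of eKuiper.

```python
import re
from typing import Dict, Optional, Tuple

# Assumption: an operator instance name ends with "_<digits>" (e.g. "source_demo_0"),
# and the remainder after the next underscore is the metric type (no digits).
_METRIC_KEY = re.compile(r"^(?P<op>.+_\d+)_(?P<metric>[a-z][a-z_]*)$")


def split_metric_key(key: str) -> Optional[Tuple[str, str]]:
    """Split e.g. 'source_demo_0_records_in_total' into ('source_demo_0', 'records_in_total')."""
    match = _METRIC_KEY.match(key)
    if match is None:
        return None  # not an operator metric, e.g. "status"
    return match.group("op"), match.group("metric")


def group_by_operator(metrics: Dict[str, object]) -> Dict[str, Dict[str, object]]:
    """Regroup flat metric keys into {operator_instance: {metric_type: value}}."""
    grouped: Dict[str, Dict[str, object]] = {}
    for key, value in metrics.items():
        parts = split_metric_key(key)
        if parts is not None:
            op, metric = parts
            grouped.setdefault(op, {})[metric] = value
    return grouped
```

Rule-level keys such as `status` or `lastStartTimestamp` do not match the pattern, so `group_by_operator` silently skips them.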

### Metric Types

...

```
https://github.com/lf-edge/ekuiper/blob/master/metrics/metrics.json
```

The following panel shows the historical status of each rule. 1 means the rule is running, 0 means it was paused normally, and -1 means it exited abnormally. The metric is `kuiper_rule_status`.

![rule status](./resources/ruleStatus.png)
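Outside Grafana, the same metric can be checked through the Prometheus HTTP API. The sketch below sends an instant query for `kuiper_rule_status` to the standard `/api/v1/query` endpoint and keeps the series whose value is -1, i.e. rules that exited abnormally. The Prometheus address `http://localhost:9090` is an assumption. This document does not show which labels eKuiper attaches to `kuiper_rule_status`, so the sketch returns each matching series' full label set instead of guessing a rule-ID label.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

PROMETHEUS_URL = "http://localhost:9090"  # assumed Prometheus address


def query_prometheus(expr: str, base_url: str = PROMETHEUS_URL) -> dict:
    """Run an instant PromQL query through Prometheus's /api/v1/query endpoint."""
    url = f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def abnormal_series(response: dict) -> List[Dict[str, str]]:
    """Return the label sets of kuiper_rule_status series whose value is -1."""
    if response.get("status") != "success":
        raise RuntimeError(f"query failed: {response.get('error')}")
    return [
        sample["metric"]
        for sample in response["data"]["result"]
        # Instant-vector samples are [timestamp, "value-as-string"].
        if float(sample["value"][1]) == -1
    ]
```

Usage would look like `abnormal_series(query_prometheus("kuiper_rule_status"))`. Filtering could also be pushed into PromQL with `kuiper_rule_status == -1`. Filtering in Python keeps the parsing logic testable without a running Prometheus server.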

The following panel shows how many rules are running and how many are paused in eKuiper. The metric is `kuiper_rule_count`.

![rule count](./resources/ruleCount.png)

## Summary

This article introduced the rule metrics in eKuiper and how to use Prometheus to monitor them. Building on this, users can explore more advanced Prometheus features to improve the operation and maintenance of eKuiper.
24 changes: 23 additions & 1 deletion docs/zh_CN/operation/usage/monitor_with_prometheus.md

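An eKuiper rule is a continuously running stream-processing task. Rules process unbounded data streams. Under normal circumstances, a rule keeps running once started and continuously produces running-status data until it is stopped manually or stops after an unrecoverable error. eKuiper rules provide a status API for retrieving their running metrics. eKuiper also integrates with Prometheus, so these status metrics can be conveniently monitored through Prometheus. This tutorial is intended for users who already have a basic understanding of eKuiper. It introduces the rule status metrics and how to monitor specific metrics through Prometheus.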

## Prometheus Metrics

eKuiper exposes the following metrics to Prometheus to reflect the current state of the cluster:

```text
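kuiper_rule_status: The status metric of each rule in eKuiper. 1 means running, 0 means paused, and -1 means exited abnormally.
kuiper_rule_count: How many rules in eKuiper are running and how many are paused.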
```

## Rule Status Metrics

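After a rule has been created and is running successfully in eKuiper, users can view its running-status metrics through the CLI, the REST API, or the management console. For example, for an existing rule `rule1`, the rule's running metrics can be retrieved in JSON format via `curl -X GET "http://127.0.0.1:9081/rules/rule1/status"`, as shown below: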

```json
{
  "status": "running",
  "lastStartTimestamp": "1712126817659",
  "lastStopTimestamp": "0",
  "nextStopTimestamp": "0",
  "source_demo_0_records_in_total": 265,
  "source_demo_0_records_out_total": 265,
  "source_demo_0_process_latency_us": 0,
  ...
}
```

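The running metrics consist mainly of two parts. One part is the status, which indicates whether the rule is running normally. Its value may be `running`, `stopped manually`, and so on. This part also includes the Unix timestamps, in milliseconds, of when the rule was started and when it was paused.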

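The other part is the running metrics of each operator of the rule. The operators of a rule are generated from the rule's SQL, so they may differ from rule to rule. In this example, the rule SQL is the simplest `SELECT * FROM demo` and the action is MQTT. Three operators are generated: `source_demo`, `op_project`, and `sink_mqtt`. Every operator has the same set of running metrics, and each metric name combines the operator name with the metric. For example, the metric for the input count `records_in_total` of the operator `source_demo_0` is `source_demo_0_records_in_total`.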

### Running Metrics

...

```
https://github.com/lf-edge/ekuiper/blob/master/metrics/metrics.json
```

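The following panel shows the historical status of each rule. 1 means the rule is running, 0 means it was paused normally, and -1 means it exited abnormally. The metric name is `kuiper_rule_status`.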

![rule status](./resources/ruleStatus.png)

The following panel shows how many rules inside eKuiper are running and how many are paused. The metric name is `kuiper_rule_count`.

![rule count](./resources/ruleCount.png)

## Summary

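This article introduced the rule status metrics in eKuiper and how to use Prometheus to monitor them in a simple way. Building on this, users can further explore Prometheus's more advanced features to better operate and maintain eKuiper.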