Commit 1bf1b8e by albertxu216 (Apr 12, 2024): add cpu_watcher_vis_guide && libbpf_sar

# cpu_watcher Visualization

## 1. Environment Setup

Before using the cpu_watcher visualization, set up Docker and Go as described below:

### 1.1 Docker

First install Docker Desktop by following:

* [Install Docker Desktop on Ubuntu | Docker Docs](https://docs.docker.com/desktop/install/ubuntu/#install-docker-desktop)

When starting Docker Desktop, it may fail to launch, as shown below:

![image1](image/image1.png)

This happens when the virtual machine does not yet support virtualization. Shut the VM down, edit its settings to enable both virtualization-engine options, then boot it again and configure KVM:

* [Install Docker Desktop on Linux | Docker Docs](https://docs.docker.com/desktop/install/linux-install/)

![image2](image/image2.png)

### 1.2 Go

The visualization requires Go 1.19 or later. For installation steps, see for example:

* [go: Quickly upgrading the Go version | CSDN blog](https://blog.csdn.net/m0_37482190/article/details/128673828)

## 2. Using the cpu_watcher Visualization

* First enter the lmp/eBPF_Supermarket/CPU_Subsystem/cpu_watcher directory under lmp:

```bash
cd lmp/eBPF_Supermarket/CPU_Subsystem/cpu_watcher
```

Build it there with make:

```bash
make -j 20
```

* Then go to the eBPF_Visualization/eBPF_prometheus directory under lmp.

* Run `make` there to build the Go visualization tool.

If `make` fails with the error below, the Go module proxy `proxy.golang.org` is unreachable:

```bash
go: golang.org/x/exp@v0.0.0-20190731235908-ec7cb31e5a56: Get "https://proxy.golang.org/golang.org/x/exp/@v/v0.0.0-20190731235908-ec7cb31e5a56.mod": dial tcp 172.217.160.113:443: i/o timeout
```

Switching to a proxy mirror that is reachable from mainland China fixes this:

```bash
go env -w GOPROXY=https://goproxy.cn
```
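As a sketch, the proxy switch can be made slightly more robust by keeping `direct` as a fallback for modules the mirror cannot serve, and by verifying the setting afterwards (the `goproxy.cn,direct` form is a common convention, not something this repository requires):

```shell
# Use a mirror reachable from mainland China, falling back to a direct
# fetch for modules the mirror cannot serve; then confirm the setting.
if command -v go >/dev/null 2>&1; then
    go env -w GOPROXY=https://goproxy.cn,direct
    go env GOPROXY   # prints the active proxy list
fi
```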

* Run `make start_service` to pull the Docker images and start the Grafana and Prometheus services.

* Run the following command to start collecting and processing data:

```bash
./data-visual collect /home/zhang/lmp/eBPF_Supermarket/CPU_Subsystem/cpu_watcher/cpu_watcher -s
```

* Open http://192.168.159.128:8090/metrics in a browser (here this is equivalent to `localhost:8090/metrics`) to see the data exposed over HTTP;

![image3](image/image3.png)
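The /metrics page serves Prometheus' plain-text exposition format, which is easy to post-process from a terminal. A minimal sketch, using invented metric lines for illustration (the metric names here are hypothetical; the real names come from the cpu_watcher exporter):

```shell
# Hypothetical exposition-format lines; with the services running, the same
# kind of text would come from:  curl -s http://localhost:8090/metrics
cat > /tmp/metrics_sample.txt <<'EOF'
# HELP cpu_watcher_avg_delay Average scheduling delay in microseconds.
# TYPE cpu_watcher_avg_delay gauge
cpu_watcher_avg_delay 123.4
cpu_watcher_max_delay 4567.8
EOF

# Extract one metric's value, skipping the HELP/TYPE comment lines.
awk '!/^#/ && $1 == "cpu_watcher_avg_delay" { print $2 }' /tmp/metrics_sample.txt
```

With the real endpoint, the `curl -s` output can be piped into the same awk filter.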

* Open http://192.168.159.128:3000/ to reach the Grafana service, log in with the initial credentials (user: admin, password: admin), and enter the management UI:

- Click [Home - Connections - Add new connection], choose Prometheus, and create a connection to the Prometheus server:

![image4](image/image4.png)

Here 172.17.0.1 is the IPv4 address of the docker0 bridge. Docker normally assigns the first address on docker0 to the Docker host itself, so 172.17.0.1 is the address at which containers reach the Docker host and its published ports. Set the Grafana data source URL to [http://172.17.0.1:9090](http://172.17.0.1:9090/), then click the [Save & test] button.
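The same data source can also be created without clicking through the UI, using Grafana's standard provisioning mechanism. A sketch, assuming the stock provisioning layout (the file would normally live under /etc/grafana/provisioning/datasources/ or be mounted into the Grafana container; it is written to the current directory here):

```shell
# Write a Grafana data-source provisioning file pointing at Prometheus on
# the docker0 bridge address, mirroring the manual [Save & test] setup.
mkdir -p provisioning/datasources
cat > provisioning/datasources/prometheus.yaml <<'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://172.17.0.1:9090
    isDefault: true
EOF
```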

- Enter the visualization configuration page:

![image4.5](image/image4.5.png)
![image5](image/image5.png)

- Configure the panel as shown below and click Run queries to monitor the avg_delay field visually:

![image6](image/image6.png)

## 3. Visualization Output of the cpu_watcher Subtools

The sample output below compares the tool under normal operation and under heavy load; the stress tool was used to load the CPU for 5 minutes:

```bash
stress --cpu 8 --timeout 300s
```
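While stress runs, its overall effect can be cross-checked independently of cpu_watcher by sampling /proc/stat twice; a minimal sketch (Linux only, field layout per proc(5)):

```shell
# Report the CPU-busy share over a 1-second window.
# /proc/stat "cpu" fields: user nice system idle iowait irq softirq steal ...
read_cpu() { awk '/^cpu /{ print $2+$3+$4+$5+$6+$7+$8, $5+$6 }' /proc/stat; }

set -- $(read_cpu); total1=$1; idle1=$2
sleep 1
set -- $(read_cpu); total2=$1; idle2=$2

dt=$(( total2 - total1 ))
busy=$(( (dt - (idle2 - idle1)) * 100 / dt ))
echo "CPU busy over the last second: ${busy}%"
```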

### 3.1 cpu_watcher -s

**[irq Time] visualization output**

![image7](image/image7.png)

**[softirq Time] visualization output**

![image8](image/image8.png)

**[cswch] visualization output**

![image9](image/image9.png)

**[proc] visualization output**

![image10](image/image10.png)

**[Kthread] visualization output**

![image11](image/image11.png)

**[idle] visualization output**

![image12](image/image12.png)

**[sys] visualization output**

![image-20240411132742107](image/image-20240411132742107.png)

**[sysc] visualization output**

![image-20240411132807253](image/image-20240411132807253.png)

**[utime] visualization output**

![image-20240411132842070](image/image-20240411132842070.png)

**[CPU state comparison] visualization output**

![image-20240411132914396](image/image-20240411132914396.png)

### 3.2 cpu_watcher -c

**[cs_delay] visualization output**

![image-20240411133505763](image/image-20240411133505763.png)

### 3.3 cpu_watcher -d

**[schedule_delay] visualization output**

[max_delay]

![image-20240411133841698](image/image-20240411133841698.png)

[avg_delay]

![image-20240411135159178](image/image-20240411135159178.png)

[min_delay]

![image-20240411135335523](image/image-20240411135335523.png)

### 3.4 cpu_watcher -p

**[preempt] visualization output**

![image-20240411142421440](image/image-20240411142421440.png)

### 3.5 cpu_watcher -S

**[syscall_delay] visualization output**

![image-20240411144331888](image/image-20240411144331888.png)
---

(From eBPF_Supermarket/CPU_Subsystem/cpu_watcher/docs/libbpf_sar.md, added in the same commit.)
# Introduction to libbpf_sar

libbpf_sar is a subtool of cpu_watcher, invoked via `cpu_watcher -s`.

## 1. Use Cases and Purpose

libbpf_sar is an eBPF-based tool that, at a fixed interval (1 s by default), counts how often selected events occur and how much CPU time they consume. It shows event rates and CPU utilization, and builds a CPU load profile that breaks the load down by source and share.

Compared with traditional tools, libbpf_sar provides finer-grained metrics:

1. It splits kernel-mode time into kernel-thread execution time and process system-call time.

2. Some Linux distributions cannot record irq time because of missing kernel build options; libbpf_sar fills this gap without requiring any kernel changes and can probe dynamically.

3. It can be attached to a specific process to monitor that process's CPU usage in real time.

## 2. Metrics and Why They Matter

In Linux, CPU-related states fall into many classes, such as user mode, system calls, hard IRQs, and soft IRQs, plus time spent executing kernel threads such as kthreads and the idle process.

There are also events that matter greatly to the CPU, such as new-process creation, context-switch counts, and run-queue length. Monitoring these fine-grained events accurately helps trace CPU load back to its sources and locate performance bottlenecks.

The key CPU events that libbpf_sar reports, with their meanings, are listed in Table 3-6:

Information collected by a libbpf_sar instance

| **Metric** | **Meaning**                                   |
| ---------- | --------------------------------------------- |
| Proc       | Number of newly created processes             |
| Cswch      | Context-switch count                          |
| runqlen    | Run-queue length                              |
| irqTime    | Hard-IRQ time                                 |
| Softirq    | Soft-IRQ time                                 |
| Idle       | Idle-process run time                         |
| Sysc       | System time, including kernel-thread run time |
| Utime      | User-mode execution time                      |
| sys        | System-call execution time                    |

These metrics classify the system's CPU time precisely and also record how often the key events fire, which is valuable for performance analysis.

## 3. Output Format

```bash
time proc/s cswch/s irqTime/us softirq/us idle/ms kthread/us sysc/ms utime/ms sys/ms
15:55:43 48 1389 1646 8866 6811 3243 688 717 691
15:55:44 31 1089 1587 7375 6759 1868 659 707 660
15:55:45 47 1613 1685 8885 6792 3268 796 828 799
15:55:46 0 2133 5938 7797 7643 8106 8 20 17
15:55:47 1 3182 5128 14279 6644 4883 314 363 319
15:55:48 0 1815 1773 11329 6753 4286 282 313 287
15:55:49 31 1249 1605 9859 6752 4442 545 585 549
15:55:50 47 1601 1712 11348 6765 6249 210 242 216
15:55:51 0 1238 10591 12709 6802 13896 238 262 252
15:55:52 0 1145 1658 10000 6863 4593 308 333 313
15:55:53 0 1317 1587 9090 6798 4699 383 414 387
15:55:54 31 1254 1531 9570 6755 4252 381 414 385
15:55:55 47 1547 1624 10985 6769 6516 344 373 350
15:55:56 0 1064 2187 9892 6851 4585 189 212 194
```

* The proc/s column is the number of processes created per second;
* the cswch/s column is the number of context switches per second;
* the runqlen column is the length of the CPU run queue;
* the irqTime/us column is the time spent servicing hard IRQs, in us;
* the softirq/us column is the time spent servicing soft IRQs, in us;
* the idle/ms column is the time the system spent idle, in ms;
* the kthread/us column is the time spent running kernel threads, in us;
* the sysc/ms column is the total time spent in kernel threads plus system-call handling, in ms;
* the utime/ms column is the total time processes spent executing in user mode, in ms;
* the sys/ms column is the total time spent executing system calls, in ms.

Events are counted per CPU and then summed, so a single column may exceed 1 s; the times of all events should add up to 1 s × the total number of CPU cores. For technical reasons the first output row can be far off and may be ignored. Pressing Ctrl+C stops the collection and prints the average of each column over the whole run. Samples are taken at a 2 s rate and each event's CPU share is color-coded: blue below 30%, green between 30% and 60%, red above 60%.
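The bookkeeping above can be checked mechanically. A sketch that replays three of the sample rows (which appear to come from an 8-core machine, an assumption) and sums the non-overlapping buckets — idle, sysc, utime, irqTime, softirq — noting that kthread and sys are already contained in sysc:

```shell
# Columns: time proc/s cswch/s irqTime/us softirq/us idle/ms kthread/us sysc/ms utime/ms sys/ms
cat > /tmp/sar_rows.txt <<'EOF'
15:55:44 31 1089 1587 7375 6759 1868 659 707 660
15:55:45 47 1613 1685 8885 6792 3268 796 828 799
15:55:46 0 2133 5938 7797 7643 8106 8 20 17
EOF

awk '{
    # idle/ms + sysc/ms + utime/ms, plus irqTime and softirq converted us -> ms
    total = $6 + $8 + $9 + $4 / 1000 + $5 / 1000
    printf "%s  total %.0f ms (expect ~ 1000 ms x CPU cores)\n", $1, total
}' /tmp/sar_rows.txt
```

Each row lands near 8000 ms, consistent with 1 s per core on 8 cores plus the measurement slack the text describes.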



## 4. Data Visualization

![image-20240411160509242](image/image-20240411160509242.png)
![image-20240411170250839](image/image-20240411170250839.png)
![image-20240411170311182](image/image-20240411170311182.png)
