
Commit 9166a16

New Intel PowerStat input plugin (influxdata#8488)
1 parent 99287d8 commit 9166a16

16 files changed, 2384 additions and 0 deletions

README.md

+1
@@ -214,6 +214,7 @@ For documentation on the latest development code see the [documentation index][d
 * [influxdb](./plugins/inputs/influxdb)
 * [influxdb_listener](./plugins/inputs/influxdb_listener)
 * [influxdb_v2_listener](./plugins/inputs/influxdb_v2_listener)
+* [intel_powerstat](plugins/inputs/intel_powerstat)
 * [intel_rdt](./plugins/inputs/intel_rdt)
 * [internal](./plugins/inputs/internal)
 * [interrupts](./plugins/inputs/interrupts)

plugins/inputs/all/all.go

+1
@@ -63,6 +63,7 @@ import (
 	_ "github.com/influxdata/telegraf/plugins/inputs/influxdb"
 	_ "github.com/influxdata/telegraf/plugins/inputs/influxdb_listener"
 	_ "github.com/influxdata/telegraf/plugins/inputs/influxdb_v2_listener"
+	_ "github.com/influxdata/telegraf/plugins/inputs/intel_powerstat"
 	_ "github.com/influxdata/telegraf/plugins/inputs/intel_rdt"
 	_ "github.com/influxdata/telegraf/plugins/inputs/internal"
 	_ "github.com/influxdata/telegraf/plugins/inputs/interrupts"
+206
@@ -0,0 +1,206 @@
# Intel PowerStat Input Plugin

Telemetry frameworks allow users to monitor critical platform-level metrics.
A key source of platform telemetry is the power domain, which helps MANO and Monitoring & Analytics systems
take preventive or corrective actions based on platform busyness, CPU temperature, actual CPU utilization
and power statistics. The main use cases are power saving and workload migration.

The Intel PowerStat plugin supports Intel-based platforms and assumes a Linux-based OS.

### Configuration:
```toml
# Intel PowerStat plugin enables monitoring of platform metrics (power, TDP) and per-CPU metrics like temperature, power and utilization.
[[inputs.intel_powerstat]]
  ## All global metrics are always collected by Intel PowerStat plugin.
  ## User can choose which per-CPU metrics are monitored by the plugin in cpu_metrics array.
  ## Empty array means no per-CPU specific metrics will be collected by the plugin - in this case only platform level
  ## telemetry will be exposed by Intel PowerStat plugin.
  ## Supported options:
  ## "cpu_frequency", "cpu_busy_frequency", "cpu_temperature", "cpu_c1_state_residency", "cpu_c6_state_residency", "cpu_busy_cycles"
  # cpu_metrics = []
```
### Example: Configuration with no per-CPU telemetry
This configuration collects only global (processor package specific) metrics; no per-CPU metrics are collected:
```toml
[[inputs.intel_powerstat]]
  cpu_metrics = []
```

### Example: Configuration with no per-CPU telemetry - equivalent case
Leaving `cpu_metrics` unset is equivalent: only global (processor package specific) metrics are collected:
```toml
[[inputs.intel_powerstat]]
```

### Example: Configuration for CPU Temperature and Frequency only
This configuration collects global metrics plus a subset of per-CPU metrics (CPU Temperature and Current Frequency):
```toml
[[inputs.intel_powerstat]]
  cpu_metrics = ["cpu_frequency", "cpu_temperature"]
```

### Example: Configuration with all available metrics
This configuration collects global metrics and all per-CPU metrics:
```toml
[[inputs.intel_powerstat]]
  cpu_metrics = ["cpu_frequency", "cpu_busy_frequency", "cpu_temperature", "cpu_c1_state_residency", "cpu_c6_state_residency", "cpu_busy_cycles"]
```
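The `cpu_metrics` option accepts only the six names listed under "Supported options". As an illustration of how such a list could be checked, here is a minimal Go sketch. The helper `validateCPUMetrics` and the `supportedCPUMetrics` map are hypothetical names chosen for this example; they are not taken from the plugin source, and the plugin may handle unknown entries differently.

```go
package main

import "fmt"

// supportedCPUMetrics mirrors the option names listed in the sample
// configuration above. The map itself is an assumption of this sketch.
var supportedCPUMetrics = map[string]bool{
	"cpu_frequency":          true,
	"cpu_busy_frequency":     true,
	"cpu_temperature":        true,
	"cpu_c1_state_residency": true,
	"cpu_c6_state_residency": true,
	"cpu_busy_cycles":        true,
}

// validateCPUMetrics is a hypothetical helper. It returns an error naming
// the first entry that is not a supported option. An empty or nil slice is
// valid and means "global metrics only".
func validateCPUMetrics(metrics []string) error {
	for _, m := range metrics {
		if !supportedCPUMetrics[m] {
			return fmt.Errorf("unsupported cpu_metrics entry %q", m)
		}
	}
	return nil
}
```

Calling `validateCPUMetrics([]string{"cpu_frequency", "cpu_temperature"})` returns `nil`, while an entry such as `"cpu_power"` produces an error.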

### SW Dependencies:
The plugin relies on Linux kernel modules that expose specific metrics over `sysfs` or `devfs` interfaces.
The plugin expects the following dependencies:
- _intel-rapl_ module, which exposes Intel Runtime Power Limiting metrics over `sysfs` (`/sys/devices/virtual/powercap/intel-rapl`),
- _msr_ kernel module, which provides access to processor model specific registers over `devfs` (`/dev/cpu/%d/msr`),
- _cpufreq_ kernel module, which exposes per-CPU frequency over `sysfs` (`/sys/devices/system/cpu/cpu%d/cpufreq/scaling_cur_freq`).
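To illustrate the _msr_ dependency: the Linux `msr` device returns an 8-byte register value when read at a file offset equal to the register address. The Go sketch below shows that access pattern against any `io.ReaderAt`. The names `readMSR` and `fakeMSRDevice` are hypothetical and not part of the plugin; the in-memory stand-in exists only so the sketch can be exercised without root access or real hardware.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// readMSR reads one 64-bit register value at the given offset.
// With the real msr device, r would be an *os.File opened on the
// per-CPU device node and offset would be the register address.
func readMSR(r io.ReaderAt, offset int64) (uint64, error) {
	buf := make([]byte, 8)
	n, err := r.ReadAt(buf, offset)
	if n != len(buf) {
		// io.ReaderAt guarantees a non-nil error on a short read.
		return 0, fmt.Errorf("short read at offset %#x (%d bytes): %w", offset, n, err)
	}
	// x86 is little-endian, so the register bytes are decoded accordingly.
	return binary.LittleEndian.Uint64(buf), nil
}

// fakeMSRDevice builds an in-memory stand-in of the given size with one
// register value stored at offset. It is purely a testing aid.
func fakeMSRDevice(size int, offset int64, value uint64) io.ReaderAt {
	buf := make([]byte, size)
	binary.LittleEndian.PutUint64(buf[offset:], value)
	return bytes.NewReader(buf)
}
```

Passing an `io.ReaderAt` instead of a path keeps the decoding logic testable on machines where the _msr_ module is not loaded.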

Minimum kernel version required is 3.13 to satisfy all requirements.

Please make sure that kernel modules are loaded and running. You might have to manually enable them by using `modprobe`.
Exact commands to be executed are:
```
sudo modprobe cpufreq-stats
sudo modprobe msr
sudo modprobe intel_rapl
```

**Telegraf with Intel PowerStat plugin enabled may require root access to read model specific registers (MSRs)**
to retrieve data for calculation of most critical per-CPU specific metrics:
- `cpu_busy_frequency_mhz`
- `cpu_temperature_celsius`
- `cpu_c1_state_residency_percent`
- `cpu_c6_state_residency_percent`
- `cpu_busy_cycles_percent`

To expose other Intel PowerStat metrics root access may or may not be required (depending on OS type or configuration).

### HW Dependencies:
Specific metrics require certain processor features to be present; otherwise the Intel PowerStat plugin cannot
read them. On a Linux-based OS, supported processor features can be detected by reading the `/proc/cpuinfo` file.
The plugin assumes that crucial properties are the same for all CPU cores in the system.
The following processor properties are examined in more detail in this section:
processor _cpu family_, _model_ and _flags_.
The following processor properties are required by the plugin:
- Processor _cpu family_ must be Intel (0x6), since the plugin relies on Intel-specific
model specific registers for all features
- The following processor flags shall be present:
    - "_msr_" shall be present for the plugin to read platform data from processor model specific registers and collect
    the following metrics: _powerstat_core.cpu_temperature_, _powerstat_core.cpu_busy_frequency_,
    _powerstat_core.cpu_busy_cycles_, _powerstat_core.cpu_c1_state_residency_, _powerstat_core.cpu_c6_state_residency_
    - "_aperfmperf_" shall be present to collect the following metrics: _powerstat_core.cpu_busy_frequency_,
    _powerstat_core.cpu_busy_cycles_, _powerstat_core.cpu_c1_state_residency_
    - "_dts_" shall be present to collect _powerstat_core.cpu_temperature_
- Processor _Model number_ must be one of the following values for the plugin to read _powerstat_core.cpu_c1_state_residency_
and _powerstat_core.cpu_c6_state_residency_ metrics:

| Model number | Processor name |
|-----|-------------|
| 0x37 | Intel Atom® Bay Trail |
| 0x4D | Intel Atom® Avoton |
| 0x5C | Intel Atom® Apollo Lake |
| 0x5F | Intel Atom® Denverton |
| 0x7A | Intel Atom® Goldmont |
| 0x4C | Intel Atom® Airmont |
| 0x86 | Intel Atom® Jacobsville |
| 0x96 | Intel Atom® Elkhart Lake |
| 0x9C | Intel Atom® Jasper Lake |
| 0x1A | Intel Nehalem-EP |
| 0x1E | Intel Nehalem |
| 0x1F | Intel Nehalem-G |
| 0x2E | Intel Nehalem-EX |
| 0x25 | Intel Westmere |
| 0x2C | Intel Westmere-EP |
| 0x2F | Intel Westmere-EX |
| 0x2A | Intel Sandybridge |
| 0x2D | Intel Sandybridge-X |
| 0x3A | Intel Ivybridge |
| 0x3E | Intel Ivybridge-X |
| 0x4E | Intel Atom® Silvermont-MID |
| 0x5E | Intel Skylake |
| 0x55 | Intel Skylake-X |
| 0x8E | Intel Kabylake-L |
| 0x9E | Intel Kabylake |
| 0x6A | Intel Icelake-X |
| 0x6C | Intel Icelake-D |
| 0x7D | Intel Icelake |
| 0x7E | Intel Icelake-L |
| 0x9D | Intel Icelake-NNPI |
| 0x3C | Intel Haswell |
| 0x3F | Intel Haswell-X |
| 0x45 | Intel Haswell-L |
| 0x46 | Intel Haswell-G |
| 0x3D | Intel Broadwell |
| 0x47 | Intel Broadwell-G |
| 0x4F | Intel Broadwell-X |
| 0x56 | Intel Broadwell-D |
| 0x66 | Intel Cannonlake-L |
| 0x57 | Intel Xeon® PHI Knights Landing |
| 0x85 | Intel Xeon® PHI Knights Mill |
| 0xA5 | Intel CometLake |
| 0xA6 | Intel CometLake-L |
| 0x8F | Intel Sapphire Rapids X |
| 0x8C | Intel TigerLake-L |
| 0x8D | Intel TigerLake |
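
The checks described in this section can be sketched in Go as a small `/proc/cpuinfo` parser. This is an illustrative sketch only: `cpuFeatures`, `parseCPUInfo` and the two `supports...` methods are hypothetical names, the parser takes the file content as a string rather than opening the file, and it assumes the usual `key : value` layout with decimal `cpu family` and `model` values. The model-number table above is not encoded here.

```go
package main

import (
	"strconv"
	"strings"
)

// cpuFeatures holds the /proc/cpuinfo properties discussed above.
type cpuFeatures struct {
	family int
	model  int
	flags  map[string]bool
}

// parseCPUInfo extracts cpu family, model and flags from cpuinfo text.
// When several processors are listed, later entries overwrite earlier
// ones, matching the assumption that all cores share these properties.
func parseCPUInfo(text string) (cpuFeatures, error) {
	f := cpuFeatures{flags: make(map[string]bool)}
	for _, line := range strings.Split(text, "\n") {
		key, value, found := strings.Cut(line, ":")
		if !found {
			continue
		}
		key, value = strings.TrimSpace(key), strings.TrimSpace(value)
		var err error
		switch key {
		case "cpu family":
			f.family, err = strconv.Atoi(value)
		case "model":
			f.model, err = strconv.Atoi(value)
		case "flags":
			for _, flag := range strings.Fields(value) {
				f.flags[flag] = true
			}
		}
		if err != nil {
			return f, err
		}
	}
	return f, nil
}

// supportsTemperature reports whether cpu family 6 plus the msr and dts
// flags are present, as required for the temperature metric.
func (f cpuFeatures) supportsTemperature() bool {
	return f.family == 6 && f.flags["msr"] && f.flags["dts"]
}

// supportsBusyFrequency reports whether cpu family 6 plus the msr and
// aperfmperf flags are present, as required for busy frequency.
func (f cpuFeatures) supportsBusyFrequency() bool {
	return f.family == 6 && f.flags["msr"] && f.flags["aperfmperf"]
}
```

Note that the exact key match on `model` deliberately skips the separate `model name` line.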

### Metrics
All metrics collected by the Intel PowerStat plugin are collected at fixed intervals.
Metrics that report processor C-state residency or power are calculated over elapsed intervals.
On startup, the plugin skips the first iteration for metrics that are computed as deltas against a previous value.
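
To make the interval-based calculation concrete, here is a minimal Go sketch of deriving average power from two readings of a cumulative energy counter. The type `energySample` and the function `averagePowerWatts` are hypothetical and are not the plugin's implementation; the sketch assumes energy in joules and time in seconds, and it does not handle counter wraparound.

```go
package main

// energySample is a hypothetical reading of a cumulative energy counter.
type energySample struct {
	joules  float64 // cumulative energy at read time, in joules
	seconds float64 // read time, in seconds
}

// averagePowerWatts returns the mean power between two samples as
// energy delta divided by elapsed time. ok is false when there is no
// previous sample (the first iteration) or when no time has elapsed,
// in which case the caller should skip emitting the metric.
func averagePowerWatts(prev *energySample, cur energySample) (watts float64, ok bool) {
	if prev == nil {
		return 0, false
	}
	elapsed := cur.seconds - prev.seconds
	if elapsed <= 0 {
		return 0, false
	}
	return (cur.joules - prev.joules) / elapsed, true
}
```

For example, 350 J consumed over a 10 s interval corresponds to an average of 35 W.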

**The following measurements are supported by Intel PowerStat plugin:**
- powerstat_core

    - The following Tags are returned by the plugin with powerstat_core measurements:

        | Tag | Description |
        |-----|-------------|
        | `package_id` | ID of platform package/socket |
        | `core_id` | ID of physical processor core |
        | `cpu_id` | ID of logical processor core |

      Measurement powerstat_core metrics are collected per-CPU (cpu_id is the key)
      while core_id and package_id tags are additional topology information.

    - Available metrics for powerstat_core measurement

        | Metric name (field) | Description | Units |
        |-----|-------------|-----|
        | `cpu_frequency_mhz` | Current operational frequency of CPU Core | MHz |
        | `cpu_busy_frequency_mhz` | CPU Core Busy Frequency measured as frequency adjusted to CPU Core busy cycles | MHz |
        | `cpu_temperature_celsius` | Current temperature of CPU Core | Degrees Celsius |
        | `cpu_c1_state_residency_percent` | Percentage of time that CPU Core spent in C1 Core residency state | % |
        | `cpu_c6_state_residency_percent` | Percentage of time that CPU Core spent in C6 Core residency state | % |
        | `cpu_busy_cycles_percent` | CPU Core Busy cycles as a ratio of Cycles spent in C0 state residency to all cycles executed by CPU Core | % |

- powerstat_package

    - The following Tags are returned by the plugin with powerstat_package measurements:

        | Tag | Description |
        |-----|-------------|
        | `package_id` | ID of platform package/socket |

      Measurement powerstat_package metrics are collected per processor package; the _package_id_ tag indicates which
      package the metric refers to.

    - Available metrics for powerstat_package measurement

        | Metric name (field) | Description | Units |
        |-----|-------------|-----|
        | `thermal_design_power_watts` | Maximum Thermal Design Power (TDP) available for processor package | Watts |
        | `current_power_consumption_watts` | Current power consumption of processor package | Watts |
        | `current_dram_power_consumption_watts` | Current power consumption of processor package DRAM subsystem | Watts |
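
The percentage fields of powerstat_core can be read as ratios of counter deltas over one collection interval. The Go sketch below shows one plausible way to compute them. It is an assumption of this example, not a statement of the plugin's exact formulas: busy cycles are taken as the MPERF delta over the TSC delta, C6 residency as the C6 counter delta over the TSC delta, and C1 residency as whatever remains after subtracting the busy, C3, C6 and C7 deltas. The type `counterDeltas` and all function names are hypothetical.

```go
package main

// counterDeltas holds per-interval counter increments. The field names
// loosely follow the delta fields of the msrData struct in dto.go.
type counterDeltas struct {
	tsc, mperf, c3, c6, c7 uint64
}

// percentOf returns part as a percentage of total, or 0 when total is 0.
func percentOf(part, total uint64) float64 {
	if total == 0 {
		return 0
	}
	return float64(part) * 100 / float64(total)
}

// busyCyclesPercent assumes MPERF advances only while the core is in C0.
func busyCyclesPercent(d counterDeltas) float64 {
	return percentOf(d.mperf, d.tsc)
}

// c6ResidencyPercent is the share of the interval spent in core C6.
func c6ResidencyPercent(d counterDeltas) float64 {
	return percentOf(d.c6, d.tsc)
}

// c1ResidencyPercent treats C1 as the remainder of the interval once
// busy cycles and the C3, C6 and C7 residencies are subtracted (an
// assumption of this sketch). It returns 0 if the parts exceed the total.
func c1ResidencyPercent(d counterDeltas) float64 {
	other := d.mperf + d.c3 + d.c6 + d.c7
	if other > d.tsc {
		return 0
	}
	return percentOf(d.tsc-other, d.tsc)
}
```

Under these assumptions the C1 figure depends on MPERF, which is consistent with the "_aperfmperf_" flag being listed as a requirement for C1 residency.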

### Example Output:

```
powerstat_package,host=ubuntu,package_id=0 thermal_design_power_watts=160 1606494744000000000
powerstat_package,host=ubuntu,package_id=0 current_power_consumption_watts=35 1606494744000000000
powerstat_package,host=ubuntu,package_id=0 current_dram_power_consumption_watts=13.94 1606494744000000000
powerstat_core,core_id=0,cpu_id=0,host=ubuntu,package_id=0 cpu_frequency_mhz=1200.29 1606494744000000000
powerstat_core,core_id=0,cpu_id=0,host=ubuntu,package_id=0 cpu_temperature_celsius=34i 1606494744000000000
powerstat_core,core_id=0,cpu_id=0,host=ubuntu,package_id=0 cpu_c6_state_residency_percent=92.52 1606494744000000000
powerstat_core,core_id=0,cpu_id=0,host=ubuntu,package_id=0 cpu_busy_cycles_percent=0.8 1606494744000000000
powerstat_core,core_id=0,cpu_id=0,host=ubuntu,package_id=0 cpu_c1_state_residency_percent=6.68 1606494744000000000
powerstat_core,core_id=0,cpu_id=0,host=ubuntu,package_id=0 cpu_busy_frequency_mhz=1213.24 1606494744000000000
```

plugins/inputs/intel_powerstat/dto.go

+37
@@ -0,0 +1,37 @@
package intel_powerstat

type msrData struct {
	mperf                 uint64
	aperf                 uint64
	timeStampCounter      uint64
	c3                    uint64
	c6                    uint64
	c7                    uint64
	throttleTemp          uint64
	temp                  uint64
	mperfDelta            uint64
	aperfDelta            uint64
	timeStampCounterDelta uint64
	c3Delta               uint64
	c6Delta               uint64
	c7Delta               uint64
	readDate              int64
}

type raplData struct {
	dramCurrentEnergy   float64
	socketCurrentEnergy float64
	socketEnergy        float64
	dramEnergy          float64
	readDate            int64
}

type cpuInfo struct {
	physicalID string
	coreID     string
	cpuID      string
	vendorID   string
	cpuFamily  string
	model      string
	flags      string
}
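
The `throttleTemp` and `temp` fields of `msrData` suggest that core temperature is derived from two register values. The following Go sketch shows how that derivation could look. It rests on assumptions not stated in this commit excerpt: that `temp` comes from IA32_THERM_STATUS, whose bits 22:16 hold a digital readout in degrees below the throttle target, and that `throttleTemp` comes from MSR_TEMPERATURE_TARGET, whose bits 23:16 hold that target. The function name `coreTemperatureCelsius` is hypothetical.

```go
package main

// coreTemperatureCelsius derives a core temperature from two raw
// register values, assuming the bit layouts described above.
func coreTemperatureCelsius(thermStatus, temperatureTarget uint64) uint64 {
	// Degrees below the throttle target (7-bit field, bits 22:16).
	readout := (thermStatus >> 16) & 0x7F
	// Throttle target in degrees Celsius (8-bit field, bits 23:16).
	target := (temperatureTarget >> 16) & 0xFF
	if readout > target {
		// Guard against unsigned underflow on inconsistent input.
		return 0
	}
	return target - readout
}
```

For instance, a target of 100 and a readout of 66 give 34 degrees Celsius under these assumptions.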
