agent: Fix CPU usage reporting for cgroup v2 in kata-agent
kata-agent reports CPU time for cgroup v2 in the wrong unit, so CPU usage is underreported by a factor of 1000.

For cgroup v2, kata-agent reads the cpu.stat file, which reports the CPU time consumed by the processes in the cgroup in microseconds (µs).
However, kata-agent returned this value unchanged, in µs, without converting it to nanoseconds (ns).

This commit adds the missing µs-to-ns conversion for cgroup v2, aligning it with the v1 behavior and with kata-shim's expectations.

This fixes kata-containers#10278

Signed-off-by: Alex Man <[email protected]>
alexman-stripe committed Sep 11, 2024
1 parent 6f88972 commit 7e400f7
Showing 1 changed file with 14 additions and 3 deletions.
17 changes: 14 additions & 3 deletions src/agent/rustjail/src/cgroups/fs/mod.rs
@@ -724,9 +724,20 @@ fn get_cpuacct_stats(cg: &cgroups::Cgroup) -> MessageField<CpuUsage> {
let cpu_controller: &CpuController = get_controller_or_return_singular_none!(cg);
let stat = cpu_controller.cpu().stat;
let h = lines_to_map(&stat);
- let usage_in_usermode = *h.get("user_usec").unwrap_or(&0);
- let usage_in_kernelmode = *h.get("system_usec").unwrap_or(&0);
- let total_usage = *h.get("usage_usec").unwrap_or(&0);
+ // All fields in CpuUsage are expressed in nanoseconds (ns).
+ //
+ // For cgroup v1 (cpuacct controller):
+ // kata-agent reads the cpuacct.stat file, which reports the number of ticks
+ // consumed by the processes in the cgroup. It then converts these ticks to nanoseconds.
+ // Ref: https://www.kernel.org/doc/Documentation/cgroup-v1/cpuacct.txt
+ //
+ // For cgroup v2 (cpu controller):
+ // kata-agent reads the cpu.stat file, which reports the time consumed by the
+ // processes in the cgroup in microseconds (us). It then converts microseconds to nanoseconds.
+ // Ref: https://www.kernel.org/doc/Documentation/cgroup-v2.txt, section 5-1-1. CPU Interface Files
+ let usage_in_usermode = *h.get("user_usec").unwrap_or(&0) * 1000;
+ let usage_in_kernelmode = *h.get("system_usec").unwrap_or(&0) * 1000;
+ let total_usage = *h.get("usage_usec").unwrap_or(&0) * 1000;
let percpu_usage = vec![];

MessageField::some(CpuUsage {
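The unit conversion in this change can be illustrated outside the agent. The following is a minimal standalone sketch, not the agent's actual code: `parse_cpu_stat` is a hypothetical stand-in for the agent's `lines_to_map` helper, `usec_field_as_ns` and `NANOS_PER_MICRO` are names invented for illustration, and the use of `saturating_mul` is an assumption of this sketch (the commit itself multiplies by 1000 directly).

```rust
use std::collections::HashMap;

/// Nanoseconds per microsecond (name chosen for this sketch).
const NANOS_PER_MICRO: u64 = 1_000;

/// Hypothetical stand-in for the agent's `lines_to_map`: parses the
/// "key value" lines of a cgroup v2 cpu.stat file into a map.
/// Lines that do not match "key <u64>" are skipped.
fn parse_cpu_stat(content: &str) -> HashMap<String, u64> {
    content
        .lines()
        .filter_map(|line| {
            let mut parts = line.split_whitespace();
            let key = parts.next()?;
            let value = parts.next()?.parse::<u64>().ok()?;
            Some((key.to_string(), value))
        })
        .collect()
}

/// Looks up a `*_usec` field and returns it in nanoseconds.
/// A missing field yields 0, mirroring the `unwrap_or(&0)` in the diff.
/// `saturating_mul` is this sketch's choice to avoid overflow panics in
/// debug builds; the commit uses a plain `* 1000`.
fn usec_field_as_ns(stats: &HashMap<String, u64>, key: &str) -> u64 {
    stats
        .get(key)
        .copied()
        .unwrap_or(0)
        .saturating_mul(NANOS_PER_MICRO)
}
```

For example, given a cpu.stat body of `usage_usec 2500`, `user_usec 1500`, and `system_usec 1000`, calling `usec_field_as_ns(&stats, "usage_usec")` yields 2,500,000 ns, where the pre-fix code would have reported 2500.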
