Procfs improvements #4540

Merged
merged 8 commits into from
Feb 6, 2025

@geyslan geyslan commented Jan 21, 2025

Ran these in main and procfs-work branches:

dist/tracee --metrics --pyroscope --pprof -s tree=2609608 -e sched_process_exec,sched_process_fork,sched_process_exit --proctree source=both --proctree disable-procfs -o none
dist/tracee --metrics --pyroscope --pprof -s tree=2241901 -e sched_process_exec,sched_process_fork,sched_process_exit --proctree source=both -o none

Stressor details:

8 threads with 1_000_000 ops each running on:

cpu: AMD Ryzen 9 7950X 16-Core Processor
MemTotal: 64923992 kB (64GB)

| Metric        | NO procfs (%) | WITH procfs  (%) |
|---------------|--------------:|-----------------:|
| CPU avg       | -2.65%        | -28.46%          |
| Malloc avg    | -1.56%        | -22.60%          |
| Heap avg      | -1.94%        | +5.65%           |
| Heap obj avg  | -0.52%        | +1.82%           |

NOTE: The WITH procfs Heap values have increased. This is a side effect of the CPU time improvement in the proc package: since the window in which a process can terminate (from the feed request to the reading of the proc file) is now shorter, more child processes are added to the process tree and remain in its LRU.

Close: #4547

1. Explain what the PR does

c5be62d perf(proctree): remove stat call
d474eeb chore: introduce builders with specific fields
c717863 perf(proc): improve ns
8fc7eb3 perf(proc): improve status file parsing
7471ae2 perf(proctree/proc): align fields to real size
764ba4a perf(proc): improve stat file parsing
2afe594 perf(proc): introduce ReadFile for /proc
d6818c1 perf(proc): use formatters for procfs file paths

c5be62d perf(proctree): remove stat call

Calling stat on /proc/<pid> would only increase the window for
process termination between the stat call and the read of the file.

This also replaces fmt.Sprintf with string concatenation and
strconv.FormatInt for better performance.
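
The pattern described above can be sketched as follows. This is only an illustration, not the code from the commit: `readProcStat` is a hypothetical helper, and treating `ESRCH` as "process gone" is an assumption about how a read on an exited task may surface. The point is that the read itself reports a vanished process, so a prior stat call adds nothing but latency.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"strconv"
	"syscall"
)

// readProcStat (hypothetical name) reads /proc/<pid>/stat directly instead of
// calling os.Stat on /proc/<pid> first. A process that has already exited is
// reported through the alive flag rather than as an error.
func readProcStat(pid int32) (data []byte, alive bool, err error) {
	// Build the path with concatenation and strconv instead of fmt.Sprintf.
	path := "/proc/" + strconv.FormatInt(int64(pid), 10) + "/stat"

	data, err = os.ReadFile(path)
	if err != nil {
		// Assumption: a missing entry or ESRCH both mean the process is gone.
		if errors.Is(err, fs.ErrNotExist) || errors.Is(err, syscall.ESRCH) {
			return nil, false, nil
		}
		return nil, false, err
	}
	return data, true, nil
}

func main() {
	data, alive, err := readProcStat(int32(os.Getpid()))
	fmt.Println(len(data), alive, err)
}
```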

d474eeb chore: introduce builders with specific fields

- NewProcStatFields()
- NewThreadStatFields()
- NewProcStatusFields()
- NewThreadStatusFields()

c717863 perf(proc): improve ns

Reduce the ProcNS memory footprint by using the right member type sizes:
a namespace id is a uint32, since it is the inode number in
struct ns_common.

This change also improves the performance of GetAllProcNS(), GetProcNS()
and GetMountNSFirstProcesses().
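
As a rough illustration of why a uint32 is enough, the sketch below parses a namespace link target such as `mnt:[4026531840]` into a uint32. `parseNSLink` is a hypothetical helper written for this example and is not the function used in the proc package.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseNSLink (hypothetical name) extracts the inode number from a namespace
// link target of the form "<type>:[<inode>]". The value is parsed with a
// 32-bit limit, so anything that does not fit in a uint32 is rejected.
func parseNSLink(target string) (uint32, error) {
	open := strings.IndexByte(target, '[')
	if open < 0 || !strings.HasSuffix(target, "]") {
		return 0, fmt.Errorf("unexpected namespace link %q", target)
	}
	id, err := strconv.ParseUint(target[open+1:len(target)-1], 10, 32)
	if err != nil {
		return 0, err
	}
	return uint32(id), nil
}

func main() {
	target, err := os.Readlink("/proc/self/ns/mnt")
	if err != nil {
		fmt.Println("readlink:", err)
		return
	}
	id, err := parseNSLink(target)
	fmt.Println(target, id, err)
}
```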

8fc7eb3 perf(proc): improve status file parsing

Remove the use of library functions to parse the status file and instead
parse it manually (on the fly) to reduce the number of allocations and
improve performance.
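
A minimal sketch of what "on the fly" parsing of a status file could look like, assuming only three fields are wanted. The type `statusFields` and the functions `parseStatus` and `parseInt32` are invented for this example; the actual parser in the commit may be structured differently.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"math"
)

// statusFields holds a hypothetical subset of /proc/<pid>/status values.
type statusFields struct {
	Name string
	Pid  int32
	PPid int32
}

// parseInt32 converts ASCII digits to int32 without building a string first.
func parseInt32(b []byte) (int32, bool) {
	if len(b) == 0 {
		return 0, false
	}
	var n int64
	for _, c := range b {
		if c < '0' || c > '9' {
			return 0, false
		}
		n = n*10 + int64(c-'0')
		if n > math.MaxInt32 {
			return 0, false
		}
	}
	return int32(n), true
}

// parseStatus walks the buffer once, line by line, without splitting it into
// a slice of lines, and stops as soon as all wanted fields have been seen.
func parseStatus(data []byte) (statusFields, error) {
	var f statusFields
	found := 0
	for len(data) > 0 && found < 3 {
		line := data
		if i := bytes.IndexByte(data, '\n'); i >= 0 {
			line, data = data[:i], data[i+1:]
		} else {
			data = nil
		}
		sep := bytes.IndexByte(line, ':')
		if sep < 0 {
			continue
		}
		val := bytes.TrimSpace(line[sep+1:])
		var ok bool
		switch string(line[:sep]) {
		case "Name":
			f.Name, ok = string(val), true
		case "Pid":
			f.Pid, ok = parseInt32(val)
		case "PPid":
			f.PPid, ok = parseInt32(val)
		default:
			continue // field not needed, skip it
		}
		if !ok {
			return f, errors.New("status: malformed value for " + string(line[:sep]))
		}
		found++
	}
	if found < 3 {
		return f, errors.New("status: missing fields")
	}
	return f, nil
}

func main() {
	sample := []byte("Name:\tbash\nUmask:\t0022\nState:\tS (sleeping)\nTgid:\t1234\nNgid:\t0\nPid:\t1234\nPPid:\t1\nTracerPid:\t0\n")
	f, err := parseStatus(sample)
	fmt.Printf("%+v %v\n", f, err)
}
```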

7471ae2 perf(proctree/proc): align fields to real size

Propagate values based on their real size, which in most cases is smaller
than int (64-bit). This change reduces the memory footprint, or at least
the stress on the stack/heap.
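
To give a feel for the size difference, here is a small sketch comparing two hypothetical structs. The field names are made up for the example and do not mirror the proctree types.

```go
package main

import (
	"fmt"
	"unsafe"
)

// wideIDs stores every value as int, which is 8 bytes on 64-bit platforms.
type wideIDs struct {
	Pid  int
	PPid int
	Uid  int
	Gid  int
}

// narrowIDs stores the same values with 4-byte types.
type narrowIDs struct {
	Pid  int32
	PPid int32
	Uid  uint32
	Gid  uint32
}

func wideSize() uintptr   { return unsafe.Sizeof(wideIDs{}) }
func narrowSize() uintptr { return unsafe.Sizeof(narrowIDs{}) }

func main() {
	fmt.Println("wide:", wideSize(), "bytes, narrow:", narrowSize(), "bytes")
}
```

On a 64-bit platform the wide struct takes twice the space of the narrow one, and the same ratio applies to every value passed around or stored per process.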

764ba4a perf(proc): improve stat file parsing

Remove the use of library functions to parse the stat file and instead
parse it manually (on the fly) to reduce the number of allocations and
improve performance.

chore(proc): align parsing of stat field with the formats size

This also aligns parsing sizes with the formats to avoid parsing the
stat file incorrectly. The internal fields are represented in line with
the actual kernel fields to avoid any confusion (signed/unsigned).
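
The two ideas above (walking the stat line manually and matching each field to its format size) could look roughly like the sketch below. It assumes the field order and formats documented in proc(5): field 4 (ppid) is `%d` and field 22 (starttime) is `%llu`. `statFields` and `parseStat` are hypothetical names, and the sketch still converts tokens through `strconv`, which a tuned parser would likely avoid.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"strconv"
)

// statFields is a hypothetical subset of /proc/<pid>/stat, with Go types
// chosen to match the documented formats.
type statFields struct {
	Comm      string
	State     byte
	PPid      int32  // field 4, %d
	StartTime uint64 // field 22, %llu
}

// parseStat walks the line once and stops after the last wanted field.
func parseStat(data []byte) (statFields, error) {
	var f statFields

	// comm may contain spaces and parentheses, so use the last ')'.
	lparen := bytes.IndexByte(data, '(')
	rparen := bytes.LastIndexByte(data, ')')
	if lparen < 0 || rparen < lparen {
		return f, errors.New("stat: comm field not found")
	}
	f.Comm = string(data[lparen+1 : rparen])

	rest := bytes.TrimSpace(data[rparen+1:])
	for field := 3; len(rest) > 0; field++ {
		tok := rest
		if i := bytes.IndexByte(rest, ' '); i >= 0 {
			tok, rest = rest[:i], rest[i+1:]
		} else {
			rest = nil
		}
		if len(tok) == 0 {
			return f, errors.New("stat: empty field")
		}
		switch field {
		case 3:
			f.State = tok[0]
		case 4:
			v, err := strconv.ParseInt(string(tok), 10, 32)
			if err != nil {
				return f, err
			}
			f.PPid = int32(v)
		case 22:
			v, err := strconv.ParseUint(string(tok), 10, 64)
			if err != nil {
				return f, err
			}
			f.StartTime = v
			return f, nil
		}
	}
	return f, errors.New("stat: fewer than 22 fields")
}

func main() {
	sample := []byte("1234 (my (weird) proc) S 1 1234 1234 0 -1 4194304 100 0 0 0 5 3 0 0 20 0 1 0 98765 1000 200\n")
	f, err := parseStat(sample)
	fmt.Printf("%+v %v\n", f, err)
}
```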

2afe594 perf(proc): introduce ReadFile for /proc

`os.ReadFile` is not efficient for reading files in `/proc` because it
attempts to determine the file size before reading. This step is
unnecessary for `/proc` files, as they are virtual files with sizes
that are often reported as unknown or `0`.

`proc.ReadFile` is a new function designed specifically for reading
files in `/proc`. It reads directly into a buffer and is more efficient
than `os.ReadFile` because it allows tuning the initial buffer size to
better suit the characteristics of `/proc` files.
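
The idea can be sketched as below. This is not the `proc.ReadFile` implementation from the commit; `readFileSized` and `roundTrip` are hypothetical names, and the loop simply mirrors the usual grow-and-read pattern with a caller-chosen initial capacity and no size lookup.

```go
package main

import (
	"fmt"
	"io"
	"os"
)

// readFileSized (hypothetical name) reads a whole file without asking for its
// size first. The caller picks the initial buffer capacity.
func readFileSized(path string, initialSize int) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	if initialSize <= 0 {
		initialSize = 512 // arbitrary default for this sketch
	}
	buf := make([]byte, 0, initialSize)
	for {
		if len(buf) == cap(buf) {
			// Buffer is full: let append grow it, then restore the length.
			buf = append(buf, 0)[:len(buf)]
		}
		n, err := f.Read(buf[len(buf):cap(buf)])
		buf = buf[:len(buf)+n]
		if err == io.EOF {
			return buf, nil
		}
		if err != nil {
			return nil, err
		}
	}
}

// roundTrip writes content to a temporary file and reads it back, to exercise
// readFileSized outside of /proc.
func roundTrip(content string, initialSize int) (string, error) {
	tmp, err := os.CreateTemp("", "readfile-sketch-*")
	if err != nil {
		return "", err
	}
	defer os.Remove(tmp.Name())
	if _, err := tmp.WriteString(content); err != nil {
		tmp.Close()
		return "", err
	}
	if err := tmp.Close(); err != nil {
		return "", err
	}
	data, err := readFileSized(tmp.Name(), initialSize)
	return string(data), err
}

func main() {
	data, err := readFileSized("/proc/self/stat", 512)
	fmt.Println(len(data), err)
}
```

In this sketch, a file that exactly fills the initial buffer triggers one growth before EOF is observed, so the initial size is best chosen slightly above the expected content length.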

Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^BenchmarkReadFile$
github.com/aquasecurity/tracee/pkg/utils/proc -benchtime=10000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/utils/proc
cpu: AMD Ryzen 9 7950X 16-Core Processor
BenchmarkReadFile/ProcFSReadFile/Empty_File-32        10000000  3525 ns/op  408 B/op  4 allocs/op
BenchmarkReadFile/OsReadFile/Empty_File-32            10000000  4070 ns/op  872 B/op  5 allocs/op
BenchmarkReadFile/ProcFSReadFile/Small_File-32        10000000  3961 ns/op  408 B/op  4 allocs/op
BenchmarkReadFile/OsReadFile/Small_File-32            10000000  4538 ns/op  872 B/op  5 allocs/op
BenchmarkReadFile/ProcFSReadFile/Exact_Buffer_Size-32 10000000  4229 ns/op  920 B/op  5 allocs/op
BenchmarkReadFile/OsReadFile/Exact_Buffer_Size-32     10000000  4523 ns/op  872 B/op  5 allocs/op
BenchmarkReadFile/ProcFSReadFile_/proc/self/stat-32   10000000  4043 ns/op  408 B/op  4 allocs/op
BenchmarkReadFile/OsReadFile_/proc/self/stat-32       10000000  4585 ns/op  872 B/op  5 allocs/op
PASS
ok  	github.com/aquasecurity/tracee/pkg/utils/proc	334.751s

d6818c1 perf(proc): use formatters for procfs file paths

Since the type of the converted primitive is already known, formatter
helpers should be used to construct procfs file paths instead of relying
on `fmt.Sprintf`. Using `fmt.Sprintf` is relatively costly due to its
internal formatting logic, which is unnecessary for simple path
construction.
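
For illustration, the difference between the two approaches might look like this. The function names are hypothetical and are not the helpers introduced by the commit.

```go
package main

import (
	"fmt"
	"strconv"
)

// statPathSprintf is the generic approach: fmt has to parse the format string
// and inspect the argument type at run time.
func statPathSprintf(pid int32) string {
	return fmt.Sprintf("/proc/%d/stat", pid)
}

// statPathFormat uses the type-specific formatter and plain concatenation.
func statPathFormat(pid int32) string {
	return "/proc/" + strconv.FormatInt(int64(pid), 10) + "/stat"
}

// taskStatPath appends into one pre-sized buffer, which suits paths with more
// than one numeric component.
func taskStatPath(pid, tid int32) string {
	buf := make([]byte, 0, 32)
	buf = append(buf, "/proc/"...)
	buf = strconv.AppendInt(buf, int64(pid), 10)
	buf = append(buf, "/task/"...)
	buf = strconv.AppendInt(buf, int64(tid), 10)
	buf = append(buf, "/stat"...)
	return string(buf)
}

func main() {
	fmt.Println(statPathSprintf(1234))
	fmt.Println(statPathFormat(1234))
	fmt.Println(taskStatPath(1234, 1240))
}
```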

2. Explain how to test it

3. Other comments

@geyslan geyslan self-assigned this Jan 21, 2025

rscampos commented Feb 5, 2025

Good idea about replacing ReadFile :)
The benchmark results for ReadFile on /proc are close to yours. The ns/op improvements are 16.01%, 14.79%, 12.76% and 14.85%. Tested on Arm64.

BenchmarkReadFile/ProcFSReadFile/Empty_File-4         	  288828	      4053 ns/op	     408 B/op	       4 allocs/op
BenchmarkReadFile/OsReadFile/Empty_File-4             	  246860	      4826 ns/op	     856 B/op	       5 allocs/op
BenchmarkReadFile/ProcFSReadFile/Small_File-4         	  244072	      5012 ns/op	     408 B/op	       4 allocs/op
BenchmarkReadFile/OsReadFile/Small_File-4             	  212805	      5882 ns/op	     856 B/op	       5 allocs/op
BenchmarkReadFile/ProcFSReadFile/Exact_Buffer_Size-4  	  236595	      5045 ns/op	     920 B/op	       5 allocs/op
BenchmarkReadFile/OsReadFile/Exact_Buffer_Size-4      	  207237	      5783 ns/op	     856 B/op	       5 allocs/op
BenchmarkReadFile/ProcFSReadFile_/proc/self/stat-4    	  242958	      4895 ns/op	     408 B/op	       4 allocs/op
BenchmarkReadFile/OsReadFile_/proc/self/stat-4        	  207411	      5749 ns/op	     856 B/op	       5 allocs/op
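
The percentages quoted in this comment appear to be computed as (os.ReadFile time - proc.ReadFile time) / os.ReadFile time. A small sketch of that arithmetic, using the ns/op values from the Arm64 run above (the `improvement` helper is only for illustration):

```go
package main

import "fmt"

// improvement returns how much faster procNs is than osNs, as a percentage
// of osNs.
func improvement(osNs, procNs float64) float64 {
	return (osNs - procNs) / osNs * 100
}

func main() {
	// Pairs of (OsReadFile, ProcFSReadFile) ns/op from the Arm64 results.
	pairs := [][2]float64{
		{4826, 4053}, // Empty_File
		{5882, 5012}, // Small_File
		{5783, 5045}, // Exact_Buffer_Size
		{5749, 4895}, // /proc/self/stat
	}
	for _, p := range pairs {
		fmt.Printf("%.2f%%\n", improvement(p[0], p[1]))
	}
}
```

The first pair evaluates to roughly 16.017%, so the first figure in the comment seems to be truncated rather than rounded.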

@rscampos rscampos left a comment

good work!! @geyslan, LGTM

geyslan commented Feb 6, 2025

/fast-forward

@github-actions github-actions bot merged commit c5be62d into aquasecurity:main Feb 6, 2025
41 checks passed
Development

Successfully merging this pull request may close these issues.

reduce cpu/memory of proc pkg
2 participants