Feat: Detect and log OOM kills in container logs #660

Merged: 8 commits into main from np/detect-oom-kills on Oct 30, 2024

Conversation

@nickpetrovic (Contributor) commented Oct 25, 2024

  • Start a goroutine that watches for OOM events right after the runc container PID is created
  • Send the OOM message to the output channel
  • Move deletion of the container instance from processStopContainerEvents to stopContainer, and delete only if kill=true
  • Move the "container still running" log message to after the orphaned-container check

Resolve BE-1972

- Add the Pod UID to the pod environment
- Move container PID capture into the spawn function
- Start a goroutine that watches for OOM kills right after the runc PID is created

Resolve BE-1972
- Move container structs to lifecycle.go
- Use the cgroups package to detect the cgroup version and read memory events
- Set the cgroup path on the runc container spec
- Send the OOM message to the output channel
- Remove the POD_UID env var
- Move deletion of the container instance from processStopContainerEvents to stopContainer, and delete only if kill=true
- Move the "container still running" log message to after the orphaned-container check
@nickpetrovic nickpetrovic merged commit 9170c6d into main Oct 30, 2024
3 checks passed
@nickpetrovic nickpetrovic deleted the np/detect-oom-kills branch October 30, 2024 17:46