Stale ebpf maps if agent stops abruptly #537

anubhabMajumdar · 2024-07-10T17:57:59Z

Describe the bug

If the agent pod is OOMKilled, the Packetparser plugin leaves behind stale eBPF maps and qdiscs. These are never cleaned up when the agent restarts.

To Reproduce
Steps to reproduce the behavior:

Deploy retina-advanced
Exec into a node and kill the controller process repeatedly
Check maps and qdiscs

Expected behavior
Only one instance of each map should exist per plugin, and only one ingress/egress qdisc should exist per veth.
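The "check maps" step and the expectation above could be automated. Below is a minimal Go sketch that reads `bpftool -j map show` output from stdin and reports map names that appear more than once. It assumes bpftool emits a JSON array of objects with `id` and `name` fields. Matching on names is only a heuristic: unrelated programs can reuse a map name, and the kernel truncates names to 15 characters.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"os"
	"sort"
)

// mapInfo holds the subset of fields used from `bpftool -j map show` (assumed shape).
type mapInfo struct {
	ID   int    `json:"id"`
	Name string `json:"name"`
}

// duplicateMaps parses bpftool JSON output and returns every map name that
// appears more than once, mapped to its sorted map IDs.
func duplicateMaps(raw []byte) (map[string][]int, error) {
	var maps []mapInfo
	if err := json.Unmarshal(raw, &maps); err != nil {
		return nil, fmt.Errorf("parse bpftool output: %w", err)
	}
	byName := make(map[string][]int)
	for _, m := range maps {
		if m.Name == "" {
			continue // unnamed maps cannot be attributed to a plugin
		}
		byName[m.Name] = append(byName[m.Name], m.ID)
	}
	dups := make(map[string][]int)
	for name, ids := range byName {
		if len(ids) > 1 {
			sort.Ints(ids)
			dups[name] = ids
		}
	}
	return dups, nil
}

func main() {
	// Usage: bpftool -j map show | <this program>
	raw, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if len(bytes.TrimSpace(raw)) == 0 {
		fmt.Fprintln(os.Stderr, "usage: bpftool -j map show | <this program>")
		return
	}
	dups, err := duplicateMaps(raw)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	names := make([]string, 0, len(dups))
	for n := range dups {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, n := range names {
		fmt.Printf("%s: %d instances, ids %v\n", n, len(dups[n]), dups[n])
	}
	if len(dups) > 0 {
		os.Exit(1) // non-zero exit makes the check scriptable
	}
}
```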

Platform:

OS: Linux
Kubernetes Version: 1.29
Host: AKS
Retina Version: current

Additional context
Suggestion: cleanup should happen in an init container (it will probably need elevated privileges to remove the residual maps and qdiscs).


nddq · 2024-07-29T20:07:00Z

I've looked into the issue and found that the eBPF maps are deleted when their Close() function is called, which happens in each plugin's Stop() function during a graceful shutdown. However, the maps persist if the agent is forcibly terminated by a signal such as SIGTERM or SIGKILL, because that path bypasses the plugins' Stop() functions. To address this, we should start a goroutine from main that catches these signals and runs the cleanup.

timraymond · 2024-08-05T18:51:59Z

@nddq I think making it crash-only would be more robust (as much as I like defer). On boot, we should check whether those maps erroneously exist, then delete and recreate them.

anubhabMajumdar added the help wanted, lang/go, area/plugins, area/ebpf, and priority/0 labels on Jul 10, 2024

nddq self-assigned this Jul 12, 2024

nddq mentioned this issue on Jul 29, 2024:
fix: daemon listen for SIGTERM and SIGINIT to gracefully clean up and shutdown #569 (Closed)
