Add alert for high TCP memory utilization on nodes #213

Merged: simu merged 4 commits into master from feat/node-tcp-memory-alert on Aug 15, 2024

simu (Member) commented Aug 15, 2024

Add an alert that fires when TCP memory utilization on a node exceeds the kernel's configured TCP memory "pressure" threshold (6.25% of node memory on RHEL8/RHEL9). Add an alert runbook containing all the debugging snippets we've discovered so far.

Checklist

  • The PR has a meaningful title. It will be used to auto-generate the
    changelog.
  • The PR has a meaningful description that sums up the change. It will be
    linked in the changelog.
  • PR contains a single logical change (to build a better changelog).
  • Update the documentation.
  • Categorize the PR by adding one of the labels:
    bug, enhancement, documentation, change, breaking, dependency
    as they show up in the changelog.

simu added 3 commits August 15, 2024 11:24
The alert doesn't have a `node` label; use `instance` instead, which
contains the node name.
We alert when a node's TCP memory utilization exceeds 6.25% of the
node's total memory. This matches the kernel TCP memory pressure
threshold configured by RHEL8/RHEL9.
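To make the threshold concrete, here is a minimal Python sketch of the check the commit describes: a node's TCP memory utilization as a fraction of its total memory, compared against 6.25%. It assumes TCP memory is taken from the `mem` field of the `TCP:` line in `/proc/net/sockstat` (counted in pages) and total memory from `MemTotal` in `/proc/meminfo` (in kB). The function names are hypothetical, and this is not the alert's actual rule expression.

```python
import os

# Assumption: alert threshold of 6.25% (1/16) of total node memory, per the commit message.
TCP_MEM_PRESSURE_RATIO = 0.0625


def parse_tcp_mem_pages(sockstat_text: str) -> int:
    """Return the `mem` value (in pages) from the `TCP:` line of /proc/net/sockstat."""
    for line in sockstat_text.splitlines():
        if line.startswith("TCP:"):
            # e.g. "TCP: inuse 5 orphan 0 tw 2 alloc 7 mem 300": key/value pairs after the prefix
            fields = line.split()[1:]
            pairs = dict(zip(fields[0::2], fields[1::2]))
            return int(pairs["mem"])
    raise ValueError("no TCP: line found in sockstat text")


def parse_mem_total_bytes(meminfo_text: str) -> int:
    """Return MemTotal from /proc/meminfo, converted from kB to bytes."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # e.g. "MemTotal:       16384 kB"
            return int(line.split()[1]) * 1024
    raise ValueError("no MemTotal: line found in meminfo text")


def tcp_mem_utilization(sockstat_text: str, meminfo_text: str, page_size: int) -> float:
    """Fraction of total node memory currently allocated to TCP buffers."""
    tcp_bytes = parse_tcp_mem_pages(sockstat_text) * page_size
    return tcp_bytes / parse_mem_total_bytes(meminfo_text)


def exceeds_pressure_threshold(utilization: float) -> bool:
    """True when utilization is strictly above the 6.25% threshold."""
    return utilization > TCP_MEM_PRESSURE_RATIO


if __name__ == "__main__" and os.path.exists("/proc/net/sockstat"):
    # Only runs on a Linux host that exposes procfs.
    with open("/proc/net/sockstat") as f:
        sockstat = f.read()
    with open("/proc/meminfo") as f:
        meminfo = f.read()
    ratio = tcp_mem_utilization(sockstat, meminfo, os.sysconf("SC_PAGE_SIZE"))
    print(f"TCP memory utilization: {ratio:.2%} (above threshold: {exceeds_pressure_threshold(ratio)})")
```

Keeping the parsing separate from the file reads makes it easy to replay captured `/proc` output from an affected node when checking why the alert fired.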
simu added the enhancement (New feature or request) label Aug 15, 2024
simu force-pushed the feat/node-tcp-memory-alert branch from 99757f0 to bdb1718 August 15, 2024 12:49
simu force-pushed the feat/node-tcp-memory-alert branch from bdb1718 to 9a2cef9 August 15, 2024 13:19
simu marked this pull request as ready for review August 15, 2024 13:19
simu requested a review from a team as a code owner August 15, 2024 13:19
simu (Member, Author) commented Aug 15, 2024

The runbook is a bit of a brain dump on my part. Feel free to suggest structural improvements.

simu merged commit f148ee9 into master Aug 15, 2024 (27 checks passed)
simu deleted the feat/node-tcp-memory-alert branch August 15, 2024 16:26
Labels: enhancement (New feature or request)
2 participants