Feature: System Health
Garrett LeSage edited this page May 22, 2018
System health overlaps with PCP integration without being contained in it: it covers health metrics that require PCP as well as metrics that do not. The goals of the feature are:
- Display system status at a glance
- Provide ways to easily fix issues
- When an easy fix is not possible, provide information on how to resolve a problem
When no health issues are detected, the system should report a healthy state. The lists below are not exhaustive; they are a starting set of possible system issues, and more may be added later.
The following issues can be detected without PCP:
- Security-related software updates available
- Issues mounting filesystems (as specified in fstab, etc.)
- Insufficient storage space on partitions
- SMART issues
- Bad clusters on a disk
- I/O issues
- Swap is active
- Problems bringing up network interfaces
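As a rough illustration of how a few of these checks could work without PCP, the Python sketch below reads `/proc/swaps` to tell whether swap is active, uses `os.statvfs` to estimate free space on a filesystem, and reports a healthy state only when no issues were found. The 10% free-space threshold, the function names, and the status strings are hypothetical choices for illustration, not part of this design.

```python
from __future__ import annotations

import os

# Hypothetical threshold: flag a filesystem when less than 10% of its space is available.
LOW_SPACE_FRACTION = 0.10


def swap_is_active(proc_swaps: str) -> bool:
    """Return True if the contents of /proc/swaps list at least one swap area.

    The first line of /proc/swaps is a column header; every following
    non-empty line describes one active swap device or file.
    """
    lines = [line for line in proc_swaps.splitlines() if line.strip()]
    return len(lines) > 1


def space_is_low(total_bytes: int, avail_bytes: int,
                 fraction: float = LOW_SPACE_FRACTION) -> bool:
    """Return True if available space is below the given fraction of the total."""
    if total_bytes <= 0:
        return False
    return avail_bytes / total_bytes < fraction


def mount_space(path: str) -> tuple[int, int]:
    """Return (total, available-to-unprivileged-users) bytes for the filesystem at path."""
    st = os.statvfs(path)
    return st.f_blocks * st.f_frsize, st.f_bavail * st.f_frsize


def overall_status(issues: list[str]) -> str:
    """Report a healthy state only when no issues were detected (hypothetical wording)."""
    return "healthy" if not issues else f"{len(issues)} issue(s) found"


if __name__ == "__main__":
    found: list[str] = []
    if os.path.exists("/proc/swaps"):
        with open("/proc/swaps") as f:
            if swap_is_active(f.read()):
                found.append("Swap is active")
    total, avail = mount_space("/")
    if space_is_low(total, avail):
        found.append("Insufficient storage space on /")
    print(overall_status(found))
```

Keeping the parsing and threshold logic in pure functions makes each check easy to test against sample data, while only the small `__main__` section touches the live system.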
The following detectable issues require PCP to be installed to be accurate and/or useful:
- CPU load is consistently too high
- Not enough memory is free
- Network is constantly saturated
- Swap is often used
- Excessive waiting for storage (disk is >85% busy)
- Huge page fragmentation/defragmentation (memory is fragmented and the system spends a lot of time moving chunks of memory around to defragment it)
- Network errors are occurring
- Packet receive (RX) queue is too small, causing many packets to be dropped
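The "disk is >85% busy" check depends on samples taken over time, which is why PCP is needed for it to be useful. The sketch below shows one way such a check could be computed; it reads the kernel's per-device "time spent doing I/Os" counter from `/proc/diskstats` directly, rather than going through PCP, purely for illustration. The helper names and the rule that every sample in a window must exceed the threshold (as an approximation of "excessive" or "constantly") are assumptions, not part of this design.

```python
from __future__ import annotations

# Threshold from the list above: a disk counts as busy above 85%.
BUSY_THRESHOLD_PERCENT = 85.0


def io_ticks(proc_diskstats: str, device: str) -> int:
    """Return the 'time spent doing I/Os' counter (ms) for a device.

    Each /proc/diskstats line starts with major, minor and device name,
    followed by the statistics fields; I/O time in milliseconds is the
    10th statistics field, i.e. index 12 after splitting on whitespace.
    """
    for line in proc_diskstats.splitlines():
        fields = line.split()
        if len(fields) > 12 and fields[2] == device:
            return int(fields[12])
    raise KeyError(f"device {device!r} not found")


def busy_percent(prev_ticks: int, cur_ticks: int, interval_ms: int) -> float:
    """Percentage of the sampling interval during which the disk had I/O in flight."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return min(100.0, 100.0 * (cur_ticks - prev_ticks) / interval_ms)


def sustained_above(values: list[float], threshold: float) -> bool:
    """Return True only if there are samples and every one exceeds the threshold."""
    return bool(values) and all(v > threshold for v in values)


if __name__ == "__main__":
    # Hypothetical counter readings taken once per second (1000 ms apart).
    ticks = [1000, 1900, 2800, 3700]
    pcts = [busy_percent(a, b, 1000) for a, b in zip(ticks, ticks[1:])]
    print(pcts, sustained_above(pcts, BUSY_THRESHOLD_PERCENT))
```

Requiring every sample in the window to exceed the threshold avoids flagging short bursts of activity; a real implementation might instead use a moving average or a percentage of samples, and the same `sustained_above` idea could apply to the CPU load and network saturation checks.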
(Mockups are rough sketches and are not intended to be finalized or "pixel-perfect".)