Feature: System Health
Garrett LeSage edited this page May 22, 2018 · 4 revisions
System health overlaps with PCP (Performance Co-Pilot) integration without being identical to it: health includes PCP-only health metrics as well as non-PCP metrics.
- Display system status at a glance
- Provide ways to easily fix issues
- When an easy fix is not possible, provide information on how to resolve a problem
When no health issues are detected, the system should report a healthy state. The lists below are not exhaustive (more issues could be added later); they are a starting set of possible issues with a system.
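As a rough illustration of the goals above, a health report could be modeled as a list of issues, where an empty list means "healthy" and each issue carries either an easy fix or guidance. This is only a sketch: the names `HealthIssue`, `HealthReport`, `Severity`, `fix`, and `guidance` are hypothetical and not part of Cockpit.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class Severity(Enum):
    # Severity levels are an assumption; the page does not define any.
    WARNING = 1
    CRITICAL = 2


@dataclass
class HealthIssue:
    """One detected problem, with either a fix or guidance attached."""
    summary: str
    severity: Severity
    # Optional hook that applies an easy fix (hypothetical).
    fix: Optional[Callable[[], None]] = None
    # Information on how to resolve the problem when no easy fix exists.
    guidance: str = ""


@dataclass
class HealthReport:
    issues: List[HealthIssue] = field(default_factory=list)

    @property
    def healthy(self) -> bool:
        # No detected issues means the system reports a healthy state.
        return not self.issues

    def status_line(self) -> str:
        """Summarize the system status at a glance."""
        if self.healthy:
            return "System is healthy"
        worst = max(issue.severity.value for issue in self.issues)
        label = Severity(worst).name.lower()
        return f"{len(self.issues)} issue(s) detected, worst severity: {label}"
```

Keeping `fix` optional mirrors the split between issues that can be fixed easily and issues where only information can be offered.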
Some of these issues have simple solutions that Cockpit can automatically fix.
- Security-related software updates available
  - Click to view the software updates page
- Issues mounting filesystems (as specified in fstab, etc.)
  - Display issues with the filesystem mounts, along with errors while mounting
- Insufficient storage space on partitions
  - Show partitions with small amounts of space
- SMART issues
  - Display issue, which may include:
    - Bad clusters on a disk
- IO issues
  - Display issue, which may include:
- Swap is currently active
  - Display warning that swap is active (PCP needs to be installed for more details; see below)
- Issues with bringing up network interfaces
  - Show problematic network interfaces; click to switch to the network page
- Enabled & running systemd service keeps restarting
  - Click to display service's page with its log visible
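Two of the checks in this list can be approximated without PCP. The sketch below assumes a Linux host, a 10% free-space threshold (an arbitrary illustrative default), and that "swap is active" means some swap is currently in use. The function names are hypothetical; `shutil.disk_usage` is from the Python standard library, and `SwapTotal` and `SwapFree` are fields of `/proc/meminfo`.

```python
import shutil


def is_low_on_space(total_bytes: int, free_bytes: int,
                    min_free_fraction: float = 0.10) -> bool:
    """Return True when the free share of a filesystem is below the threshold."""
    if total_bytes <= 0:
        return False  # Pseudo-filesystems can report a size of zero.
    return free_bytes / total_bytes < min_free_fraction


def check_mount(path: str, min_free_fraction: float = 0.10) -> bool:
    """Check the filesystem containing `path` for insufficient free space."""
    usage = shutil.disk_usage(path)
    return is_low_on_space(usage.total, usage.free, min_free_fraction)


def swap_in_use_kib(meminfo_text: str) -> int:
    """Compute the amount of swap in use from the text of /proc/meminfo."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    # Missing fields are treated as zero, so a host without swap reports 0.
    return values.get("SwapTotal", 0) - values.get("SwapFree", 0)
```

On a live system, the swap check could be fed with `open("/proc/meminfo").read()`, and `check_mount` called once per mount point of interest.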
Several detectable issues require PCP to be installed for detection to be accurate or useful. Most of these will not have a simple one-click solution; instead, most will require displaying information or investigating a bit further.
- CPU load is constantly too high
  - Identify top offenders over a window of time and provide actions to stop/restart services and/or kill processes
- Not enough memory is free
  - Identify top offenders and provide actions (similar to CPU load); suggest upgrading RAM
- Swap is often used (PCP-enhanced version of swap rule above)
  - Related to the "not enough memory is free" issue (above)
  - Show top memory offenders while swap is active (this should help identify the cause)
- Network is constantly saturated
  - Show processes transferring the most data
- Excessive waiting for storage (disk is >85% busy)
- Huge page fragmentation/defragmentation (memory is fragmented and the system is spending a lot of time shuffling chunks of memory around to defragment)
- Network errors exist
- Packet receive (RX) queue is too small, causing many packets to be dropped
  - Provide a means to specify a new queue length, with a suggested default
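Rules such as "constantly too high" and "identify top offenders over a window of time" both operate on a series of samples rather than a single reading. The sketch below shows one possible interpretation. It does not call PCP; it takes plain Python sequences that a metrics source would supply. The function names and the 90% `min_fraction` default are assumptions; the 85% disk-busy figure in the usage note comes from the list above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def is_sustained(samples: Sequence[float], threshold: float,
                 min_fraction: float = 0.9) -> bool:
    """True when at least min_fraction of the samples exceed the threshold."""
    if not samples:
        return False
    over = sum(1 for value in samples if value > threshold)
    return over / len(samples) >= min_fraction


def top_offenders(samples: Iterable[Tuple[str, float]],
                  count: int = 3) -> List[Tuple[str, float]]:
    """Rank names by their average sampled usage over the window.

    Each sample is a (name, usage) pair, for example a process or service
    name and its CPU or memory usage at one point in time.
    """
    totals: Dict[str, float] = defaultdict(float)
    seen: Dict[str, int] = defaultdict(int)
    for name, usage in samples:
        totals[name] += usage
        seen[name] += 1
    averages = [(name, totals[name] / seen[name]) for name in totals]
    averages.sort(key=lambda item: item[1], reverse=True)
    return averages[:count]
```

For the storage rule, `is_sustained(busy_percentages, 85)` would flag a disk that was more than 85% busy in most samples of the window. Requiring a fraction of samples rather than a single reading avoids raising an issue on a brief spike.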
(Mockups are rough sketches and are not intended to be finalized or "pixel-perfect".)