This document provides links to troubleshooting information for services and functionality provided by CSM.
On the main repository landing page, change the branch to the CSM version being used on the system (for example, release/1.0, release/1.2, release/1.3).
Use the pre-populated GitHub "Search or jump to..." function in the upper left-hand side of the page and append keywords related to the problem being seen to the existing search. (The example searches for "ping" and "PXE" related troubleshooting resources on the "main" branch.)
Follow any run-books, guides, or procedures which are directly related to the problem.
Change the branch to "main" and search a second time to retrieve very recent or beta run-books and guides.
Users can also expand the search beyond the "troubleshooting" section (by dropping the "path:troubleshooting" qualifier) or use more advanced GitHub searches such as "path:configure" to find the right context.
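The search described above can also be composed as a URL, which may be convenient for sharing a query with a colleague. The following is a minimal sketch, not part of CSM: the function name `build_search_url` and the placeholder repository `OWNER/REPO` are assumptions made for illustration. It relies only on GitHub's `repo:` and `path:` search qualifiers and does not select a branch.

```python
from urllib.parse import urlencode


def build_search_url(repo, keywords, path="troubleshooting"):
    """Return a GitHub code-search URL scoped to one repository.

    repo     -- "owner/name" of the repository (placeholder in this sketch)
    keywords -- list of search terms, for example ["ping", "PXE"]
    path     -- directory to restrict the search to; pass None to search
                the whole repository instead of one section
    """
    terms = [f"repo:{repo}"]
    if path:
        terms.append(f"path:{path}")
    terms.extend(keywords)
    query = " ".join(terms)
    # urlencode percent-encodes ":" and "/" and turns spaces into "+".
    return "https://github.com/search?" + urlencode({"q": query, "type": "code"})


# Hypothetical usage: replace OWNER/REPO with the documentation repository.
print(build_search_url("OWNER/REPO", ["ping", "PXE"]))
print(build_search_url("OWNER/REPO", ["ping", "PXE"], path="configure"))
```

Passing `path=None` widens the search to the whole repository, which corresponds to expanding the search beyond the "troubleshooting" section.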
- Known issues
- Booting
- Compute rolling upgrades
- Configuration management
- ConMan
- Customer Management Network (CMN)
- Domain Name Service (DNS)
- Grafana dashboards
- Kubernetes
- MetalLB
- Node management
- Security and authentication
- Spire
- UAS
- Utility storage
- SAT/HSM/CAPMC Component Power State Mismatch
- HMS Discovery job not creating RedfishEndpoints in Hardware State Manager
- initrd.img.xz not found
- SSL Certificate Validation Issues
- SLS Not Working During Node Rebuild
- Antero node NID allocation
- HPE nodes not properly transitioning power state
- Issues Related to Unified Extensible Firmware Interface (UEFI)
- Issues Related to Dynamic Host Configuration Protocol (DHCP)
- Issues Related to the Boot Script Service
- Issues Related to Trivial File Transfer Protocol (TFTP)
- Troubleshooting Using Kubernetes
- Log File Locations and Ports Used
- Issues Related to Slow Boot Times
CRUS was deprecated in CSM 1.2.0. It will be removed in a future CSM release and replaced with BOS V2, which will provide similar functionality. See Deprecated features.
- Nodes Failing to Upgrade in a CRUS Session
- Failed CRUS Session Because of Unmet Conditions
- Failed CRUS Session Because of Bad Parameters
- ConMan Blocking Access to a Node BMC
- ConMan Failing to Connect to a Console
- ConMan Asking for Password on SSH Connection
- DHCP run book
- DNS run book
- General configuration and troubleshooting
- Troubleshoot CMN Issues
- Troubleshoot DHCP Issues
- Troubleshoot Common DNS Issues
- Troubleshoot PowerDNS Issues
- Troubleshoot Common DNS configuration Issues
- Troubleshoot External DNS Issues
- Troubleshoot BGP not accepting routes from MetalLB
- Troubleshoot BGP services without an allocated IP address
- Troubleshoot PXE boot
- General Kubernetes Commands for Troubleshooting
- Kubernetes Log File Locations
- Liveness or Readiness Probe Failures
- Unresponsive kubectl Commands
- Kubernetes Node NotReady
- Kubernetes Pods not Starting
- Postgres Database
- Recover from Postgres WAL Event
- Restore Postgres
- Disaster Recovery for Postgres
- Issues with Redfish Endpoint DiscoveryCheck for Redfish Events from Nodes
- Interfaces with IP Address Issues
- Loss of Console Connections and Logs on Gigabyte Nodes
- Restore Spire Postgres without a Backup
- Spire Database Cluster DNS Lookup Failure
- Spire Failing to Start on NCNs