This document provides links to troubleshooting information for services and functionality provided by CSM.
- Known issues
- Kubernetes
- Grafana dashboards
- UAS
- Booting
- Compute rolling upgrades
- Configuration management
- Security and authentication
- ConMan
- Utility storage
- Node management
- Customer Management Network (CMN)
- Domain Name Service (DNS)
- MetalLB
- Spire
- SAT/HSM/CAPMC Component Power State Mismatch
- HMS Discovery job not creating RedfishEndpoints in Hardware State Manager
- initrd.img.xz not found
- SSL Certificate Validation Issues
- SLS Not Working During Node Rebuild
- General Kubernetes Commands for Troubleshooting
- Kubernetes Log File Locations
- Liveness or Readiness Probe Failures
- Unresponsive kubectl Commands
- Kubernetes Node NotReady
- Kubernetes Pods not Starting
- Postgres Database
- Recover from Postgres WAL Event
- Restore Postgres
- Disaster Recovery for Postgres
- Viewing UAI Log Output
- Stale Brokered UAIs
- UAI Stuck in ContainerCreating
- Duplicate Mount Paths in a UAI
- Missing or Incorrect UAI Images
- Common Mistakes When Creating a Custom End-User UAI Image
- Issues Related to Unified Extensible Firmware Interface (UEFI)
- Issues Related to Dynamic Host Configuration Protocol (DHCP)
- Issues Related to the Boot Script Service
- Issues Related to Trivial File Transfer Protocol (TFTP)
- Troubleshooting Using Kubernetes
- Log File Locations and Ports Used
- Issues Related to Slow Boot Times
CRUS was deprecated in CSM 1.2.0. It will be removed in a future CSM release and replaced with BOS V2, which will provide similar functionality. See Deprecated features.
- Nodes Failing to Upgrade in a CRUS Session
- Failed CRUS Session Because of Unmet Conditions
- Failed CRUS Session Because of Bad Parameters
- ConMan Blocking Access to a Node BMC
- ConMan Failing to Connect to a Console
- ConMan Asking for Password on SSH Connection
- Failure to Get Ceph Health
- Down OSDs
- Ceph OSDs Reporting Full
- System Clock Skew
- Unresponsive S3 Endpoint
- Ceph-Mon Processes Stopping and Exceeding Max Restarts
- Large Object Map Objects in Ceph Health
- Failure of RGW Health Check
- Troubleshoot S3FS Mounts