This section presents several provider verification and troubleshooting steps that aid in persistent storage investigations, including:
- Ceph Status and Health
- Ceph Configuration and Detailed Health
- Ceph Related Pod Status
- Kubernetes General Events
kubectl -n rook-ceph get cephclusters
root@node1:~/helm-charts/charts# kubectl -n rook-ceph get cephclusters
NAME        DATADIRHOSTPATH   MONCOUNT   AGE   PHASE   MESSAGE                        HEALTH      EXTERNAL
rook-ceph   /var/lib/rook     1          69m   Ready   Cluster created successfully   HEALTH_OK
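The HEALTH column above can also be checked from a script. The following is a minimal sketch, not part of the original procedure: `classify_ceph_health` is a hypothetical helper name, and the commented live query assumes a single CephCluster resource in the `rook-ceph` namespace with its health stored at `.status.ceph.health`.

```shell
#!/usr/bin/env bash
# Sketch: map a Ceph health string to a message and an exit status.
# classify_ceph_health is a hypothetical helper, not part of kubectl or Rook.
classify_ceph_health() {
  case "$1" in
    HEALTH_OK)   echo "ok: Ceph reports HEALTH_OK";            return 0 ;;
    HEALTH_WARN) echo "warn: Ceph reports HEALTH_WARN";        return 1 ;;
    HEALTH_ERR)  echo "error: Ceph reports HEALTH_ERR";        return 2 ;;
    *)           echo "unknown: unexpected health value '$1'"; return 3 ;;
  esac
}

# Against a live cluster (assumption: one CephCluster in the rook-ceph namespace):
#   health="$(kubectl -n rook-ceph get cephclusters \
#     -o jsonpath='{.items[0].status.ceph.health}')"
#   classify_ceph_health "$health"
```

A non-zero exit status makes the helper easy to use in a cron job or monitoring probe that should alert on anything other than HEALTH_OK.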
kubectl -n rook-ceph describe cephclusters
- Ensure the name is correct in the Nodes section
- The Health key should have a value of HEALTH_OK, as shown in the example output below
- Review any output of interest in the Events section
  Storage:
    Config:
      Osds Per Device:  1
    Nodes:
      Name:  node2
      Resources:
    Use All Devices:                        true
  Wait Timeout For Healthy OSD In Minutes:  10
Status:
  Ceph:
    Capacity:
      Bytes Available:  107333730304
      Bytes Total:      107369988096
      Bytes Used:       36257792
      Last Updated:     2022-05-05T18:43:50Z
    Health:             HEALTH_OK
    Last Checked:       2022-05-05T18:43:50Z
    Versions:
      Mgr:
        ceph version 16.2.5 (0883bdea7337b95e4b611c768c0279868462204a) pacific (stable):  1
      Mon:
        ceph version 16.2.5 (0883bdea7337b95e4b611c768c0279868462204a) pacific (stable):  1
      Osd:
        ceph version 16.2.5 (0883bdea7337b95e4b611c768c0279868462204a) pacific (stable):  3
      Overall:
        ceph version 16.2.5 (0883bdea7337b95e4b611c768c0279868462204a) pacific (stable):  5
  Conditions:
    Last Heartbeat Time:   2022-05-05T18:43:51Z
    Last Transition Time:  2022-05-05T17:34:32Z
    Message:               Cluster created successfully
    Reason:                ClusterCreated
    Status:                True
    Type:                  Ready
  Message:                 Cluster created successfully
  Phase:                   Ready
  State:                   Created
  Storage:
    Device Classes:
      Name:  ssd
  Version:
    Image:    ceph/ceph:v16.2.5
    Version:  16.2.5-0
Events:       <none>
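Because the describe output is long, it can help to pull out only the Health value and the Events section called out in the checklist. The sketch below does this with `awk`; `describe_health` and `describe_events` are hypothetical helper names, and the filters assume the field labels appear exactly as in the example output above.

```shell
#!/usr/bin/env bash
# Sketch: extract the Health value and the Events section from
# `kubectl describe cephclusters` output supplied on stdin.
# describe_health and describe_events are hypothetical helper names.
describe_health() {
  # Print the second field of each line whose first field is exactly "Health:".
  awk '$1 == "Health:" { print $2 }'
}

describe_events() {
  # Print from the line that starts with "Events:" through the end of input.
  awk '/^Events:/ { found = 1 } found'
}

# Against a live cluster:
#   kubectl -n rook-ceph describe cephclusters | describe_health
#   kubectl -n rook-ceph describe cephclusters | describe_events
```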
kubectl -n rook-ceph get pods
root@node1:~/akash# kubectl -n rook-ceph get pods
NAME                                              READY   STATUS      RESTARTS   AGE
csi-cephfsplugin-269qv                            3/3     Running     0          77m
csi-cephfsplugin-provisioner-5c8b6d6f4-9j4tm      6/6     Running     0          77m
csi-cephfsplugin-provisioner-5c8b6d6f4-gwhhh      6/6     Running     0          77m
csi-cephfsplugin-qjp86                            3/3     Running     0          77m
csi-rbdplugin-nzm45                               3/3     Running     0          77m
csi-rbdplugin-provisioner-8564cfd44-55gmq         6/6     Running     0          77m
csi-rbdplugin-provisioner-8564cfd44-gtmqb         6/6     Running     0          77m
csi-rbdplugin-t8klb                               3/3     Running     0          77m
rook-ceph-crashcollector-node2-74c68c58b7-kspv6   1/1     Running     0          77m
rook-ceph-mgr-a-6cd6ff8c9f-z6fvk                  1/1     Running     0          77m
rook-ceph-mon-a-79fdcc8b9c-nr5vf                  1/1     Running     0          77m
rook-ceph-operator-bf9c6fd7-px76k                 1/1     Running     0          79m
rook-ceph-osd-0-747fcf4864-mrq6f                  1/1     Running     0          77m
rook-ceph-osd-prepare-node2-x4qqv                 0/1     Completed   0          76m
rook-ceph-tools-6646766697-lgngb                  1/1     Running     0          79m
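In a larger cluster the pod list is easier to scan when healthy entries are filtered out. The following sketch keeps only pods that are neither Completed nor fully ready and Running; `unhealthy_pods` is a hypothetical helper name, and it assumes the default column order shown above (NAME, READY, STATUS, RESTARTS, AGE).

```shell
#!/usr/bin/env bash
# Sketch: list pods that look unhealthy in `kubectl get pods --no-headers` output.
# unhealthy_pods is a hypothetical helper; it assumes the default column order
# NAME READY STATUS RESTARTS AGE.
unhealthy_pods() {
  awk '
    {
      split($2, ready, "/")            # READY has the form current/desired
      if ($3 == "Completed") next      # finished jobs such as osd-prepare are expected
      if ($3 != "Running" || ready[1] != ready[2]) print $1, $2, $3
    }'
}

# Against a live cluster:
#   kubectl -n rook-ceph get pods --no-headers | unhealthy_pods
```

With the example output above, the filter prints nothing, since every pod is either Running with all containers ready or Completed.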
- The command below opens a continuously updating stream of cluster events from all namespaces; persistent storage events and issues appear in this output if present
kubectl get events --sort-by='.metadata.creationTimestamp' -A -w
root@node1:~/helm-charts/charts# kubectl get events --sort-by='.metadata.creationTimestamp' -A -w
warning: --watch or --watch-only requested, --sort-by will be ignored
NAMESPACE        LAST SEEN   TYPE     REASON              OBJECT                                     MESSAGE
akash-services   37m         Normal   ScalingReplicaSet   deployment/akash-provider                  Scaled up replica set akash-provider-6bf9986cdc to 1
akash-services   37m         Normal   Scheduled           pod/akash-provider-6bf9986cdc-btvlg        Successfully assigned akash-services/akash-provider-6bf9986cdc-btvlg to node2
akash-services   37m         Normal   SuccessfulCreate    replicaset/akash-provider-6bf9986cdc       Created pod: akash-provider-6bf9986cdc-btvlg
akash-services   37m         Normal   SuccessfulDelete    replicaset/akash-provider-76966c6795       Deleted pod: akash-provider-76966c6795-lvphs
akash-services   37m         Normal   Created             pod/akash-provider-6bf9986cdc-btvlg        Created container provider
akash-services   36m         Normal   Killing             pod/akash-provider-76966c6795-lvphs        Stopping container provider
akash-services   37m         Normal   Pulled              pod/akash-provider-6bf9986cdc-btvlg        Container image "ghcr.io/ovrclk/akash:0.1.0" already present on machine
akash-services   37m         Normal   ScalingReplicaSet   deployment/akash-provider                  Scaled down replica set akash-provider-76966c6795 to 0
akash-services   37m         Normal   Started             pod/akash-provider-6bf9986cdc-btvlg        Started container provider
akash-services   30m         Normal   SuccessfulCreate    replicaset/inventory-operator-645fddd5cc   Created pod: inventory-operator-645fddd5cc-86jr9
akash-services   30m         Normal   ScalingReplicaSet   deployment/inventory-operator              Scaled up replica set inventory-operator-645fddd5cc to 1
akash-services   30m         Normal   Scheduled           pod/inventory-operator-645fddd5cc-86jr9    Successfully assigned akash-services/inventory-operator-645fddd5cc-86jr9 to node2
akash-services   30m         Normal   Pulling             pod/inventory-operator-645fddd5cc-86jr9    Pulling image "ghcr.io/ovrclk/k8s-inventory-operator"
akash-services   30m         Normal   Created             pod/inventory-operator-645fddd5cc-86jr9    Created container inventory-operator
akash-services   30m         Normal   Started             pod/inventory-operator-645fddd5cc-86jr9    Started container inventory-operator
akash-services   30m         Normal   Pulled              pod/inventory-operator-645fddd5cc-86jr9    Successfully pulled image "ghcr.io/ovrclk/k8s-inventory-operator" in 5.154257083s
ingress-nginx    12m         Normal   RELOAD              pod/ingress-nginx-controller-59xcv         NGINX reload triggered due to a change in configuration
ingress-nginx    12m         Normal   RELOAD              pod/ingress-nginx-controller-tk8zj         NGINX reload triggered due to a change in configuration
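As the warning in the example output notes, `--sort-by` is ignored when `-w` is used, so a one-off sorted snapshot needs the command without `-w`. The sketch below narrows such a snapshot to lines most likely to matter for storage: events in the `rook-ceph` namespace and events of type Warning in any namespace. `storage_events` is a hypothetical helper name, and the choice of filter is an assumption about what is relevant rather than a documented rule.

```shell
#!/usr/bin/env bash
# Sketch: keep events from the rook-ceph namespace plus Warning events from
# any namespace. storage_events is a hypothetical helper; it assumes the
# column order produced by `kubectl get events -A --no-headers`
# (NAMESPACE, LAST SEEN, TYPE, REASON, OBJECT, MESSAGE).
storage_events() {
  awk '$1 == "rook-ceph" || $3 == "Warning"'
}

# Against a live cluster (no -w, so --sort-by takes effect):
#   kubectl get events -A --no-headers \
#     --sort-by='.metadata.creationTimestamp' | storage_events
```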