Redeploy the iPXE and TFTP services if a pod with a ceph-fs persistent volume claim (PVC) on a Kubernetes worker node is causing a HEALTH_WARN error.
Resolve issues with ceph-fs and ceph-mds by restarting the iPXE and TFTP services. The Ceph cluster will return to a healthy state after this procedure.
This procedure requires administrative privileges.
-
Find the iPXE and TFTP deployments.
kubectl get deployments -n services | egrep 'tftp|ipxe'
Example output:
cray-ipxe   1/1     1            1           22m
cray-tftp   3/3     3            3           28m
-
Delete the deployments for the iPXE and TFTP services.
kubectl -n services delete deployment cray-tftp
kubectl -n services delete deployment cray-ipxe
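To confirm that both deployments were removed, rerun the listing from the previous step; once the deletions are complete it should return no output. This check is illustrative and not required:
kubectl get deployments -n services | egrep 'tftp|ipxe'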
-
Check the status of Ceph.
Ceph commands need to be run on ncn-m001. If a health warning is shown after checking the status, the ceph-mds daemons will need to be restarted on the manager nodes.
-
Check the health of the Ceph cluster.
ceph -s
Example output:
  cluster:
    id:     bac74735-d804-49f3-b920-cd615b18316b
    health: HEALTH_WARN
            1 filesystem is degraded

  services:
    mon: 3 daemons, quorum ncn-m001,ncn-m002,ncn-m003 (age 13d)
    mgr: ncn-m001(active, since 24h), standbys: ncn-m002, ncn-m003
    mds: cephfs:1/1 {0=ncn-m002=up:reconnect} 2 up:standby
    osd: 60 osds: 60 up (since 4d), 60 in (since 4d)
    rgw: 5 daemons active (ncn-s001.rgw0, ncn-s002.rgw0, ncn-s003.rgw0, ncn-s004.rgw0, ncn-s005.rgw0)

  data:
    pools:   13 pools, 1664 pgs
    objects: 2.47M objects, 9.3 TiB
    usage:   26 TiB used, 78 TiB / 105 TiB avail
    pgs:     1664 active+clean

  io:
    client:   990 MiB/s rd, 111 MiB/s wr, 2.76k op/s rd, 1.03k op/s wr
-
Obtain more information on the health of the cluster.
ceph health detail
Example output:
HEALTH_WARN 1 filesystem is degraded
FS_DEGRADED 1 filesystem is degraded
    fs cephfs is degraded
-
Show the status of all CephFS components.
ceph fs status
Example output:
cephfs - 9 clients
======
+------+-----------+----------+----------+-------+-------+
| Rank |   State   |   MDS    | Activity |  dns  |  inos |
+------+-----------+----------+----------+-------+-------+
|  0   | reconnect | ncn-m002 |          | 11.0k |   74  |
+------+-----------+----------+----------+-------+-------+
+-----------------+----------+-------+-------+
|       Pool      |   type   |  used | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata |  780M | 20.7T |
|   cephfs_data   |   data   |  150M | 20.7T |
+-----------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
|   ncn-m003  |
|   ncn-m001  |
+-------------+
-
Restart the ceph-mds service.
This step should only be done if a health warning is shown in the previous substeps.
for i in 1 2 3 ; do ansible ncn-m00$i -m shell -a "systemctl restart ceph-mds@ncn-m00$i"; done
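After the restart, re-check the cluster and filesystem state with the same commands used in the previous substeps; the MDS rank should eventually report up:active instead of up:reconnect. These checks are illustrative:
ceph -s
ceph fs status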
-
-
Failover the ceph-mds daemon.
This step should only be done if a health warning still exists after restarting the ceph-mds service.
ceph mds fail ncn-m002
Re-run ceph fs status to monitor the failover. The initial output will display the following:
cephfs - 0 clients
======
+------+--------+----------+----------+-------+-------+
| Rank | State  |   MDS    | Activity |  dns  |  inos |
+------+--------+----------+----------+-------+-------+
|  0   | rejoin | ncn-m003 |          |   0   |   0   |
+------+--------+----------+----------+-------+-------+
+-----------------+----------+-------+-------+
|       Pool      |   type   |  used | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata |  781M | 20.7T |
|   cephfs_data   |   data   |  117M | 20.7T |
+-----------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
|   ncn-m002  |
|   ncn-m001  |
+-------------+
The rejoin status should turn to active:
cephfs - 7 clients
======
+------+--------+----------+---------------+-------+-------+
| Rank | State  |   MDS    |    Activity   |  dns  |  inos |
+------+--------+----------+---------------+-------+-------+
|  0   | active | ncn-m003 | Reqs:    0 /s | 11.1k |  193  |
+------+--------+----------+---------------+-------+-------+
+-----------------+----------+-------+-------+
|       Pool      |   type   |  used | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata |  781M | 20.7T |
|   cephfs_data   |   data   |  117M | 20.7T |
+-----------------+----------+-------+-------+
+-------------+
| Standby MDS |
+-------------+
|   ncn-m002  |
|   ncn-m001  |
+-------------+
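If preferred, a small polling loop can be used instead of re-running the status command by hand until the cluster reports healthy again; this is a minimal sketch and the 10-second interval is arbitrary:
until ceph health | grep -q HEALTH_OK; do sleep 10; done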
-
Ensure the service is deleted along with the associated PVC.
The output of the command below should be empty. If output is displayed, such as in the example below, then the resources have not been deleted.
kubectl get pvc -n services | grep tftp
Example of resources not being deleted in returned output:
cray-tftp-shared-pvc Bound pvc-315d08b0-4d00-11ea-ad9d-b42e993b7096 5Gi RWX ceph-cephfs-external 29m
Optional: Use the following command to delete the associated PVC.
kubectl -n services delete pvc PVC_NAME
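For example, using the PVC name from the output shown above (substitute the name actually reported on the system):
kubectl -n services delete pvc cray-tftp-shared-pvc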
-
Deploy the TFTP service.
Wait for the TFTP pods to come online and verify the PVC was created; example checks follow the command below.
loftsman helm upgrade cray-tftp loftsman/cray-tftp
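The following checks, which reuse listing commands from earlier in this procedure, are one way to watch the pods come online and confirm the PVC was re-created; they are illustrative only:
kubectl get pods -n services | grep cray-tftp
kubectl get pvc -n services | grep tftp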
-
Deploy the iPXE service.
This may take a couple of minutes, and the pod may initially show an error state. Wait a few minutes and it will transition to Running; an example check for monitoring it follows the command below.
loftsman helm upgrade cms-ipxe loftsman/cms-ipxe
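To monitor the pod until it reaches the Running state, the pod listing used later in this procedure can be reused; this check is illustrative only:
kubectl get pods -n services | grep cray-ipxe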
-
Log into the iPXE pod and verify the iPXE file was created.
This may take another couple of minutes while the files are being created. An example check is shown after the following substeps.
-
Find the iPXE pod ID.
kubectl get pods -n services --no-headers -o wide | grep cray-ipxe | awk '{print $1}'
-
Log into the pod using the iPXE pod ID.
kubectl exec -n services -it IPXE_POD_ID /bin/sh
To see the containers in the pod:
kubectl describe pod/CRAY-IPXE_POD_NAME -n services
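Because this procedure does not name the directory that holds the generated binaries inside the iPXE pod, a generic way to confirm the file exists after logging in is to search for it; this is only an illustrative check:
# find / -name 'ipxe.efi' 2>/dev/null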
-
-
Log into the TFTP pods and verify that the ipxe.efi file has the expected non-zero size. A loop covering all TFTP pods is sketched after the following substeps.
-
Find the TFTP pod ID.
kubectl get pods -n services --no-headers -o wide | grep cray-tftp | awk '{print $1}'
Example output:
cray-tftp-7dc77f9cdc-bn6ml
cray-tftp-7dc77f9cdc-ffgnh
cray-tftp-7dc77f9cdc-mr6zd
cray-tftp-modprobe-42648
cray-tftp-modprobe-4kmqg
cray-tftp-modprobe-4sqsk
cray-tftp-modprobe-hlfcc
cray-tftp-modprobe-r6bvb
cray-tftp-modprobe-v2txr
-
Log into the pod using the TFTP pod ID.
kubectl exec -n services -it TFTP_POD_ID /bin/sh
-
Change to the /var/lib/tftpboot directory.
# cd /var/lib/tftpboot
-
Check the ipxe.efi size on the TFTP servers.
If there are any issues, the file will have a size of 0 bytes.
# ls -l
Example output:
total 1919
-rw-r--r-- 1 root root 980768 May 15 16:49 debug.efi
-rw-r--r-- 1 root root 983776 May 15 16:50 ipxe.efi
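To repeat this check across every TFTP pod without opening an interactive shell in each one, a loop along the following lines can be used; it is a sketch that assumes the serving pods are the cray-tftp-* replicas rather than the cray-tftp-modprobe-* pods listed earlier:
for pod in $(kubectl get pods -n services --no-headers | grep cray-tftp | grep -v modprobe | awk '{print $1}'); do
    echo "=== $pod ==="
    kubectl exec -n services "$pod" -- ls -l /var/lib/tftpboot/ipxe.efi
done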
-