Add drive repair support #880

Merged: 1 commit merged on Jul 26, 2024


balamurugana
Member

No description provided.

balamurugana marked this pull request as draft on November 1, 2023 02:43
balamurugana force-pushed the Add-drive-repair-support branch 13 times, most recently from 3965749 to f3d086d on November 5, 2023 02:08
balamurugana force-pushed the Add-drive-repair-support branch 5 times, most recently from 54dbd4b to d982696 on July 10, 2024 07:21
balamurugana marked this pull request as ready for review on July 10, 2024 07:46
balamurugana force-pushed the Add-drive-repair-support branch from d982696 to c32402e on July 10, 2024 08:04
balamurugana force-pushed the Add-drive-repair-support branch from c32402e to 57a4de9 on July 25, 2024 05:16
Praveenrajmani (Collaborator) commented Jul 25, 2024

If an in-use drive (a drive with published volumes) is repaired, the repair job fails with the following error:

➜  directpv git:(xfsrepair) ✗ kubectl logs repair-c65a2d95-d140-4743-a0b1-8507ac772298-x8g5g -n directpv
I0725 08:12:04.934567       1 repair.go:70] xfs_repair: cannot open /dev/dm-1: Device or resource busy
E0725 08:12:04.934639       1 main.go:148] "unable to execute command" err="unable to run xfs_repair on device /dev/dm-1; exit status 1"

and the drive's status stays at "Repairing":

➜  directpv git:(xfsrepair) ✗ kubectl directpv list drives --all 
┌────────────────────────────────┬──────┬───────────┬─────────┬─────────┬─────────┬───────────┐
│ NODE                           │ NAME │ MAKE      │ SIZE    │ FREE    │ VOLUMES │ STATUS    │
├────────────────────────────────┼──────┼───────────┼─────────┼─────────┼─────────┼───────────┤
│ praveen-thinkpad-x1-carbon-6th │ dm-0 │ vg0-lv--0 │ 1.9 GiB │ 1.8 GiB │ 4       │ Ready     │
│ praveen-thinkpad-x1-carbon-6th │ dm-2 │ vg0-lv--2 │ 1.9 GiB │ 1.8 GiB │ 4       │ Ready     │
│ praveen-thinkpad-x1-carbon-6th │ dm-3 │ vg0-lv--3 │ 1.9 GiB │ 1.8 GiB │ 4       │ Ready     │
│ praveen-thinkpad-x1-carbon-6th │ dm-1 │ vg0-lv--1 │ 1.9 GiB │ 1.8 GiB │ 4       │ Repairing │
└────────────────────────────────┴──────┴───────────┴─────────┴─────────┴─────────┴───────────┘

This error is expected, however. Can we consider the following?

  • Can kubectl directpv repair check that the drive to be repaired has no running workloads (i.e., no published volumes)?

  • Can we unmount the drive completely (including all of its volume mounts) before running xfs_repair? Repairing requires a MinIO pod restart anyway, which will bring the volume mounts back.

(OR)

Document that the admin must cordon the node and delete the affected pod, so that all PV mounts are unmounted, before running repair.

@balamurugana ^^
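A minimal node-side sketch of the first suggestion, in Go: before invoking xfs_repair, look for any mount of the drive's device number in the mountinfo table (format described in proc(5)). The function name mountPointsFor, the 253:1 device number standing in for /dev/dm-1, and all sample paths are hypothetical and not part of DirectPV. Matching on major:minor rather than on the name /dev/dm-1 is a deliberate choice, since a device-mapper source may be listed under a /dev/mapper/ name in the mount table, while bind mounts of volume directories carry the backing device's number.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// mountPointsFor scans text in the /proc/<pid>/mountinfo format and returns
// every mount point whose "major:minor" field equals majMin (e.g. "253:1").
// Bind mounts of subdirectories (such as volumes published into pods) share
// the backing device's number, so they are returned as well.
func mountPointsFor(mountinfo, majMin string) []string {
	var points []string
	scanner := bufio.NewScanner(strings.NewReader(mountinfo))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		// Leading fields: mount ID, parent ID, major:minor, root, mount point.
		if len(fields) < 5 {
			continue // skip blank or malformed lines
		}
		if fields[2] == majMin {
			points = append(points, fields[4])
		}
	}
	return points
}

func main() {
	// Hypothetical mountinfo excerpt; on a real node this text would be read
	// from /proc/self/mountinfo. All paths here are made up for illustration.
	sample := strings.Join([]string{
		"30 1 253:0 / /mnt/drive-dm-0 rw,relatime shared:1 - xfs /dev/mapper/vg0-lv--0 rw",
		"31 1 253:1 / /mnt/drive-dm-1 rw,relatime shared:2 - xfs /dev/mapper/vg0-lv--1 rw",
		"32 1 253:1 /pvc-1 /mnt/pod-volume-1 rw,relatime shared:2 - xfs /dev/mapper/vg0-lv--1 rw",
	}, "\n")

	if busy := mountPointsFor(sample, "253:1"); len(busy) > 0 {
		fmt.Println("drive still mounted at:", busy)
	} else {
		fmt.Println("no mounts found at this instant")
	}
}
```

Such a check only reflects the mount table at the moment it runs: a pod can still publish a volume on the drive between the check and the xfs_repair call, which is the scheduling race raised in the reply, so it cannot replace documented prerequisites.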

balamurugana force-pushed the Add-drive-repair-support branch from 57a4de9 to 5ef188a on July 25, 2024 09:48
balamurugana (Member, Author)

@Praveenrajmani Because Kubernetes is declarative, there is no guaranteed way to check for mounted volumes on the drive. The repair command may see that the drive has no volume mounts, yet the repair job can still fail with "Device or resource busy", because the job is scheduled and runs later, by which time volumes may have been mounted.

Deciding when to run the repair command is entirely the admin's responsibility. Documenting the prerequisites for the repair command is the only way to go.

Praveenrajmani (Collaborator) previously approved these changes Jul 25, 2024

Praveenrajmani left a comment


We can add a repair doc explaining the necessary prerequisites for repairing a drive.

Tested and LGTM otherwise.

balamurugana force-pushed the Add-drive-repair-support branch from 8200f95 to f114533 on July 26, 2024 04:09
Praveenrajmani merged commit 7d99392 into minio:master on Jul 26, 2024
25 checks passed
balamurugana deleted the Add-drive-repair-support branch on August 1, 2024 13:35