zpool import hangs forever. Edit: trim under heavy IO might damage the pool #356
Comments
Updated. |
Oh hey, new ticket. OK, those stacks are a good start, but they just show it called the kernel. We need to dump the kernel stacks to see what is happening there. If you know how to attach windbg/VS debugger, the command is |
I can reproduce this by issuing a trim during heavy IO. After a while, all the vdevs in the pool start showing checksum errors that grow quickly, with the same count across all of them. Then the pool freezes, and when you reboot, Windows takes forever to reboot and you have to power off. After the reboot, you cannot import this pool read-write in either Windows or Linux. |
OK so you are saying potentially trim might be veerrryy bad, and I should disable it for now? |
No sir, I am still trying to work out exactly what is happening there. Trim seems very likely to be related, because in my experience it has some chance of ruining the whole pool. I was just trying to solve problems. I was writing at 1.2G/s, and this can happen in my case. |
Appears to corrupt pool, take safe option until it can be investigated. Signed-off-by: Jorgen Lundman <[email protected]>
Yeah, I will disable trim to be safe. I cannot test trim in a VM, as VMware doesn't support it. I will plug in a real device and double-check that the sector math is correct. |
More context: I got two pool failures under similar conditions, on 2 pools: one raidz and the other raidz2. All vdevs are healthy, responsive and can be trimmed individually when not on ZFS. The following command is used to create the pool.
|
Hmm, starting to sweat here. I think maybe I need to add the partition offset to the offset to trim, as you can see on the first and last lines of the selected block here: https://github.com/openzfsonwindows/openzfs/blob/windows/module/os/windows/zfs/vdev_disk.c#L727-L755 😬 |
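To illustrate the sector math in question, here is a minimal sketch in C, not the actual vdev_disk.c code. It assumes the trim request arrives as an offset relative to the start of the partition while the device expects an offset from the start of the disk. The `trim_target_t` type and `trim_range_to_disk()` are hypothetical names made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical description of where a vdev's partition sits on the
 * physical disk. Not a real OpenZFS structure.
 */
typedef struct trim_target {
	uint64_t part_offset;	/* bytes from start of disk to start of partition */
	uint64_t part_length;	/* partition length in bytes */
} trim_target_t;

/*
 * Translate a partition-relative trim range into a disk-absolute
 * offset. Returns false (and leaves *abs_offset untouched) if the
 * range is empty, falls outside the partition, or would overflow.
 */
bool
trim_range_to_disk(const trim_target_t *t, uint64_t rel_offset,
    uint64_t length, uint64_t *abs_offset)
{
	/* Reject empty ranges and ranges that run past the partition. */
	if (length == 0 || rel_offset > t->part_length ||
	    length > t->part_length - rel_offset)
		return (false);

	/* Guard against wrap-around when adding the partition offset. */
	if (t->part_offset > UINT64_MAX - rel_offset)
		return (false);

	/* Without this addition the trim would land at the wrong place. */
	*abs_offset = t->part_offset + rel_offset;
	return (true);
}
```

If the partition offset is left out, a trim meant for free space inside the partition would instead hit whatever lives at that offset from the start of the disk, which would be consistent with every vdev reporting checksum errors at the same rate.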
OK pushed out a new release |
Well, as I don't have a test pool currently... I don't dare to try it now 😭 |
Shouldn't be related to the trim thing, that is now disabled. It would be interesting to get a dump of the processes while import is running to see what it is doing - wonder if Windows has a way to do that without setting a remote debugger. |
Open one if you come across it again. I was wondering if it was unlinked-drain being slow, but afaik that is async these days. |
Also, if you are just rebooting, or shutdown, you don't need to "unmount" and "export". You should generally just export if you are to move the storage to different hardware, like, plugging into another machine, or booting a different OS. |
Oh, I thought it was mandatory and I was doing it before shutting down the system 😅 Right now I'm scrubbing the pool. Let's see if everything's fine.
I'll open an issue in case something strange happens again (TBH I'm missing some kind of log file :D) |
kstat will dump a bunch of logs, if you set the verbose=1 afaik. |
System information
Describe the problem you're observing
When running 'zpool import', the command hangs at the last drive's 'setting physpath here' message and freezes. The pool cannot be imported, and 'zpool status' hangs with no output, so no data can be retrieved.
Describe how to reproduce the problem
After some heavy IO, scrub and trim. When doing trim, everything freezes. I have no exact idea how it goes wrong. A scrub was running, but the whole pool seems to have frozen, and a reboot ruined everything.
Include any warning/errors/backtraces from the system logs
Tried import in Linux, same issue, seems pool is broken.
If you are lucky, try import readonly with recovery on.