I/O errors on zvol, but no errors reported by zpool scrub #15720
-
I am currently destroying all the datasets/zvols affected by this issue on the backup machine. However, I'll leave at least one intact for debugging.
-
You should probably read #12014 before destroying everything in a fire. Assuming you're using native encryption, of course.
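A quick way to confirm whether native encryption is in play on the received datasets (the dataset name below is a placeholder):

```sh
# Show encryption-related properties for a dataset tree
# (replace backup/vms with the actual dataset).
zfs get -r encryption,keystatus,encryptionroot backup/vms
```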
-
Thanks, @rincebrain, for this zdb test/investigation. I decided to use a dataset which I can share. If sharing the output of ... Also, somewhere in the past I've changed ...

Output of ...

Contents of ...

... with ... Output of ...

Finding out the offset... I assume ... From part of ... At the end of ...

So, the L0 DVA in this case will be ...
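A rough sketch of the kind of zdb commands involved here; the dataset name, snapshot, offset and size below are placeholders, not the values from this report:

```sh
# Dump the block pointer tree of the zvol's data object (object 1 for zvols);
# the L0 entries list DVAs as vdev:offset together with lsize/psize.
zdb -ddddd backup/vms/vdi-proxy/root@snap 1

# Read one block back directly by its DVA (vdev:offset:psize), with the
# d flag asking zdb to decompress it the way ZFS would.
zdb -R backup 0:deadbeef000:20000:d
```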
-
Here is the link to the output of ...
-
Ok, one less thing to suspect.

Yes, there are 63 such zvols presently. I'm in the process of manually `zfs send` / `zfs recv`-ing some of them, for testing.
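A hypothetical example of what such a manual test transfer might look like (pool, guest and snapshot names are placeholders; `-Lec` mirrors the flags from the report):

```sh
# Re-send one zvol snapshot from the backup pool to the production pool.
zfs send -Lec backup/vms/someguest/root@zrepl_2023-11-08 | \
    ssh production zfs recv belt/vms/someguest/root-test

# On the production machine, read the restored zvol end to end and watch
# zpool status -v and dmesg for I/O errors while it runs.
dd if=/dev/zvol/belt/vms/someguest/root-test of=/dev/null bs=1M status=progress
```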
-
Thanks, I understand that this is exceptionally weird. I really appreciate your help.
-
Another problem caused by ...
-
System information
Describe the problem you're observing
I have a backup machine, to which backups are sent using the `zrepl` tool. Recently, due to a power failure (which also disabled the UPS - it was a "weird" power failure that forced the UPS to shut down), I was faced with the need to restore some ZFS ZVOLs from backup. I used `zfs send | zfs recv` to accomplish this goal. (The backup machine was connected to the same UPS and was also shut down abruptly.)

`zfs send -Lec | zfs recv ...` completed without issue; however, the result on the production machine was a ZVOL which exhibited I/O errors when trying to `dd` it (it was also impossible to `fsck`, which reported I/O errors). `zpool status -v` on the production machine reported the errors correctly:

The names under `errors:` are lost, because I deleted the affected ZVOLs - earlier it was showing `belt/vms/vdi-proxy/root`, which is the path to the ZVOL being restored.

Weirdly, when trying `dd` on the backup machine using the same snapshots - and even after `zfs clone`-ing them to ZVOLs and `dd`-ing them to `/dev/null` - no errors were reported by `zpool status -v`.
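A sketch of that backup-side check, with placeholder pool, clone and snapshot names:

```sh
# Clone a snapshot of the suspect zvol on the backup machine, read the
# resulting zvol in full, then see whether the pool reports any errors.
zfs clone backup/vms/vdi-proxy/root@zrepl_2023-11-08 backup/test/vdi-proxy-check
dd if=/dev/zvol/backup/test/vdi-proxy-check of=/dev/null bs=1M status=progress
zpool status -v backup
```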
Describe how to reproduce the problem
I do not know how to reproduce this specific state, since it apparently took a long time to develop without any error messages being reported. The last "integral" snapshot (without I/O errors) is from 2023-11-08.
Include any warning/errors/backtraces from the system logs
Output of `zpool events -v` on the production machine (restore target) after `dd` from the broken ZVOL:

Output from `dmesg` after `dd`-ing the affected ZVOL:

Output of `dmesg` during `e2fsck` on the affected ZVOL:

`e2fsck` could not complete due to the write I/O errors.

Changed ZFS kernel params (`modprobe.d/zfs.conf`):
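As an illustration of the modprobe.d format only (the parameters and values below are arbitrary examples, not the settings from this report):

```
# /etc/modprobe.d/zfs.conf - format illustration only, not the values
# used in this report.
options zfs zfs_arc_max=8589934592
options zfs zfs_txg_timeout=5
```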
Questions - most important

- What about `zfs destroy`-ing the affected ZVOLs?

Possible causes

- `zrepl`? However, `zrepl` uses ZFS shell utils.
- `fstrim -a` in all VMs, which makes all ZVOLs receive TRIM/DISCARD/UNMAP daily (see the sketch after this list).
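As an aside, the pool-side TRIM state can be checked with standard commands (the pool name below is a placeholder):

```sh
# Is automatic TRIM enabled on the pool, and what is the per-vdev TRIM status?
zpool get autotrim backup
zpool status -t backup
```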
Non-causes

- `memtest86+`
- `zfs scrub` on backup machine
- `sync=never` - not used

Goals of this report
Non-goals of this report
Questions and doubts

- Shouldn't `zpool scrub` on the backup machine detect those?

EDIT1: