Error correction on a single drive #51
Comments
It's much more complicated. HDDs/SSDs have their own error correction codes and correct bitrot errors themselves, so to handle bad sectors we need RAID on different disks, or on the same disk (DUP). Bit flips are also tricky. AFAIK btrfs metadata reserves 32 bytes to store checksums, and by default only 4 bytes are used, for CRC32. Because of possible hash collisions, CRC32 can only reliably detect errors up to a few bits (4 to 5 bits for a 4K block), so any error correction built on it cannot work reliably. With xxhash we use an 8-byte hash with better collision resistance, and there it is possible to do something: on a checksum error, we can simply brute-force the possible corruptions of the 4K sector (roughly 256 attempts), recomputing the checksum after each candidate; if the checksum matches, we have successfully corrected the errors (up to 256 consecutive bits). Cool, right? Taking into account how disks actually behave when they fail, though, the only case where this helps is in-memory bitrot. Does it make sense to pay the additional computation cost and require everyone who wants this to use xxhash plus an XOR inter-data checksum? I don't know. Either way, it will not help with a faulty disk and it will not fix sector I/O errors, so you will still lose your corrupted data and still need to replace the drive.
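A minimal sketch of that brute-force recovery idea, assuming a per-block checksum and a single flipped bit. It uses CRC32 from Python's zlib purely for illustration; the comment's point is that an 8-byte hash such as xxhash64 would be needed before a match could be trusted:

```python
import zlib

def try_correct_bitflip(block, stored_csum):
    """Brute-force recovery of a single flipped bit: flip each bit of
    the block in turn, recompute the checksum, and accept the first
    candidate that matches the stored checksum."""
    if zlib.crc32(block) == stored_csum:
        return block                      # nothing to correct
    buf = bytearray(block)
    for i in range(len(buf)):
        for bit in range(8):
            buf[i] ^= 1 << bit            # flip one bit
            if zlib.crc32(bytes(buf)) == stored_csum:
                return bytes(buf)         # match: treat as corrected
            buf[i] ^= 1 << bit            # undo the flip, keep searching
    return None                           # not a single-bit error

# usage: a 4K sector with one bit flipped in memory
good = bytes(4096)
csum = zlib.crc32(good)
bad = bytearray(good)
bad[100] ^= 0x08
assert try_correct_bitflip(bytes(bad), csum) == good
```

For a 4K block this is at most 32768 candidate flips, and with a 32-bit CRC a match can still be a collision, which is exactly why the comment argues a longer hash is needed before trusting the "correction".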
I was thinking this would be a good feature some months ago as well, and I think there are good reasons why it would be worth adding. First of all, HDD/SSD controllers are a black box and cannot be trusted to ECC/csum with 100% reliability. If they could, we wouldn't need csums in the filesystem; we could just write bytes. So, since we csum and we RAID, we should consider that single-drive bitrot correction is worth thinking about too.

Furthermore, parity bitrot checks could be done at arbitrary redundancy levels, as with parchive, using block maps. Reserving 1% of a drive's data, for every 100 blocks of the filesystem you stripe one parity block into the data, which lets you check the integrity of the previous 100 blocks and correct up to one bad block in that span in the case of a csum mismatch or unreadable block. It may be better to scale this up to, say, 10 parity blocks per 1000 data blocks or some other larger chunk size, to avoid adjacent-bitrot problems, so you can survive 10 bad reads per 1000 blocks of data. Users could then assign arbitrary percentages of a drive to parity. (A minimal sketch of the one-parity-block-per-group case follows after this comment.)

It could also be used on a per-subvolume basis, to ensure important personal data has some degree of recoverability in case of corruption. btrfs is very good at detecting corruption, but there are still ways to consistently get corrupted data in cases where you shouldn't, such as corrupting data in subvolume snapshots by imaging an entire drive while a file in the root subvolume is in use; e.g. disk-imaging an active drive with Firefox open will corrupt the cookies.sqlite file. A csum plus a small amount of parity could recover from these small errors smoothly.

I'm not sure parity actually helps with the above two cases on an in-flight filesystem, since the parity may be overwritten with bad data as well, but I think it's possible to have a write -> check -> write-parity-if-consistent process for at least some of these cases, or possibly to snapshot the parity along with the subvolume to improve snapshot integrity at least. There's really no good reason a snapshot of a subvolume should face data corruption from writes to the source subvolume, but it can still happen. We should have parity for sanity's sake and because we don't trust software or hardware, and the best solution is not always the available one.

Edit: though, per-subvolume parity could present a lot of problems with differing parity values across subvolume snapshots, and a great deal of data amplification when you have lots of subvolumes. It might be best to inherit parity only on read-only snapshots, and to throw away the parity flag when creating RW snapshots, to help manage parity overflow.
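A minimal sketch of the one-parity-block-per-group scheme described above, assuming plain XOR parity and hypothetical helper names. Plain XOR can rebuild exactly one lost block per group; a real parchive-style scheme uses Reed-Solomon so it can rebuild several, as in the 10-per-1000 variant:

```python
import os

BLOCK = 4096
GROUP = 100  # data blocks covered by one parity block (the 1% example)

def xor_blocks(blocks):
    """XOR a list of equal-sized blocks together."""
    acc = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            acc[i] ^= byte
    return bytes(acc)

def make_parity(group):
    """Parity block for a group of GROUP data blocks."""
    return xor_blocks(group)

def recover(group, parity):
    """Rebuild the single block whose csum failed (marked None)
    by XOR-ing the parity block with every surviving block."""
    survivors = [b for b in group if b is not None]
    assert len(survivors) == len(group) - 1, "XOR parity rebuilds only one block"
    return xor_blocks(survivors + [parity])

# usage: lose one block out of a group and rebuild it
group = [os.urandom(BLOCK) for _ in range(GROUP)]
parity = make_parity(group)
lost = group[42]
group[42] = None          # csum mismatch or unreadable block
assert recover(group, parity) == lost
```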
Could error-correcting codes be stored so that bad sectors and bit flips could be transparently corrected? This would be useful on laptops with only a single internal drive. DUP costs you half of your usable space, and RAID5 across multiple partitions of the same drive is extremely slow because the drive has to seek between the partitions. Technically par2 could be used, but it isn't suitable for things like 100+ GB virtual machine images or files that are constantly being updated.
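For illustration, a sketch of what "storing error-correcting codes alongside the data" means, using the third-party reedsolo package (Reed-Solomon, the same code family par2 uses). The block size, the ECC budget, and the assumption that decode() returns a tuple in recent reedsolo versions are all assumptions of this sketch, not anything btrfs provides:

```python
# pip install reedsolo
from reedsolo import RSCodec

rsc = RSCodec(16)                # append 16 ECC bytes per 255-byte chunk;
                                 # corrects up to 8 corrupted bytes per chunk
block = bytes(range(256)) * 8    # a 2 KiB sample "sector"

protected = rsc.encode(block)    # data + ECC, stored together on disk

# simulate bitrot: corrupt a few bytes in different chunks
damaged = bytearray(protected)
for i in (10, 500, 1500):
    damaged[i] ^= 0xFF

# decode() transparently corrects the errors; recent reedsolo versions
# return (message, message_with_ecc, errata_positions)
recovered = rsc.decode(bytes(damaged))[0]
assert bytes(recovered) == block
```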