-
Notifications
You must be signed in to change notification settings - Fork 78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CRC-32 checks #49
Comments
I do not currently plan to do any CRC32 checks in yauzl. I know that is a feature of other zip file readers, and I'm not sure how valuable it would be for yauzl to support it. In my opinion, hash-based error checking should really never be done inside a file format. A file format does not correspond to any situation where data corruption could occur. Rather error checking should happen when errors can happen, such as during network transmission. If you're trying to check for errors in your storage hardware degrading over time, then you can keep checksums of files beside the files and do your error checking whenever you like regardless of the file format. Additionally, if error checking is outside the file format, then you're not limited to a fixed set of hash algorithms, like CRC32. That being said, it's very popular to include redundancy information in file formats. Even the brand new FLIF image format includes optional CRC32 checksums, despite everything else about the format seeming very progressive. Perhaps checksums in files are not as useless as I think. I'm not sure. If anyone makes an argument to convince me that it's valuable to do the CRC32 checks in yauzl, I will gladly add optional support for it. Keep in mind that CRC32 computation is not free, so it will slow down unzipping slightly. Currently, yauzl can unzip faster than Info-ZIP's If yauzl added support for CRC32 checking, then it would be an error emitted from the read stream obtained from |
I would like to (belatedly) speak up in favour of CRC32 checking. I agree with you in principle that hash-based error checking is better done outside of a file format. And there are more appropriate hashing algorithms than CRC32 for many uses. However, in my use case, I do not control the source of ZIP files I work with, nor the transmission medium by which I receive them. The best indication I have of whether files are corrupted is the CRC32 values in the ZIP file. I imagine this is not an uncommon situation (I notice some other issues on this repo where @thejoshwolfe has asked how a problematic ZIP file was created, and the answer was "no idea, someone sent it to me"). Would you be willing to reopen this issue? |
I believe I can simply add documentation to the README explaining how to do the CRC32 checks outside of yauzl. I think it's as simple as piping the I'm reopening the issue to look into it. |
I have just published a module on npm yauzl-crc that adds CRC32 checking. @thejoshwolfe If you have time, would you mind taking a look to see if I've missed anything? Streams are not my strong suit. |
For what it's worth, I am using this library to unzip archives and saw on some hardware it corrupts a file with null bytes. This is somewhat rare and is probably hardware specific. (Might be overheating, but that's just a guess). Decided to dive into how this is possible and saw this issue. On large systems the probability is near zero, but because of scale it happens. Creates some very subtle bugs. |
Readme says CRC32 check not performed. Is that an option or do you not plan to support it?
The text was updated successfully, but these errors were encountered: