Feature request: zsttool #11
Comments
In a quick review, I find that |
Thanks for looking into it! |
@fiddyschmitt t2sz might be something for you. It compresses to zstd in such a way that the result can be seeked easily, e.g., with ratarmount, indexed_zstd, or libzstd-seek. |
Awesome, thanks @mxmlnkn. That's really interesting. Do you know if t2sz can be used to create an index for an existing zst file (without having to create a new zst file)? |
Unfortunately not. The last time I looked at the file formats, I concluded that it would be near impossible. Similar to gzip, a zstd file is a sequence of streams, each made up of blocks. I think this is also true for xz and lz4. Blocks are somewhat seekable, but decoding one requires a back-reference window, i.e., the last x bytes of previously decoded output. Streams, in contrast, are completely independent of each other. This is why t2sz creates multiple streams instead of the single stream per file that the standard zstd compressor creates by default. However, while the back-reference window in gzip is limited to 32 KiB, it can be as large as 2 GiB for zstd, xz, and lz4, if I remember correctly. This makes indexing near impossible because you would have to save up to 2 GiB per checkpoint. An index implementation could perhaps check how large the required back-reference windows actually are and still create an index when they are quite small. I doubt that there are many zstd compression levels for which this is possible, but that is only speculation. One mitigating factor, similar to gzip, could be uncompressed blocks inside the archive. If they are large enough, a checkpoint could be created there because the uncompressed data would serve as the back-reference window for all compressed blocks that follow. |
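To make the "independent streams" point concrete, here is a minimal sketch of the idea. It uses gzip members from Python's standard `zlib` module as a stand-in for zstd streams, so it is not t2sz's code or on-disk format. The helper names `compress_chunked` and `read_at` and the in-memory index layout are made up for this illustration.

```python
import bisect
import zlib


def compress_chunked(data, chunk_size):
    """Compress `data` as concatenated, independent gzip members.

    Returns the compressed bytes and an index of
    (uncompressed_offset, compressed_offset) pairs, one per member.
    """
    blob = bytearray()
    index = []
    for start in range(0, len(data), chunk_size):
        index.append((start, len(blob)))
        # A fresh compressor per chunk: no back-references cross the boundary.
        comp = zlib.compressobj(wbits=31)  # 31 selects the gzip container
        blob += comp.compress(data[start:start + chunk_size]) + comp.flush()
    return bytes(blob), index


def read_at(blob, index, offset, size):
    """Return `size` uncompressed bytes starting at `offset`.

    Only the members that overlap the requested range are decompressed.
    """
    if not index:
        return b""
    # Find the last member that starts at or before `offset`.
    starts = [uncompressed for uncompressed, _ in index]
    i = bisect.bisect_right(starts, offset) - 1
    skip = offset - index[i][0]
    pos = index[i][1]
    out = bytearray()
    while len(out) < skip + size and pos < len(blob):
        decomp = zlib.decompressobj(wbits=31)
        out += decomp.decompress(blob[pos:])
        # Bytes after the end of this member are left in unused_data.
        pos = len(blob) - len(decomp.unused_data)
    return bytes(out[skip:skip + size])
```

Each checkpoint here costs only two integers because a new member needs no earlier output. Indexing a single-stream file would instead require storing the back-reference window at every checkpoint, which is what makes large windows a problem.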
Fascinating, thanks! |
Hi Roberto,
Going out on a limb here, but do you think you can make a tool to index Zstandard files?