Any plan to make a pigzlib like zlib? #50

Open
sfchen opened this issue Nov 29, 2017 · 11 comments

Comments

@sfchen

sfchen commented Nov 29, 2017

It would be good to have a zlib-like library for pigz.

@madler
Owner

madler commented Nov 29, 2017

I have considered adding multi-threaded compression to zlib. However I'm not sure what sort of interface people would be looking for. What do you imagine the interface would look like?

@sfchen
Author

sfchen commented Nov 29, 2017

It would be great if zlib were multi-threaded. It would be even better if the multi-threaded APIs were similar to the current zlib API.

@madler
Owner

madler commented Nov 29, 2017

Similar how? More importantly, different how? How would the multi-threading be controlled? I would like to have a specific design for the interface with some level of consensus from potential users before implementing it.

@dnbaker

dnbaker commented Nov 29, 2017

I mostly access zlib as is through gzgetc and gzgets, which don’t have terribly intuitive liftovers. More intuitive would be a pointer to read from like a file, but at that point you might as well just use popen with a shell call to pigz.

@madler
Owner

madler commented Nov 29, 2017

"liftover"?

@dnbaker

dnbaker commented Nov 29, 2017

I meant conversion or adaptation (i.e., making a pigz version of gzwrite/gzread/gzgetc/gzgets). I mostly imagine that making those successive calls wouldn’t often benefit from parallelization unless you were filling a sufficiently large buffer and then dispatching its compression in parallel as needed.

I guess it mostly just depends on how the implementation works. I do imagine I’d want to set the number of threads at file handle creation and leave the arguments to functions the same as their serial counterparts.
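
To make the "fill a large buffer, then dispatch its compression in parallel" idea concrete, here is a minimal sketch in Python, using the standard-library `zlib` module rather than C for brevity. The names `parallel_gzip` and `deflate_block` are hypothetical and not part of zlib or pigz. The sketch assumes each block is compressed independently as raw deflate, with non-final blocks ended by a sync flush so the pieces can be concatenated into one gzip stream; it leaves out refinements such as priming each block with the previous block's data as a dictionary, and the 128 KiB block size is only a default for the example.

```python
import gzip
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor

# Minimal 10-byte gzip header: magic, deflate method, no flags,
# mtime 0, XFL 0, OS "unknown".
GZIP_HEADER = b"\x1f\x8b\x08\x00" + b"\x00" * 4 + b"\x00\xff"


def deflate_block(block, last, level=6):
    """Compress one block as raw deflate (hypothetical helper).

    Non-final blocks end with Z_SYNC_FLUSH, which byte-aligns the output
    without marking the end of the stream; the final block uses Z_FINISH.
    """
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)  # -15 selects raw deflate
    out = comp.compress(block)
    return out + comp.flush(zlib.Z_FINISH if last else zlib.Z_SYNC_FLUSH)


def parallel_gzip(data, threads=4, block_size=128 * 1024, level=6):
    """Return a single gzip stream built from independently compressed blocks."""
    # An empty input still needs one (empty) final deflate block.
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)] or [b""]
    is_last = [i == len(blocks) - 1 for i in range(len(blocks))]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map yields results in input order, so the pieces stay in sequence.
        pieces = list(pool.map(deflate_block, blocks, is_last, [level] * len(blocks)))
    # gzip trailer: CRC-32 and length (mod 2**32) of the uncompressed data.
    trailer = struct.pack("<II", zlib.crc32(data), len(data) & 0xFFFFFFFF)
    return GZIP_HEADER + b"".join(pieces) + trailer


if __name__ == "__main__":
    payload = b"example line of text\n" * 100_000
    packed = parallel_gzip(payload, threads=4)
    # An ordinary single-threaded gzip reader should accept the result.
    assert gzip.decompress(packed) == payload
    print(len(payload), "->", len(packed), "bytes")
```

The thread count is fixed when the call is made and the data argument is the same as for a serial compressor, which is roughly the shape described above. Whether the threads actually run concurrently depends on the Python runtime releasing its interpreter lock during compression, so treat any speedup as something to measure rather than assume.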

@madler
Owner

madler commented Nov 29, 2017

Parallel compression needs much more memory than single-thread compression, both for large data buffers and for the multiple compression engines themselves. gzwrite does not need to do anything right away. You could send it small amounts of data and it could accumulate it in a buffer until it has enough to send to a compression engine in a thread. The user would need to say how many threads they want, and how much memory to use, implying an acceptable latency on accumulating data for chunks of compression.
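
The accumulate-then-dispatch behaviour described here can be sketched as a write handle. This is an illustration in Python with the standard-library `zlib` module, not a proposed zlib API: `ParallelGzipWriter`, its parameters, and the `2 * threads` cap on in-flight blocks are all assumptions made for the example. Small writes are collected until a full block is available, each block goes to a worker thread, and compressed blocks are written out in order. Memory use is bounded roughly by `block_size` times the number of in-flight blocks, which stands in for the "how much memory" setting.

```python
import gzip
import io
import struct
import zlib
from collections import deque
from concurrent.futures import ThreadPoolExecutor

# Minimal gzip header: magic, deflate, no flags, mtime 0, XFL 0, OS "unknown".
GZIP_HEADER = b"\x1f\x8b\x08\x00" + b"\x00" * 4 + b"\x00\xff"


def _deflate(block, last, level):
    """Raw-deflate one block; only the last block terminates the stream."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    out = comp.compress(block)
    return out + comp.flush(zlib.Z_FINISH if last else zlib.Z_SYNC_FLUSH)


class ParallelGzipWriter:
    """Hypothetical gzwrite-like handle: threads are chosen at creation time."""

    def __init__(self, raw, threads=4, block_size=128 * 1024, level=6):
        self._raw = raw
        self._pool = ThreadPoolExecutor(max_workers=threads)
        self._block_size = block_size
        self._level = level
        self._max_pending = 2 * threads  # assumed cap on in-flight blocks
        self._buf = bytearray()          # accumulates small writes
        self._pending = deque()          # futures, oldest first
        self._crc = 0
        self._size = 0
        raw.write(GZIP_HEADER)

    def _submit(self, block, last):
        # Writing out the oldest result first keeps output ordered and memory bounded.
        if len(self._pending) >= self._max_pending:
            self._raw.write(self._pending.popleft().result())
        self._pending.append(self._pool.submit(_deflate, block, last, self._level))

    def write(self, data):
        """Same argument as a serial write; nothing is compressed until a block fills."""
        self._crc = zlib.crc32(data, self._crc)
        self._size += len(data)
        self._buf += data
        while len(self._buf) >= self._block_size:
            block = bytes(self._buf[:self._block_size])
            del self._buf[:self._block_size]
            self._submit(block, last=False)
        return len(data)

    def close(self):
        # Whatever remains (possibly nothing) becomes the final deflate block.
        self._submit(bytes(self._buf), last=True)
        while self._pending:
            self._raw.write(self._pending.popleft().result())
        self._pool.shutdown()
        self._raw.write(struct.pack("<II", self._crc, self._size & 0xFFFFFFFF))


if __name__ == "__main__":
    sink = io.BytesIO()
    writer = ParallelGzipWriter(sink, threads=2, block_size=4096)
    for i in range(10_000):
        writer.write(b"record %d\n" % i)  # many small writes
    writer.close()
    restored = gzip.decompress(sink.getvalue())
    assert restored.count(b"\n") == 10_000
```

A larger `block_size` usually compresses better but means more data has to accumulate before any compressed output appears, which is the latency trade-off mentioned above.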

@madler
Owner

madler commented Nov 29, 2017

By the way, this would only be for compression. Decompression would be single-thread.

@joaoe

joaoe commented Dec 10, 2017

> However I'm not sure what sort of interface people would be looking for.

I'd expect at least an API that would be a drop-in replacement for zlib, function by function. So, instead of #include<zlib.h> the programmer would do #include<pigzlib/zlib.h> and it would compile and work out of the box. Or he/she would change the compile settings from -I...include/zlib to -I...include/pigzlib and the same would be true.

But I'm not entirely sure full compatibility is possible, for example for the low-level primitives.

Then on top of that you could add some extra APIs to control/monitor resource usage, but that can be a second feature. This way, adoption of pigzlib would be quite easy and straightforward.

@dschwartz783

dschwartz783 commented Mar 6, 2018

It might make sense to have the library offer both a drop-in replacement and more customizable functions that would let developers specify things such as thread count. It would be pretty awesome to let OpenSSH use multi-threaded compression with a simple addition to the compilation process, for example. I think this is a must.

@stilsch

stilsch commented Mar 6, 2018

Or think about rsync with pigz compression. That would be awesome!

6 participants