
Slow performance #39

Open
bkuczenski opened this issue Jul 14, 2016 · 1 comment


bkuczenski commented Jul 14, 2016

Hi, I've been using pylzma to handle large(ish) 7z files ranging from 50 MB to 1.0 GB compressed. I access individual files from the archive one at a time, and I've noticed that performance is highly variable and much slower than with ZipFile.

Below I compare performance for two archives containing the same files (I created the ZIP by extracting the 7z file and recompressing it with zip):

http://nbviewer.jupyter.org/github/bkuczenski/lca-tools/blob/master/doc/7z%20profiling.ipynb
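A comparison like the one in the notebook can be reproduced with a small timing harness. The sketch below is an assumption, not the notebook's code: `time_reads` is a hypothetical helper (not part of pylzma or `zipfile`) that times reads of each named member over several trials and reports the per-member median. It is demonstrated on an in-memory ZIP built with the standard `zipfile` module so it runs without external files.

```python
import io
import statistics
import time
import zipfile
from typing import Callable, Dict, Iterable


def time_reads(read_member: Callable[[str], bytes],
               names: Iterable[str],
               trials: int = 3) -> Dict[str, float]:
    """Hypothetical helper: median seconds needed to read each named member."""
    results: Dict[str, float] = {}
    for name in names:
        samples = []
        for _ in range(trials):
            start = time.perf_counter()
            read_member(name)  # decompress one member; the bytes are discarded
            samples.append(time.perf_counter() - start)
        results[name] = statistics.median(samples)
    return results


# Build a small ZIP in memory so the example runs without any external files.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for i in range(5):
        zf.writestr(f"file{i}.txt", f"record {i}\n" * 10_000)

buf.seek(0)
with zipfile.ZipFile(buf) as zf:
    # The archive stays open for all reads; ZipFile.read(name) reads one member.
    timings = time_reads(zf.read, zf.namelist(), trials=3)

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1000:.2f} ms")
```

For the 7z side, the same helper could presumably be called as `time_reads(lambda n: archive.getmember(n).read(), archive.getnames())`, assuming `archive` is a `py7zlib.Archive7z` opened on the file; that call is an assumption and has not been checked against a particular pylzma version.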

On one hand, the ZIP file is almost 6x as large as the 7z file; on the other, access to the 7z file seems 10x-100x slower.

My question: is there a way for me to improve the performance of py7zlib? Is there a better way to use the archive to reference single files? Or is there a technical limitation that prevents this?

N.B. Performance is no different if I keep the archive open between successive retrievals. Timing is consistent for a given file across multiple trials: some files are fast and others are slow. All the files are about the same size, so file size isn't the cause.
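One possible contributor to the per-file variability, offered here as an assumption and not as the conclusion reached in this thread, is solid compression: 7z archives can pack many files into a single compressed stream, so reading a file near the end of that stream means decompressing everything before it. The sketch below simulates this with the standard-library `lzma` module (not py7zlib). `build_solid` and `read_from_solid` are hypothetical helpers: they compress several equal-sized files as one stream and count how many bytes must be decompressed to reach each file.

```python
import lzma
from typing import List, Tuple


def build_solid(files: List[bytes]) -> Tuple[bytes, List[Tuple[int, int]]]:
    """Compress all files back to back as ONE LZMA stream (a crude 'solid block').

    Returns the compressed blob and each file's (offset, size) within the
    uncompressed stream.
    """
    offsets = []
    pos = 0
    for data in files:
        offsets.append((pos, len(data)))
        pos += len(data)
    return lzma.compress(b"".join(files)), offsets


def read_from_solid(blob: bytes, offset: int, size: int,
                    chunk: int = 64 * 1024) -> Tuple[bytes, int]:
    """Hypothetical helper: extract one file from the solid stream.

    This single-block stream offers no random access, so everything before
    the file is decompressed first. Returns the file's bytes and the number
    of uncompressed bytes produced to reach it.
    """
    dec = lzma.LZMADecompressor()
    out = bytearray()
    pending = blob
    while len(out) < offset + size and not dec.eof:
        out += dec.decompress(pending, max_length=chunk)
        pending = b""  # unconsumed input is buffered inside the decompressor
    return bytes(out[offset:offset + size]), len(out)


# Four equal-sized, hypothetical "files".
files = [f"file {i} line\n".encode() * 20_000 for i in range(4)]
blob, offsets = build_solid(files)

for i, (offset, size) in enumerate(offsets):
    data, produced = read_from_solid(blob, offset, size)
    assert data == files[i]
    print(f"file {i}: decompressed {produced} bytes to read {size}")
```

If this were the cause, files later in a solid block would be consistently slower regardless of size. Recreating the archive in non-solid mode (for example with 7-Zip's `-ms=off` switch) would be one way to test that hypothesis.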

Thanks for any feedback.

bkuczenski (issue author) commented

This turns out to be due to high memory requirements.
