-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handling too large inputs gracefully #5
base: master
Are you sure you want to change the base?
Conversation
Nice catch ! Because I'm not in favor of patching lz4.c (which comes directly from Yann Collet's repo). |
Although checking |
Good point. I actually had one request to work on that one. The solution was to move from 'int' to 'size_t'. The problem is about the error codes. They are currently provided as negative numbers. As a consequence, and since having an input size >=2GB was considered uncommon, if not out of scope, i finally selected to remain with int, avoiding any change to existing user base. Maybe that's a decision to review again. |
Why not force |
For the record, @Cyan4973 is Yann Collet :) |
I'll make some test with your proposal. |
I can live with the 2GB limit as such, but the problem is that I want to make sure my data doesn't get silently truncated if I at some point attempt to compress/decompress something that is too long. Also, because LZ4_compressBound may overflow (in lz4.c), I think it should be patched, in addition to the python wrappers. @Cyan4973 In lz4.c, the LZ4_compressBound function may overflow internally, so I propose to fix that by detecting oversize inputs and flagging them as errors. |
If there is a 2GB limit, people will start writing their stuff to compress large files in blocks, and then not all implementations will necessarily be compatible. |
Going on old issues. Is this one fixed? I'm gonna update the release this week. |
There is now an official streaming format specification It allows creation of compressed streams of any size, cut into independent (or chained) blocks. There is also an optional checksum mechanism defined, for error detection. |
Many years later...
Is this going to be fixed? The error is clear, but the limitation is not ideal. |
Upstream has moved. Please file an issue at the current upstream. https://github.com/python-lz4/python-lz4 $ pip list | grep lz4 Is this going to be fixed? — |
Currently, if the input is over 2 GB, lz4.compress may silently truncate the input, or if you get lucky, raise "SystemError: Negative size passed to PyString_FromStringAndSize". This pull request handles too large inputs and raises ValueError for too large inputs. Requires Python 2.5 or newer for Py_ssize_t support. Bumping version number to 0.4.1.