-
Notifications
You must be signed in to change notification settings - Fork 980
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DRILL-8511: Overflow appeared when the batch reached rows limit #2943
base: master
Are you sure you want to change the base?
Conversation
@rymarm, thanks for finding this issue. I believe your fix is correct. The I would feel more confident if we had a unit test. That this bug exists suggests that, despite the rather excessive number of tests I wrote way back when, I somehow omitted this case. Perhaps a reason is that the bit vector is special: it uses code different than the other vectors, and so needs its own tests. Tests for the fixed-width vectors are in
This gives us 9 cases, some that will force a resize, some that should just fit in the initial allocation. Then, read the bits out of the vector, verifying that the "empty" bits are the Why write tests? It is less embarrassing for a test to point out a problem than to find it in a production system. I apologize that I somehow missed doing the tests in the first place. The second concern is whether this same error was made for other writers. The answer seems to be "no": the other writers use a different structure for |
@paul-rogers thank you so much for the review! I found and fixed one mistake which the failed tests highlighted already and will write additional tests for the cases you described. I agree that test-driven development is much safer and better than a simple fix without test coverage of the affected place. It is just hard to me to write tests for Drill, they looks complicated and not isolated to me. But I'll write them. |
@rymarm, thanks for looking at the tests. While writing tests for Drill can be hard, we developed a bunch of tools for particular cases. In the case of the vector writers, those tests I referenced above should show you what to do. What I did, back when I wrote this stuff, is to run the tests in my IDE. I've not had much luck in recent years running the whole unit test suite locally. But, I can still generally run individual unit tests. A handy way to proceed is to first create a new file that has the test frameworks included. Then, write one simple test that just writes to the vector and reads back the data. At that point, it should be clear how to create other cases. It would be great to temporarily revert your change and create a case that fails, then restore the change and ensure things work. Please let me know if creating the tests takes more than a small amount of time. I can perhaps help out by creating a basic bit vector use case (if I can remember how all that code works!) |
Just an FYI, but the Github actions are not working well at the moment. The two that use Hadoop 2 seem to have some connectivity issue with Hive. @jnturton is working on it. |
DRILL-8511: Overflow appeared when the batch reached rows limit
Description
The size-aware scan framework fails to end the batch.
Framework tries to reallocate the vector on batch end, due to a hidden, minor bug in
BitColumnWriter
- which in general is not notable, but in a specific case, when the initial vector allocation size limit is exceeded and a reader reaches the batch row size limit.BitColumnWriter
uses instead of a write index a value count and this causes unexpected vector reallocation (look at the changes).Documentation
No changes required.
Testing
Manual tests