Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRILL-8458: Use correct size of definition level bytes slice when reading Parquet v2 data page #2838

Merged

Conversation

handmadecode
Copy link
Contributor

@handmadecode handmadecode commented Oct 23, 2023

DRILL-8458: Use correct size of definition level bytes slice when reading Parquet data page

Description

The byte buffer slice used for the definition level bytes when reading a Parquet v2 data page is sized correctly (in the call to ByteBuffer::limit)

Documentation

Bugfix only, no documentation changes

Testing

Unit test added in new test class org.apache.drill.exec.store.parquet.TestRepeatedColumn

@jnturton jnturton changed the title Use correct size of definition level bytes slice when reading Parquet data page DRILL-845: Use correct size of definition level bytes slice when reading Parquet data page Oct 24, 2023
@jnturton jnturton self-requested a review October 24, 2023 11:50
@jnturton jnturton self-assigned this Oct 24, 2023
@jnturton jnturton added bug backport-to-stable This bug fix is applicable to the latest stable release and should be considered for inclusion there labels Oct 24, 2023
@cgivre cgivre changed the title DRILL-845: Use correct size of definition level bytes slice when reading Parquet data page DRILL-8458: Use correct size of definition level bytes slice when reading Parquet data page Oct 24, 2023
@jnturton jnturton changed the title DRILL-8458: Use correct size of definition level bytes slice when reading Parquet data page DRILL-8458: Use correct size of definition level bytes slice when reading Parquet v2 data page Oct 26, 2023
Copy link
Contributor

@jnturton jnturton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argh, a basic buffer arithmetic bug by yours truly. I guess it's remained undetected so far because Parquet v2 is still uncommon the wild. And because of insufficient Parquet v2 test coverage.

Thank you very much for this fix which looks great. Would you mind seeing if you can relocate the test and its data? We've got TestParquetComplex already and also some Parquet v2 test files for which a naming pattern has been started, e.g.

exec/java-exec/src/test/resources/parquet/parquet_v2_logical_types_simple.parquet

(Feel free to oragnise things differently too, just aiming to bundle like tests and data files).

@handmadecode
Copy link
Contributor Author

Argh, a basic buffer arithmetic bug by yours truly. I guess it's remained undetected so far because Parquet v2 is still uncommon the wild. And because of insufficient Parquet v2 test coverage.

I guess we've all caused our fair share of those bugs ;-)

Thank you very much for this fix which looks great. Would you mind seeing if you can relocate the test and its data? We've got TestParquetComplex already and also some Parquet v2 test files for which a naming pattern has been started, e.g.

exec/java-exec/src/test/resources/parquet/parquet_v2_logical_types_simple.parquet

Sure, how about renaming the test file to exec/java-exec/src/test/resources/parquet/parquet_v2_large_repetition_levels.parquet?

Would you prefer if the test was moved to an existing test class, e.g. TestParquetComplex?

@jnturton
Copy link
Contributor

A test data file name like that would be great thank you. Yes, please may you move the test itself inside TestParquetComplex and see if you can integrate your test data file generation code into ParquetSimpleTestFileGenerator? It already takes a Parquet version number parameter IIRC (up until now, only ever set to v1). I think we may need to come back and organise our tests better for Parquet v2, but I don't want to saddle this PR with that reorganisation.

@jnturton
Copy link
Contributor

jnturton commented Oct 28, 2023

P.S. if you find the test data generator only fits simply into ParquetSimpleTestFileGenerator as a standalone method that is unrelated to what's there already then I think that's fine. I'm pointing to things I know are there and related but not going so far as to ask "How would this look in code?"

@handmadecode
Copy link
Contributor Author

PR updated with refactored test code

Copy link
Contributor

@jnturton jnturton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect. Thanks.

@jnturton jnturton merged commit 03fd1f0 into apache:master Oct 31, 2023
8 checks passed
cgivre pushed a commit to cgivre/drill that referenced this pull request Nov 2, 2023
jnturton pushed a commit that referenced this pull request Dec 31, 2023
jnturton pushed a commit to jnturton/drill that referenced this pull request Dec 31, 2023
jnturton pushed a commit to jnturton/drill that referenced this pull request Jan 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-to-stable This bug fix is applicable to the latest stable release and should be considered for inclusion there bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants