[C++][Parquet] Raise an error when reading Parquet data with invalid repetition levels #45185
This was referenced Jan 7, 2025
mapleFU
added a commit
that referenced
this issue
Mar 28, 2025
… when delimiting records (#45186)

### Rationale for this change

See #45185. Invalid repetition levels would previously only cause a fatal error in debug builds.

### What changes are included in this PR?

Replaces an existing `ARROW_DCHECK_EQ` on the repetition level with a check that will raise an exception in release builds too.

### Are these changes tested?

Yes, using a new example file (apache/parquet-testing#67)

### Are there any user-facing changes?

Yes, reading columns with invalid repetition levels as Arrow arrays will now raise an exception.

* GitHub Issue: #45185

Lead-authored-by: Adam Reeve <[email protected]>
Co-authored-by: mwish <[email protected]>
Signed-off-by: mwish <[email protected]>
Issue resolved by pull request 45186
zanmato1984
pushed a commit
to zanmato1984/arrow
that referenced
this issue
Apr 15, 2025
…levels when delimiting records (apache#45186)
Describe the bug, including details regarding any error messages, version, and platform.
When looking into #45073 I found that Arrow doesn't raise an error when reading data with invalid repetition levels into Arrow list arrays.
The encryption test files included an int64 list column with leaf values equal to i * 1,000,000,000,000, where i is the leaf-value index. The repetition level was set to 1 for even leaf indices and 0 for odd indices, so the first repetition level was 1. That is invalid, because the first level in a column must be 0 to start a new record. PyArrow nevertheless reads this file without raising any error, and the first leaf value (0) is skipped.
I wouldn't expect an error to be raised when reading the raw values and repetition levels with the lower-level Parquet C++ API, but I think reading this data as an Arrow list should raise an error.
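To make the problem concrete, here is a minimal pure-Python sketch of the kind of check I would expect when levels are turned into list records. It is not the Arrow or Parquet C++ implementation; `delimit_records` and `example_levels` are hypothetical helpers, and the sketch assumes a single-level list column in which every leaf is a defined value (so definition levels are ignored).

```python
from typing import List, Sequence, Tuple


def delimit_records(rep_levels: Sequence[int], values: Sequence[int]) -> List[List[int]]:
    """Group leaf values into list records using repetition levels.

    Hypothetical helper, not an Arrow API. Assumes a single-level list
    column: level 0 starts a new record, level 1 continues the current one.
    """
    if len(rep_levels) != len(values):
        raise ValueError("rep_levels and values must have the same length")

    records: List[List[int]] = []
    for index, (level, value) in enumerate(zip(rep_levels, values)):
        if level == 0:
            # A level of 0 always opens a new record.
            records.append([value])
        elif level == 1:
            if not records:
                # There is no open record to continue, so the data is invalid.
                raise ValueError(
                    f"invalid repetition level {level} at leaf index {index}: "
                    "the first repetition level must be 0"
                )
            records[-1].append(value)
        else:
            raise ValueError(
                f"repetition level {level} at leaf index {index} exceeds the maximum of 1"
            )
    return records


def example_levels(num_leaves: int) -> Tuple[List[int], List[int]]:
    """Rebuild the level/value pattern described above (1 for even i, 0 for odd i)."""
    values = [i * 1_000_000_000_000 for i in range(num_leaves)]
    rep_levels = [1 if i % 2 == 0 else 0 for i in range(num_leaves)]
    return rep_levels, values


if __name__ == "__main__":
    rep_levels, values = example_levels(6)
    try:
        delimit_records(rep_levels, values)
    except ValueError as exc:
        print(f"rejected: {exc}")
```

With the pattern from the test file, the very first leaf has level 1, so a strict reader like this rejects the column up front instead of silently dropping the leading value.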
Component(s)
C++, Parquet