parquet RowGroup pruning for Dictionary(Decimal) type incorrect #13821
Comments
Research findings: with `parquet_pruning` turned off, both queries return the same result: +-----+ |
take |
After digging into this bug, here is my report:
so |
Would it be a good approach to fix this by coercing both sides of the `BinaryExpr`? Here's a conceptual fix:
|
@kosiew the root cause of the issue is how the Arrow writer handles data for
Regarding RG pruning not being enabled when the literal and the column have different data types: I think adding some safe casting to enable it would be a useful feature. For this, `PruningPredicate` and its interactions with `ParquetExec` would perhaps be a good place to start digging in. |
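To make the "safe casting" idea concrete, here is a minimal sketch in plain Rust with no DataFusion or Arrow dependency. The names `Decimal`, `RowGroupStats`, `rescale`, and `can_prune_eq` are hypothetical and exist only for this illustration; they are not part of `PruningPredicate` or `ParquetExec`. The sketch assumes an equality predicate and min/max statistics stored as unscaled integers. The literal is first brought to the column's scale, and the row group is kept whenever that cast is not exact.

```rust
/// A decimal literal: `unscaled / 10^scale` (hypothetical helper type).
#[derive(Debug, Clone, Copy)]
struct Decimal {
    unscaled: i128,
    scale: u32,
}

/// Min/max statistics of one row group, stored as unscaled integers
/// at the column's scale (hypothetical helper type).
struct RowGroupStats {
    min: i128,
    max: i128,
    scale: u32,
}

/// Bring `d` to `target_scale`, returning `None` if the conversion
/// would overflow or drop non-zero fractional digits.
fn rescale(d: Decimal, target_scale: u32) -> Option<i128> {
    if target_scale >= d.scale {
        let factor = 10_i128.checked_pow(target_scale - d.scale)?;
        d.unscaled.checked_mul(factor)
    } else {
        let factor = 10_i128.checked_pow(d.scale - target_scale)?;
        if d.unscaled % factor == 0 {
            Some(d.unscaled / factor)
        } else {
            None // lossy cast: refuse rather than round
        }
    }
}

/// Returns true only when the row group provably contains no row
/// matching `column = literal`.
fn can_prune_eq(stats: &RowGroupStats, literal: Decimal) -> bool {
    match rescale(literal, stats.scale) {
        Some(v) => v < stats.min || v > stats.max,
        // The literal cannot be compared safely, so keep the row group.
        None => false,
    }
}
```

With statistics covering 1.0 to 5.0 at scale 1 (`min: 10, max: 50`), the literal 3 at scale 0 is rescaled to 30 and the row group is kept, while the literal 10 at scale 0 is rescaled to 100 and the row group is skipped. Keeping the row group when the cast fails is the conservative choice: it can cost extra I/O but never drops matching rows.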
Describe the bug
Parquet RowGroup pruning by statistics works incorrectly for the `Dictionary(Decimal)` type.
To Reproduce
Expected behavior
Results from both queries from the script above should match
Additional context
The problem also happens with bloom filters (if you enable them in the pattern-matching expressions in `prune_by_bloom_filters`), so there is a chance that `ArrowWriter` produces incorrect metadata (statistics / bloom filters).
Also, after adding a larger value to the batch (like `100 as i128`, which is `10.0` when cast to decimal(4, 1)), the RG is not pruned, so perhaps something like `* pow(10, scale)` is lost while writing statistics / calculating filters.