-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HIVE-28705: Fix Data Inconsistency due to missing 'IS NOT? TRUE/FALSE' parsing in Partition Filter Pruning #5614
base: master
Are you sure you want to change the base?
Conversation
0cca4c7
to
9e37ed6
Compare
…' parsing in Partition Filter Pruning
9e37ed6
to
3570e41
Compare
|
@ayushtkn @deniskuzZ @okumin Can you help to review the PR |
@zabetak Can you help to review the PR. Thanks |
I'm willing to review this PR but cannot immediately do so. I haven't used HMS for years and need more time to familiarize myself. |
@Indhumathi27 By the way, you may update either the title of this pull request or the title of the JIRA ticket |
@deniskuzZ @okumin |
select id from t1 where ((month=3 or month=4) and year='2024') is not true; | ||
select id from t1 where ((month=3 or month=4) and year='2024') is true; | ||
select id from t1 where ((month=3 or month=4) and year='2024') is not false; | ||
select id from t1 where ((month=3 or month=4) and year='2024') is false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add test case for validating the third expression you have added in PartitionFilter.g4
| LPAREN key=identifier RPAREN NOT? IN LPAREN values=constantSeq RPAREN IS NOT? booleanLiteral # inConditionWithBoolean
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CASE WHEN (month=3 or month=4) THEN true else false END) - this case will cover this scenario. It is added in the qfile
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about NOT IN (constantSeq) IS NOT booleanLiteral
?
@@ -262,4 +262,163 @@ public String visitQuotedIdentifier(PartitionFilterParser.QuotedIdentifierContex | |||
return StringUtils.replace(ctx.getText().substring(1, ctx.getText().length() -1 ), "``", "`"); | |||
} | |||
|
|||
@Override | |||
public TreeNode visitConditionIsBoolean(PartitionFilterParser.ConditionIsBooleanContext ctx) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we rename visitBooleanCondition
case "FALSE": | ||
// For FALSE: return negated expression if not negated, otherwise return as is | ||
return isNegated ? exprNode : negateTreeNode(exprNode); | ||
default: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you've defined booleanLiteral
as TRUE/FALSE, how could it be something else?`
https://github.com/apache/hive/pull/5614/files#diff-4f2ef6e04f868805ddc84ef7500193d1004c01a821805d1593c073f0e6e65c20R93
return (isNegated && Boolean.parseBoolean(booleanLiteral)) ?
negateTreeNode(exprNode) : exprNode;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
booleanLiteral is just a string variable name which is used in G4 file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't it restricted in G4 file to just TRUE/FALSE?
boolean isNot = ctx.NOT() != null; | ||
|
||
// Determine the expected boolean value (TRUE/FALSE) and negate if necessary | ||
boolean expectedValue = ctx.booleanLiteral().getText().equalsIgnoreCase("TRUE"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use Boolean.parseBoolean()
// Determine the expected boolean value (TRUE/FALSE) and negate if necessary
String booleanLiteral = ctx.booleanLiteral().getText();
// Return the node based on the expected boolean value (negated or not)
return (isNot && Boolean.parseBoolean(booleanLiteral)) ?
negateTree(innerNode) : innerNode;
} | ||
|
||
@Override | ||
public TreeNode visitWrappedExpressionIsBoolean(PartitionFilterParser.WrappedExpressionIsBooleanContext ctx) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
visitBooleanWrappedExpression
return expectedValue ? innerNode : negateTree(innerNode); | ||
} | ||
|
||
private TreeNode negateTree(TreeNode node) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
negateTree
seems to be an exact copy of negateTreeNode
, except it doesn't modify the LeafNode, but creates a negated clone.
Why do we need both, can we refactor?
} | ||
|
||
@Override | ||
public TreeNode visitInConditionWithBoolean(PartitionFilterParser.InConditionWithBooleanContext ctx) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
visitInConditionWithBoolean
is mostly a duplicate of the existing visitInCondition
, instead of making the code generic and overriding isPositive
flag, you created multiple code duplicates
return buildInTreeFromNodes(nodes, isNotIn ? LogicalOperator.AND : LogicalOperator.OR); | ||
} | ||
|
||
private TreeNode buildInTreeFromNodes(List<LeafNode> nodes, LogicalOperator operator) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how is that different from buildTreeFromNodes
?
What changes were proposed in this pull request?
Added missing support for parsing 'IS NOT? TRUE/FALSE' conditions in partition filter pruning. Specifically, it handles cases where partition filters with boolean expressions such as IS NOT TRUE or IS NOT FALSE are involved, ensuring correct partition pruning behaviour.
Why are the changes needed?
These changes address data inconsistencies in queries involving partition filter pushdown to the Hive Metastore (HMS), especially when dealing with boolean expressions like IS NOT? TRUE or IS NOT? FALSE. Previously, the partition pruning logic failed to correctly interpret these expressions, leading to incorrect results and inefficient query execution. With the new parsing logic, partition pruning now handles these complex boolean conditions properly, ensuring accurate data retrieval and better query performance.
Does this PR introduce any user-facing change?
No
Is the change a dependency upgrade?
No
How was this patch tested?
Test scenarios added