Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HIVE-28705: Fix Data Inconsistency due to missing 'IS NOT? TRUE/FALSE' parsing in Partition Filter Pruning #5614

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Indhumathi27
Copy link
Contributor

What changes were proposed in this pull request?

Added missing support for parsing 'IS NOT? TRUE/FALSE' conditions in partition filter pruning. Specifically, it handles cases where partition filters with boolean expressions such as IS NOT TRUE or IS NOT FALSE are involved, ensuring correct partition pruning behaviour.

Why are the changes needed?

These changes address data inconsistencies in queries involving partition filter pushdown to the Hive Metastore (HMS), especially when dealing with boolean expressions like IS NOT? TRUE or IS NOT? FALSE. Previously, the partition pruning logic failed to correctly interpret these expressions, leading to incorrect results and inefficient query execution. With the new parsing logic, partition pruning now handles these complex boolean conditions properly, ensuring accurate data retrieval and better query performance.

Does this PR introduce any user-facing change?

No

Is the change a dependency upgrade?

No

How was this patch tested?

Test scenarios added

@Indhumathi27 Indhumathi27 changed the title HIVE-28705: Fix Data Inconsistency Due to missing 'IS NOT? TRUE/FALSE parsing in Partition Filter Pruning HIVE-28705: Fix Data Inconsistency due to missing 'IS NOT? TRUE/FALSE' parsing in Partition Filter Pruning Jan 16, 2025
@Indhumathi27
Copy link
Contributor Author

@ayushtkn @deniskuzZ @okumin Can you help to review the PR

@Indhumathi27
Copy link
Contributor Author

@zabetak Can you help to review the PR. Thanks

@okumin
Copy link
Contributor

okumin commented Jan 28, 2025

I'm willing to review this PR but cannot immediately do so. I haven't used HMS for years and need more time to familiarize myself.

@okumin
Copy link
Contributor

okumin commented Jan 28, 2025

@Indhumathi27 By the way, you may update either the title of this pull request or the title of the JIRA ticket
https://issues.apache.org/jira/browse/HIVE-28705

deniskuzZ

This comment was marked as outdated.

@Indhumathi27
Copy link
Contributor Author

@deniskuzZ @okumin
Can you help to review and merge this PR. Thanks

select id from t1 where ((month=3 or month=4) and year='2024') is not true;
select id from t1 where ((month=3 or month=4) and year='2024') is true;
select id from t1 where ((month=3 or month=4) and year='2024') is not false;
select id from t1 where ((month=3 or month=4) and year='2024') is false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add test case for validating the third expression you have added in PartitionFilter.g4
| LPAREN key=identifier RPAREN NOT? IN LPAREN values=constantSeq RPAREN IS NOT? booleanLiteral # inConditionWithBoolean
?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CASE WHEN (month=3 or month=4) THEN true else false END) - this case will cover this scenario. It is added in the qfile

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about NOT IN (constantSeq) IS NOT booleanLiteral ?

@@ -262,4 +262,163 @@ public String visitQuotedIdentifier(PartitionFilterParser.QuotedIdentifierContex
return StringUtils.replace(ctx.getText().substring(1, ctx.getText().length() -1 ), "``", "`");
}

@Override
public TreeNode visitConditionIsBoolean(PartitionFilterParser.ConditionIsBooleanContext ctx) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we rename visitBooleanCondition

case "FALSE":
// For FALSE: return negated expression if not negated, otherwise return as is
return isNegated ? exprNode : negateTreeNode(exprNode);
default:
Copy link
Member

@deniskuzZ deniskuzZ Feb 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you've defined booleanLiteral as TRUE/FALSE, how could it be something else?`
https://github.com/apache/hive/pull/5614/files#diff-4f2ef6e04f868805ddc84ef7500193d1004c01a821805d1593c073f0e6e65c20R93

return (isNegated && Boolean.parseBoolean(booleanLiteral)) ? 
    negateTreeNode(exprNode) : exprNode;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

booleanLiteral is just a string variable name which is used in G4 file

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't it restricted in G4 file to just TRUE/FALSE?

boolean isNot = ctx.NOT() != null;

// Determine the expected boolean value (TRUE/FALSE) and negate if necessary
boolean expectedValue = ctx.booleanLiteral().getText().equalsIgnoreCase("TRUE");
Copy link
Member

@deniskuzZ deniskuzZ Feb 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use Boolean.parseBoolean()

// Determine the expected boolean value (TRUE/FALSE) and negate if necessary
String booleanLiteral = ctx.booleanLiteral().getText();

// Return the node based on the expected boolean value (negated or not)
return (isNot && Boolean.parseBoolean(booleanLiteral)) ?
    negateTree(innerNode) : innerNode;

}

@Override
public TreeNode visitWrappedExpressionIsBoolean(PartitionFilterParser.WrappedExpressionIsBooleanContext ctx) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

visitBooleanWrappedExpression

return expectedValue ? innerNode : negateTree(innerNode);
}

private TreeNode negateTree(TreeNode node) {
Copy link
Member

@deniskuzZ deniskuzZ Feb 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

negateTree seems to be an exact copy of negateTreeNode, except it doesn't modify the LeafNode, but creates a negated clone.

Why do we need both, can we refactor?

}

@Override
public TreeNode visitInConditionWithBoolean(PartitionFilterParser.InConditionWithBooleanContext ctx) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

visitInConditionWithBoolean is mostly a duplicate of the existing visitInCondition, instead of making the code generic and overriding isPositive flag, you created multiple code duplicates

return buildInTreeFromNodes(nodes, isNotIn ? LogicalOperator.AND : LogicalOperator.OR);
}

private TreeNode buildInTreeFromNodes(List<LeafNode> nodes, LogicalOperator operator) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is that different from buildTreeFromNodes?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants