-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Add rewriter for multiple similar APPROX_PERCENTILEs in same clause #14
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you should move lot of this logic to actual visit methods.
|
||
public ApproxPercentilePatternMatcher() | ||
{ | ||
selectMap = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look at the Presto coding style. We use ImmutableMap.Builder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We now return an immutable map.
Done
private boolean isLiteral(AstNode node) | ||
{ | ||
return node.getId() == JJTUNSIGNEDNUMERICLITERAL; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for now, just move this method into ASTNode. Or better yet have a util class - NodeUtil that has all these static functions. Also, name the function to reflect what it does - like isUnsignedNumericLiteral
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left in this class for now -- will factor this and similar utility methods into a NodeUtil class in a NodeUtil-specific PR.
Done
|
||
private void processSelectItem(AstNode node, Map<String, List<AstNode>> map) | ||
{ | ||
AstNode functionCall = node.GetFirstChildOfKind(JJTFUNCTIONCALL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it will be good to generalize it to handle approx percentiles anywhere in the expression. Best to do this in visitFunctionCall
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
return; | ||
} | ||
AstNode identifier = functionCall.GetFirstChildOfKind(JJTIDENTIFIER); | ||
if (identifier == null || !identifier.GetImage().equalsIgnoreCase("APPROX_PERCENTILE")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a utilitiy method - isApproxPercentile and do this check there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
return; | ||
} | ||
map.putIfAbsent(firstArg.GetImage(), new LinkedList<>()); | ||
map.get(firstArg.GetImage()).add(node); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of getImage, use Unparser.unparse(firstArg) so it will be general.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
30dc75e
to
a8c3d49
Compare
a8c3d49
to
4a46b4c
Compare
4a46b4c
to
a1aaeea
Compare
f7444b2
to
92a67b6
Compare
eea9344
to
b4db47a
Compare
public class ApproxPercentileRewriter | ||
extends Rewriter | ||
{ | ||
private static final int ZERO_INDEXING_OFFSET = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confusing naming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
APPROX_PERCENTILE is 1-indexed while the data structure we store our percentiles in is 0-indexing, making this offset necessary. I feel like this name is descriptive given that context?
extends Rewriter | ||
{ | ||
private static final int ZERO_INDEXING_OFFSET = 1; | ||
private static final String REPLACEMENT_FORMAT = " APPROX_PERCENTILE(%s, ARRAY%s)[%d]"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use ARRAY[%s]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also don't add space padding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
.anyMatch(key -> firstArgToApproxPercentileNode.get(key).size() >= 2); | ||
} | ||
|
||
private boolean canApplyApproxPercentileRewrite(AstNode node) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be more like: needsApproxPercentileRewrite
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe needsRewrite and make it take more specific type: FunctionCall
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
return firstArgToPercentiles.get(firstArg).indexOf(secondArg) + ZERO_INDEXING_OFFSET; | ||
} | ||
|
||
private static Optional<AstNode> getNthArgument(AstNode node, int n) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this to AstNode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will be moved to NodeUtils in a future PR.
return false; | ||
} | ||
Optional<String> image = Optional.ofNullable(identifier.get().GetImage()); | ||
return image.isPresent() && image.get().equalsIgnoreCase("APPROX_PERCENTILE"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use a static final constant for the name of the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
return getNthArgument(node, 1).filter(astNode -> astNode.getId() == JJTUNSIGNEDNUMERICLITERAL).isPresent(); | ||
} | ||
|
||
private static boolean isApproxPercentileNode(AstNode node) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it needed here? You should move it to the pattern matcher class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will be moved to NodeUtils in a future PR.
{ | ||
if (isApproxPercentileNode(node) && hasUnsignedLiteralSecondArg(node)) { | ||
String firstArg = unparse(getNthArgument(node, 0).get()).trim(); | ||
String secondArg = unparse(getNthArgument(node, 1).get()).trim(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to unparser the second argument?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed -- removed unparsing when we deal with secondArg.
"SELECT APPROX_PERCENTILE(x, ARRAY[0.1, 0.2, 0.3]) FROM (SELECT APPROX_PERCENTILE(x, 0.1) from T);" | ||
}; | ||
|
||
private static final ImmutableMap<String, String> STATEMENT_TO_REWRITTEN_STATEMENT = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can simply have two arrays - one original sql and other one positionally matching rewritten sql. Or just array of pairs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented two arrays.
Done.
extends Rewriter | ||
{ | ||
private static final int ZERO_INDEXING_OFFSET = 1; | ||
private static final String REPLACEMENT_FORMAT = " APPROX_PERCENTILE(%s, ARRAY%s)[%d]"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also don't add space padding.
.anyMatch(key -> firstArgToApproxPercentileNode.get(key).size() >= 2); | ||
} | ||
|
||
private boolean canApplyApproxPercentileRewrite(AstNode node) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe needsRewrite and make it take more specific type: FunctionCall
e251722
to
e8204c4
Compare
e8204c4
to
85ae620
Compare
Added a rewriter for the following anti-pattern:
-> Query has multiple APPROX_PERCENTILEs in the same clause
-> The first argument for each APPROX_PERCENTILE is identical
-> The second argument for each APPROX_PERCENTILE is a literal