Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add rewriter for multiple similar APPROX_PERCENTILEs in same clause #14

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jnmugerwa
Copy link
Member

@jnmugerwa jnmugerwa commented Mar 11, 2021

Added a rewriter for the following anti-pattern:

-> Query has multiple APPROX_PERCENTILEs in the same clause
-> The first argument for each APPROX_PERCENTILE is identical
-> The second argument for each APPROX_PERCENTILE is a literal

@jnmugerwa jnmugerwa requested a review from kaikalur March 11, 2021 03:32
Copy link

@kaikalur kaikalur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you should move lot of this logic to actual visit methods.


public ApproxPercentilePatternMatcher()
{
selectMap = new HashMap<>();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look at the Presto coding style. We use ImmutableMap.Builder.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now return an immutable map.

Done

private boolean isLiteral(AstNode node)
{
return node.getId() == JJTUNSIGNEDNUMERICLITERAL;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for now, just move this method into ASTNode. Or better yet have a util class - NodeUtil that has all these static functions. Also, name the function to reflect what it does - like isUnsignedNumericLiteral

Copy link
Member Author

@jnmugerwa jnmugerwa Mar 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left in this class for now -- will factor this and similar utility methods into a NodeUtil class in a NodeUtil-specific PR.

Done


private void processSelectItem(AstNode node, Map<String, List<AstNode>> map)
{
AstNode functionCall = node.GetFirstChildOfKind(JJTFUNCTIONCALL);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it will be good to generalize it to handle approx percentiles anywhere in the expression. Best to do this in visitFunctionCall

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return;
}
AstNode identifier = functionCall.GetFirstChildOfKind(JJTIDENTIFIER);
if (identifier == null || !identifier.GetImage().equalsIgnoreCase("APPROX_PERCENTILE")) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a utilitiy method - isApproxPercentile and do this check there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

return;
}
map.putIfAbsent(firstArg.GetImage(), new LinkedList<>());
map.get(firstArg.GetImage()).add(node);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of getImage, use Unparser.unparse(firstArg) so it will be general.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@jnmugerwa jnmugerwa requested a review from kaikalur March 12, 2021 18:09
@jnmugerwa jnmugerwa changed the title [WIP] Add rewriter for multiple similar APPROX_PERCENTILEs in SELECT clause [WIP] Add rewriter for multiple similar APPROX_PERCENTILEs in same clause Mar 16, 2021
@jnmugerwa jnmugerwa force-pushed the rewrite/approx-percentile branch 2 times, most recently from f7444b2 to 92a67b6 Compare March 16, 2021 20:09
@jnmugerwa jnmugerwa force-pushed the rewrite/approx-percentile branch 2 times, most recently from eea9344 to b4db47a Compare March 29, 2021 22:52
public class ApproxPercentileRewriter
extends Rewriter
{
private static final int ZERO_INDEXING_OFFSET = 1;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confusing naming.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

APPROX_PERCENTILE is 1-indexed while the data structure we store our percentiles in is 0-indexing, making this offset necessary. I feel like this name is descriptive given that context?

extends Rewriter
{
private static final int ZERO_INDEXING_OFFSET = 1;
private static final String REPLACEMENT_FORMAT = " APPROX_PERCENTILE(%s, ARRAY%s)[%d]";
Copy link

@kaikalur kaikalur Mar 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use ARRAY[%s]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also don't add space padding.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

.anyMatch(key -> firstArgToApproxPercentileNode.get(key).size() >= 2);
}

private boolean canApplyApproxPercentileRewrite(AstNode node)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be more like: needsApproxPercentileRewrite

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe needsRewrite and make it take more specific type: FunctionCall

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return firstArgToPercentiles.get(firstArg).indexOf(secondArg) + ZERO_INDEXING_OFFSET;
}

private static Optional<AstNode> getNthArgument(AstNode node, int n)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this to AstNode?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be moved to NodeUtils in a future PR.

return false;
}
Optional<String> image = Optional.ofNullable(identifier.get().GetImage());
return image.isPresent() && image.get().equalsIgnoreCase("APPROX_PERCENTILE");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a static final constant for the name of the function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return getNthArgument(node, 1).filter(astNode -> astNode.getId() == JJTUNSIGNEDNUMERICLITERAL).isPresent();
}

private static boolean isApproxPercentileNode(AstNode node)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it needed here? You should move it to the pattern matcher class.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be moved to NodeUtils in a future PR.

{
if (isApproxPercentileNode(node) && hasUnsignedLiteralSecondArg(node)) {
String firstArg = unparse(getNthArgument(node, 0).get()).trim();
String secondArg = unparse(getNthArgument(node, 1).get()).trim();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to unparser the second argument?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed -- removed unparsing when we deal with secondArg.

"SELECT APPROX_PERCENTILE(x, ARRAY[0.1, 0.2, 0.3]) FROM (SELECT APPROX_PERCENTILE(x, 0.1) from T);"
};

private static final ImmutableMap<String, String> STATEMENT_TO_REWRITTEN_STATEMENT =

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can simply have two arrays - one original sql and other one positionally matching rewritten sql. Or just array of pairs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implemented two arrays.

Done.

extends Rewriter
{
private static final int ZERO_INDEXING_OFFSET = 1;
private static final String REPLACEMENT_FORMAT = " APPROX_PERCENTILE(%s, ARRAY%s)[%d]";

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also don't add space padding.

.anyMatch(key -> firstArgToApproxPercentileNode.get(key).size() >= 2);
}

private boolean canApplyApproxPercentileRewrite(AstNode node)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe needsRewrite and make it take more specific type: FunctionCall

@jnmugerwa jnmugerwa force-pushed the rewrite/approx-percentile branch 3 times, most recently from e251722 to e8204c4 Compare April 1, 2021 04:01
@jnmugerwa jnmugerwa requested a review from kaikalur April 1, 2021 15:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants