Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make Groovy Parser correctly handle nested parenthesis #4718

Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
32 commits
Select commit Hold shift + click to select a range
b04247c
Skip ending parentheses when parsing visitMethodCallExpression
jevanlingen Nov 26, 2024
4191337
Stop duplicate code in method
jevanlingen Nov 26, 2024
8963f1e
Introduce `visitAndGetAnnotations` method
jevanlingen Nov 29, 2024
c352e7c
Rename `ctor` to `contructor_`
jevanlingen Nov 29, 2024
09aa9c3
Add marker imports
jevanlingen Nov 29, 2024
0de94eb
Add `ClosingParenthese` marker to add extra parentheses
jevanlingen Nov 29, 2024
2b638a8
Add ClosingParenthese marker to add extra parentheses
jevanlingen Dec 2, 2024
d7afb0c
Improve space printer
jevanlingen Dec 2, 2024
8376b4e
Calculate `_INSIDE_PARENTHESES_LEVEL` manually if metadata flag is mi…
jevanlingen Dec 3, 2024
da61c8d
improvement
jevanlingen Dec 3, 2024
d292e34
improvement
jevanlingen Dec 3, 2024
4704580
improvement
jevanlingen Dec 3, 2024
6951ed4
improvement
jevanlingen Dec 3, 2024
e93659f
getting somewhere
jevanlingen Dec 4, 2024
3d7d9ec
Make `countWrapperParenthesesForMethodCalls` private
jevanlingen Dec 4, 2024
d488126
Got the basics working
jevanlingen Dec 4, 2024
e0f2efa
Add comments to explain code
jevanlingen Dec 4, 2024
a1b1a39
Improvement
jevanlingen Dec 4, 2024
fde44e6
Improvement
jevanlingen Dec 5, 2024
05b07e9
Improvement
jevanlingen Dec 5, 2024
560a2a8
Add test from issue #4615
jevanlingen Dec 5, 2024
710a28d
Review comments
jevanlingen Dec 5, 2024
b8add9b
Remove formatting
jevanlingen Dec 5, 2024
2184912
Rename variable
jevanlingen Dec 5, 2024
cd6e229
Update rewrite-groovy/src/main/java/org/openrewrite/groovy/GroovyPars…
jevanlingen Dec 5, 2024
0a61029
Add extra test for comments
jevanlingen Dec 6, 2024
e20e95c
Merge branch 'main' into 4703-groovy-parser-does-not-correctly-handle…
jevanlingen Dec 6, 2024
368a9af
Merge branch 'main' into 4703-groovy-parser-does-not-correctly-handle…
jevanlingen Dec 6, 2024
19502b3
Starting on implementation 2
jevanlingen Dec 6, 2024
9b5a973
Starting on implementation 2
jevanlingen Dec 9, 2024
a1c30b2
Starting on implementation 2
jevanlingen Dec 9, 2024
3e95a86
Add comment [skip ci]
jevanlingen Dec 10, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,90 @@
/*
* Copyright 2024 the original author or authors.
* <p>
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
* <p>
* https://www.apache.org/licenses/LICENSE-2.0
* <p>
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.openrewrite.groovy;
jevanlingen marked this conversation as resolved.
Show resolved Hide resolved

import org.codehaus.groovy.ast.expr.MethodCallExpression;
import org.jspecify.annotations.Nullable;
import org.openrewrite.internal.StringUtils;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroovyParserParentheseDiscoverer {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't reviewed in detail but at first glance it doesn't seem to be aware of comments so any parentheses in comments might throw it off

// Matches a code block including leading and trailing parenthesis
// Eg: ((((a.invoke~~>("arg")))))<~~ or ~~>(((((( "" as String )))))<~~.toString())
private static final Pattern PARENTHESES_GROUP = Pattern.compile(".*?(\\(+[^()]+\\)+).*", Pattern.DOTALL);
// Matches when string consist of multiple parenthese blocks
// Eg: `((a.invoke())` does not match, `((a.invoke() && b.invoke())` does match
private static final Pattern HAS_SUB_PARENTHESIS = Pattern.compile("\\(+[^)]+\\)+[^)]+\\).*", Pattern.DOTALL);

/**
* Calculates the insideParenthesesLevel for method calls, because the compiler does not set the _INSIDE_PARENTHESES_LEVEL flag for method calls.
*/
public static @Nullable Integer getInsideParenthesesLevel(MethodCallExpression node, String source, int cursor) {
String sourceFromCursor = source.substring(cursor);
Comment on lines +36 to +37
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm somewhat surprised to see source and cursor as separate arguments here when they are merely used to do this substring. Would it make sense to change the method signature here to

Suggested change
public static @Nullable Integer getInsideParenthesesLevel(MethodCallExpression node, String source, int cursor) {
String sourceFromCursor = source.substring(cursor);
public static @Nullable Integer getInsideParenthesesLevel(MethodCallExpression node, String sourceFromCursor) {

Copy link
Contributor Author

@jevanlingen jevanlingen Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had that before, but reverted it back to current state. I think it more clear if the GroovyParserParentheseDiscoverer is a completely separate entity. So no former logic is needed to determine the wrapping. Just pass everything you have (thus the node, the source code of the entire file and the current place you are parsing right now) and let the function discover it all.

Having the cursor argument also prevents from misinterpretation to function, because you need to provide the starting point (that is the cursor) of parsing.

Copy link
Contributor

@knutwannheden knutwannheden Dec 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this can be simplified. For Python I had the same basic problem to solve, but I didn't have any "parentheses level" from the AST or similar. Additionally, in Python the parentheses for tuples and generator expressions are sometimes optional, which doesn't make things simpler.

Anyway, rather than counting the "parentheses levels" up front, I added something like a maybeParenthesized(node, delegate) method which does something like this:

  1. Save cursor and use whitespace() to parse whitespace (and comments) at cursor into a Space
  2. Check if the next character is a (
  3. If not, reset cursor to saved cursor and return result from delegate and we are done
  4. Otherwise push a lambda on a parentheses stack, which is a field on the parser visitor
  5. Call maybeParenthesized() recursively
  6. Again, save cursor, call whitespace() and check if we are at a )
  7. If yes, pop lambda from parentheses stack and apply it to result, which will wrap it in a J.Parentheses
  8. If not, reset cursor and return result

For a language like Java or Python, where we can expect the source files to be reasonably large, I would definitely do something like this, as the substringing and regexp matching won't be free in terms of performance. But for Groovy that would probably not be a big concern, as scripts tend to be small and few.

I will let you think a bit about this and don't want to force you into a different solution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference, here is the Python code in question. It isn't pretty, so in case you decide to give this approach a try, I would only look at the Python code if you get stuck, as you will probably find a cleaner solution for your parser: https://github.com/openrewrite/rewrite-python/blob/b42101961bbc9daf21d54fe803dd9eaec59c3184/rewrite/rewrite/python/_parser_visitor.py#L1950-L2025

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I second Knut's comments and like the approach he describes. I worry that these regexes will be slow and error prone.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The one tricky situation I can think of are Java-style type casts, as the first token there is also a (. So if the AST node is a Java-style type cast (assuming they can be distinguished from Groovy-style type casts in the AST), the logic could for example pop that last element from the stack and reset the cursor before calling the corresponding visit method (or do it in the method itself). There are also other alternative solutions.

Copy link
Contributor Author

@jevanlingen jevanlingen Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, your implementation seems like a better idea @knutwannheden. I would rather first merge this and then do a second round with improvements (or better said, go for the new/better implementation). Currently we don't support nested parenthesis, which just breaks scripts that use them. Having this sub-ideal implementation makes them at least runnable, though with a performance loss.

Then I can start working on the better implementation

--

Edit: Talked with @knutwannheden about this, we concluded it's better to not go forward yet:

If we aren't in a hurry to get this done, it will just make the Git history messier to go through, when someone tries to find out why some change was made in the past.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't quite as easy as I made it out to be and I now remember the struggles I had with Python here. The difficulty arises because we don't know at what AST level the parentheses are. Here that parentheses level may come in handy.


// start with a (` character with optional whitespace
if (sourceFromCursor.matches("(?s)^\\s*\\(.*")) {
String sourceNoWhitespace = sourceFromCursor.replaceAll("\\s+", "");
// grab the source code until method and closing parenthesis
Matcher m = Pattern.compile("(?s)(.*?" + node.getMethodAsString() + "[^)]*\\)+).*").matcher(sourceNoWhitespace);
if (m.matches()) {
// Replace all string literals with `<s>` to prevent weird matches with the PARENTHESES_GROUP regex
String s = m.group(1).replaceAll("\"[^\"]*\"", "<s>");
return GroovyParserParentheseDiscoverer.getInsideParenthesesLevelForMethodCalls(s);
}
}

return null;
}

private static int getInsideParenthesesLevelForMethodCalls(String source) {
String s = source;
// lookup for first code block with parenthesis
Matcher m = PARENTHESES_GROUP.matcher(s);
// check if source matches any code block with parenthesis + check if there are still inner parenthesis
while (m.matches() && HAS_SUB_PARENTHESIS.matcher(s).find()) {
int parenthesis = lowestParentheseLevel(m.group(1));
String part = m.group(1).replaceAll("\\(", "").replaceAll("\\)", "");
String regex = StringUtils.repeat("\\(", parenthesis) + part + StringUtils.repeat("\\)", parenthesis);
// remove parentheses and arguments in source code
s = s.replaceAll(regex, "");
// move to next possible code block with parenthesis
m = PARENTHESES_GROUP.matcher(s);
}

return lowestParentheseLevel(s);
}

private static int lowestParentheseLevel(String s) {
int leadingParenthesis = 0;
for (int i = 0; i < s.length(); i++) {
char c = s.charAt(i);
if (c == ' ' || c == '\t' || c == '\n' || c == '\r') continue;
else if (c == '(') leadingParenthesis++;
else break;
}
int trailingParenthesis = 0;
for (int i = s.length() - 1; i >= 0; i--) {
char c = s.charAt(i);
if (c == ' ' || c == '\t' || c == '\n' || c == '\r') continue;
else if (c == ')') trailingParenthesis++;
else break;
}

return Math.min(leadingParenthesis, trailingParenthesis);
}
}
Loading