Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZOOKEEPER-4880: Generate comments from zookeeper.jute into code. #2206

Merged
merged 1 commit into from
Dec 7, 2024

Conversation

luozongle01
Copy link
Contributor

Generate comments from zookeeper.jute into code: https://issues.apache.org/jira/browse/ZOOKEEPER-4880

Brief Description

Support generating end-of-line comments and class comments in zookeeper.jute into code.

How to test this change

  1. By comparing the zookeeper-jute/target/generated-sources/java/org/apache/zookeeper and zookeeper-client/zookeeper-client-c/generated directories before and after the modification, there are no other differences except the comments.
image image

2. When the test class has multiple lines of comments, the comments can be generated normally. image

3. Testing vector type properties can generate comments normally. image

4. Testing class type properties can generate comments normally image

5. Testing comments that do not meet expectations will not generate comments and will not affect the generation of normal code. image

@luozongle01
Copy link
Contributor Author

@kezhuw Could you help me take a look? Thanks.

Copy link
Member

@kezhuw kezhuw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @luozongle01, thank you for your contribution!

Given the fact that jute and all targeting languages support both // and multi-line comments, I suggest we output whatever comments captured in origin sharp. With help of beginLine, beginColumn, endLine and endColumn of Token, I think we can reproduce comments in generated code with indentation adjusted.

while (tmpTkn != null) {
String image = tmpTkn.image.trim();
tmpTkn = tmpTkn.next;
if (image.startsWith("//")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The capture does not shaped well for directly usage. I tasted the following capture, and it captured a complete line comment. This way we can output whatever it captured without post work.

SPECIAL_TOKEN :
{
  <SINGLE_LINE_COMMENT: "//" (~["\n","\r"])* ("\n"|"\r"|"\r\n")>
}

See also https://javacc.github.io/javacc/tutorials/token-manager.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for your comment. I will make some further modifications.😆

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The capture does not shaped well for directly usage. I tasted the following capture, and it captured a complete line comment. This way we can output whatever it captured without post work.

SPECIAL_TOKEN :
{
  <SINGLE_LINE_COMMENT: "//" (~["\n","\r"])* ("\n"|"\r"|"\r\n")>
}

See also https://javacc.github.io/javacc/tutorials/token-manager.html

It seems like there's no need to change this now, because I get all the comments,then I use and distinguished them by line numbe. 😂😂

@luozongle01 luozongle01 force-pushed the ZOOKEEPER-4880 branch 4 times, most recently from 52d4194 to 0631b4f Compare October 29, 2024 15:48
@luozongle01
Copy link
Contributor Author

Hi, @kezhuw, I made some modifications, thank you for your review last time. I feel much better now than last time.
I hope you can help me look again.

Currently, multi-line comments and single-line comments are supported.

I put JField or JRecord into a TreeSet, so that I can get the line or column information of the previous or next element. According to the beginLine, beginColumn, endLine, endColumn information, I determine the range of my comment area, and then format the output in line number order.

Here is my testing process.

  1. Verify that the modification only affects the comments and does not affect the normal code.

  2. When verifying that each JField is on its own line, multi-line comments and single-line comments are generated normally.

image image
  1. If the attributes are on the same line, the comment should belong to the previous Field.
image

@kezhuw kezhuw self-requested a review November 6, 2024 09:33
Copy link
Member

@kezhuw kezhuw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@luozongle01 Thank you for your work!

It might be late, I am sorry for this, but I think we probably should align on how comments from jute should be formatted in generated sources ? I see two approaches.

  1. Keep it in origin format.
  2. Transfer comments from one format to another.

I saw this pr is going on the second approach. I am good to unify the generated comment format. But I think it could be challenge, as we should escape between two formats which could make generated comments strange/inaccurate.

    // /**
    //  * Some dead comments.
    //  */
    //
    // Things changed. We are going ..

Currently, the generated comments blindly strip some special strings. I don't think it is a good thing.

I suggest we keep it simple and don't touch comment content, this is also a safe path. If we want javadoc style comments, I think it is ok to convert them manually.

String[] commentArray = commentMulti.split("\n");
for (String comment : commentArray) {

comment = comment.replaceAll("//|/\\*\\*|/\\*|\\*\\*/|\\*/", "")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think people will want to maintain this kind of code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/do/don't/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think people will want to maintain this kind of code.
I'll amend that and I won't touch the comments so they're not needed here.

comment = comment.replaceAll("//|/\\*\\*|/\\*|\\*\\*/|\\*/", "")
.replaceAll("^[* ]+", "").trim();

if (!comment.isEmpty()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The generated comments has no empty line which is not good.

@luozongle01
Copy link
Contributor Author

luozongle01 commented Nov 7, 2024

@luozongle01 Thank you for your work!

It might be late, I am sorry for this, but I think we probably should align on how comments from jute should be formatted in generated sources ? I see two approaches.

  1. Keep it in origin format.
  2. Transfer comments from one format to another.

I saw this pr is going on the second approach. I am good to unify the generated comment format. But I think it could be challenge, as we should escape between two formats which could make generated comments strange/inaccurate.

    // /**
    //  * Some dead comments.
    //  */
    //
    // Things changed. We are going ..

Currently, the generated comments blindly strip some special strings. I don't think it is a good thing.

I suggest we keep it simple and don't touch comment content, this is also a safe path. If we want javadoc style comments, I think it is ok to convert them manually.

@kezhuw Thank you very much for your review. I really didn't consider that 😂😂. I will keep the original format output.

However, for the end-of-line comments, should I continue to keep them at the end of the line or is it better to move them above the code? 🤔

@kezhuw
Copy link
Member

kezhuw commented Nov 7, 2024

for the end-of-line comments, should I continue to keep them at the end of the line or is it better to move them above the code?

I prefer to move them above the code.

@luozongle01
Copy link
Contributor Author

for the end-of-line comments, should I continue to keep them at the end of the line or is it better to move them above the code?

I prefer to move them above the code.

Thanks, I thought so too, I'll modify the code.

@luozongle01
Copy link
Contributor Author

Hi @kezhuw , I made some changes, could you take a look at it for me?

I won't touch the annotation content now.

But I still formatted it a little bit here, because for example, field needs a few spaces, class does not need spaces, and the number of spaces in java and c is also different.

  private static String getComments(String space, List<String> commentList) {
      if (commentList == null || commentList.isEmpty()) {
          return "";
      }

      String comments = String.join("", commentList);

      StringBuilder formatBuilder = new StringBuilder();
      for (String s : comments.split("\r?\n")) {
          formatBuilder.append("\n")
                  .append(space)
                  .append(s.replaceAll("^[ \t]+", ""));
      }

      return formatBuilder.append("\n").toString();
  }

Now the comment is like this
image
image

@kezhuw kezhuw self-requested a review November 9, 2024 08:08
@luozongle01
Copy link
Contributor Author

luozongle01 commented Nov 14, 2024

Hi @kezhuw, These days, I have researched how JavaParser and Protobuf handle comments again.

The Comment#content method of Javaparser can retrieve the content inside the comment, which will remove the first layer/** */ or /* */ or // .
I think I might also be able to obtain the internal content of comments in this way for formatting purposes.

The creatCommentFromToken method is the process of handling content
https://github.com/javaparser/javaparser/blob/master/javaparser-core/src/main/javacc-support/com/github/javaparser/GeneratedJavaParserTokenManagerBase.java


Then I also tested the process of protobuf handling comments.

protobuf will format /** */ or /* */ or // as /** */.
image

Will convert multi line comments nested within single line comments into Html Entity. (Because multi line comments cannot be nested within each other)

image image

So I think I just need to handle the following single line comments and a few situations

/*       /*        /**         /**
          *                     *
          *                     *
*/        */        */          */

For internal comments, I only need to convert/* and */ to HTML Entity (because multi line comments cannot be nested)

So can I make further modifications this way or are there any issues I haven't thought of?

Copy link
Member

@kezhuw kezhuw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution! I left some inline comments regarding the code.

Regarding comment style, I would say zookeeper-jute is used only by ZooKeeper and is not adopted by other projects as IDL. It is enough for us to go here.

}

public void addJField(JField jField) {
jFieldTreeSet.add(jField);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extends RangeInfo is aggressive. addCommentRange(JXyz jXyz, RangeInfo rangeInfo) should be sufficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will modify it

return getCommentsList(jField, jFieldTreeSet, fieldCommentsTokenMap);
}

private static List<String> getCommentsList(RangeInfo rangeInfo,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect this should be much clear. Given that we support only comments in three places:

  1. Comments before class for record types.
  2. Comments before fields.
  3. Line ending comments for fields.

For the first, we could extract comments from Token::specialToken. For the second, similar to the first, but we have to drop comments(could be in either // or /* */ form) belong to previous field. For the third, additional lookup is required, also drop comments belong to next field.

@luozongle01 luozongle01 force-pushed the ZOOKEEPER-4880 branch 3 times, most recently from 74e3326 to a33fc24 Compare November 21, 2024 14:03
@luozongle01
Copy link
Contributor Author

Hi @kezhuw, Thank you for your review last time. The code feels much more concise now.
Could you take another look at it for me?

Copy link
Member

@kezhuw kezhuw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me in direction. I left some inline comments.

<WithinOneLineComment> MORE :
{
<~[]>
<WithinOneLineComment: "//" (~["\n","\r"])* ("\n"|"\r"|"\r\n")>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be "ONE_LINE_COMMENT" ? It is as a "Token.kind" constant in generated in "RccConstants".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be "ONE_LINE_COMMENT" ? It is as a "Token.kind" constant in generated in "RccConstants".

Yes, I'll make some modifications here.

return getFieldComments(jField, " ");
}

private String getFieldComments(JField jField, String space) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/space/indent/ ?

Copy link
Contributor Author

@luozongle01 luozongle01 Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/space/indent/ ?

This is because org.apache.jute.compiler.JType#genCDecl also contains spaces, so I also used spaces. 😂

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the confusion! I means it should be better to rename variable space to indent.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the confusion! I means it should be better to rename variable space to indent.

Okay, I will make some modifications here.

return formatBuilder.append("\n").toString();
}

private static List<String> getComments(Token token) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, all leading spaces in mult-line comment are dropped in generation. Can we reconstruct these for multi-line comment ?

Or how about <MULTI_LINE_COMMENT: "/*" (~["*"])* "*" (~["*","/"] (~["*"])* "*" | "*")* "/"> ? (https://stackoverflow.com/questions/34550579/multiline-comments-in-javacc)

I find no hole despite there is one comment say that "not correct with respect to the language definition".

I think it is easy to reconstruct original format if multi-line comment is captured into one token.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all leading spaces in mult-line comment are dropped in generation

/**
 ** Something.
 **/

For example, above comment will be formatted as following.

/**
** Something.
**/

Copy link
Member

@kezhuw kezhuw Nov 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or how about <MULTI_LINE_COMMENT: "/" (~[""])* "" (~["","/"] (~[""]) "" | "")* "/"> ? (https://stackoverflow.com/questions/34550579/multiline-comments-in-javacc)

I think it is easy to reconstruct original format if multi-line comment is captured into one token.

I am not sure the LOOKAHEAD could help here. I tried but did not succeed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, then I'll research and see if I can capture multi line comments as one token.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, all leading spaces in mult-line comment are dropped in generation. Can we reconstruct these for multi-line comment ?

Or how about <MULTI_LINE_COMMENT: "/*" (~["*"])* "*" (~["*","/"] (~["*"])* "*" | "*")* "/"> ? (https://stackoverflow.com/questions/34550579/multiline-comments-in-javacc)

I find no hole despite there is one comment say that "not correct with respect to the language definition".

I think it is easy to reconstruct original format if multi-line comment is captured into one token.

This way, multiple lines of comments can be obtained into one token.

MORE :
{
  <ENTER_JAVADOC_COMMENT: "/**" ~["/"]> { input_stream.backup(1); } : IN_JAVADOC_COMMENT
|
  <ENTER_MULTILINE_COMMENT: "/*"> : IN_MULTI_LINE_COMMENT
}

<IN_JAVADOC_COMMENT>
SPECIAL_TOKEN :
{
  <JAVADOC_COMMENT: "*/" > : DEFAULT
}

<IN_MULTI_LINE_COMMENT>
SPECIAL_TOKEN :
{
  <MULTI_LINE_COMMENT: "*/" > : DEFAULT
}

<IN_JAVADOC_COMMENT, IN_MULTI_LINE_COMMENT>
MORE :
{
  <COMMENT_CONTENT: ~[] >
}

But there is still a small issue, it seems that there is no way to remove the space at the beginning of each middle line, and there is no space before the first line of multi line comments. 😂😂
image
image

kezhuw added a commit to kezhuw/zookeeper that referenced this pull request Nov 26, 2024
Changes:
1. No change to one-line comment capture.
2. Change only indentation and line separator for multi-line comment.

The reconstruction might be a bit hard given the current comment
capture from apache#2206. I am OK to both approches.
@luozongle01
Copy link
Contributor Author

luozongle01 commented Nov 27, 2024

Hi @kezhuw , Thank you for providing the example.
I didn't expect this to reconstruct the style of the jute. 😂 This is much better than before. I have made modifications based on your example and tested it.

return getFieldComments(jField, " ");
}

private String getFieldComments(JField jField, String space) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the confusion! I means it should be better to rename variable space to indent.

@luozongle01 luozongle01 force-pushed the ZOOKEEPER-4880 branch 2 times, most recently from 3536f54 to 108ae97 Compare November 28, 2024 01:56
@luozongle01
Copy link
Contributor Author

Hi @kezhuw , I have completed the modifications and testing, can you help me take a look again

Copy link
Member

@kezhuw kezhuw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @luozongle01, thank you for the continuous update! I leave some comments regarding concrete implementation.

{
f.setTypeToken(typeTkn);
prevFieldSetNextTkn(typeTkn);
prevField = f;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prevField is not used outside Record, it should be a local also.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about replace nextToken with semicolonToken ? I think it is suitable:

  1. Semicolon is the end of field declaration.
  2. semicolonToken.next is the next token. We can extract possible end of line comments from there.

This way we don't need prevField anymore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about replace nextToken with semicolonToken ? I think it is suitable:

  1. Semicolon is the end of field declaration.
  2. semicolonToken.next is the next token. We can extract possible end of line comments from there.

This way we don't need prevField anymore.

Ah, I see. Sorry I didn't think of writing this before.🤣🤣

@luozongle01
Copy link
Contributor Author

Hi @kezhuw , I have finished the revision. Thank you very much for your review. Since I may not have much work experience, there may be many small problems in my code. Thank you for pointing out where I can improve.😆

Copy link
Member

@kezhuw kezhuw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for your review. Since I may not have much work experience, there may be many small problems in my code. Thank you for pointing out where I can improve.😆

Never mind! I believe we are in good direction. I have left inline comments for latest update.

f.setTypeToken(typeTkn);
f.setPreviousToken(previousToken);
f.setNextToken(getToken(1));
previousToken = typeTkn;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a big deal. But I think the previousToken should be <SEMICOLON_TKN>.

Copy link
Member

@kezhuw kezhuw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

int field; /* A multi-line comment. */ /* Another multi-line comment. */

I find that we are not handing above case. I left comment about this.

@luozongle01
Copy link
Contributor Author

int field; /* A multi-line comment. */ /* Another multi-line comment. */

I find that we are not handing above case. I left comment about this.

Your testing is incredibly thorough! I hadn’t considered these cases before. I’m sorry that my code had so many issues and caused you so much extra work. 😂😂😂
I added a few more test cases.

Copy link
Member

@kezhuw kezhuw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Could you squash this pr to one commit ?

@luozongle01
Copy link
Contributor Author

LGTM

Could you squash this pr to one commit ?

Thanks, I have already squash it to one commit.

Copy link
Member

@tisonkun tisonkun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for this neat improvement!

@kezhuw kezhuw merged commit d1d57c4 into apache:master Dec 7, 2024
14 checks passed
@kezhuw
Copy link
Member

kezhuw commented Dec 7, 2024

@luozongle01 Thank you for your contribution! Merged!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants