Add indentation-aware TokenBuilder and Lexer #1578

Merged
merged 14 commits into from
Jul 18, 2024

Member

@aabounegm aabounegm commented Jul 14, 2024

This is useful for languages where indentation is used as a block delimiter, such as Python.
I will try to add a recipe for it in the documentation (and maybe some unit tests) later on.

The code was heavily inspired by the Python indentation chevrotain example, but modified to be more extensible and hopefully fitting for Langium.

Usage example (just a snippet, not full grammar):

fragment Body:
  INDENT
  statements+=Statement
  DEDENT
;

If: 'if' cond=Expr ':' Body;

hidden terminal NEW_LINE: /\r?\n/;
terminal INDENT: 'synthetic:indent'; // The content doesn't really matter.
terminal DEDENT: 'synthetic:dedent'; // It will be overridden anyway
hidden terminal WS: /[ \t]+/; // whitespace and newlines now have to be separated
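To make the role of the synthetic `INDENT` and `DEDENT` tokens more concrete, here is a minimal, self-contained sketch of the indentation-stack technique that such lexers are typically built on. It is not the code from this PR: `LayoutTok` and `tokenizeIndentation` are hypothetical names, each non-blank line is reduced to a single `LINE` token, and every space or tab is assumed to count as one column.

```typescript
// Sketch only: `LayoutTok` and `tokenizeIndentation` are hypothetical names,
// not part of Langium or Chevrotain.
type LayoutTok = { type: 'INDENT' | 'DEDENT' | 'LINE'; text: string };

function tokenizeIndentation(source: string): LayoutTok[] {
    const tokens: LayoutTok[] = [];
    // Indentation widths of the currently open blocks; 0 is the top level.
    const indentStack: number[] = [0];

    for (const line of source.split(/\r?\n/)) {
        // Blank lines never open or close a block.
        if (line.trim() === '') continue;

        // Column of the first non-whitespace character
        // (assumption: one column per space or tab).
        const width = line.search(/\S/);

        if (width > indentStack[indentStack.length - 1]) {
            // Deeper than the innermost open block: open a new one.
            indentStack.push(width);
            tokens.push({ type: 'INDENT', text: '' });
        }
        while (width < indentStack[indentStack.length - 1]) {
            // Shallower: close blocks until the widths line up again.
            indentStack.pop();
            tokens.push({ type: 'DEDENT', text: '' });
        }
        if (width !== indentStack[indentStack.length - 1]) {
            // Dedented to a column that matches no open block.
            throw new Error(`Inconsistent dedent to column ${width}`);
        }
        tokens.push({ type: 'LINE', text: line.trim() });
    }

    // Close every block that is still open at the end of the input.
    while (indentStack.length > 1) {
        indentStack.pop();
        tokens.push({ type: 'DEDENT', text: '' });
    }
    return tokens;
}

const demo = tokenizeIndentation('if cond:\n  a\n  b\nc');
console.log(demo.map(t => t.type).join(' '));
// prints "LINE INDENT LINE LINE DEDENT LINE"
```

With a token stream of this shape, the `Body` fragment above can treat `INDENT` and `DEDENT` like ordinary opening and closing delimiters.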

The main drawback is that the whole language now becomes indentation-sensitive, even in places that would normally not be whitespace-sensitive, such as inside the brackets in the following Python example:

numbers = [
  1,  # unexpected indentation
  2,
]

Multi-mode lexing can be leveraged to overcome this, and if I find a generic way to integrate it with the code from this PR, I will follow it up with another one that does so.
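As a rough illustration of the problem and of one possible way around it, the sketch below runs an indentation-stack tokenizer over the list example twice: once naively, and once while ignoring indentation inside open brackets. Note that this uses a plain bracket-depth counter as a simplified stand-in, not actual multi-mode lexing, and it is not code from this PR. `layoutTokens` and its `ignoreInsideBrackets` flag are hypothetical, and the bracket counting is naive (it does not skip brackets inside strings or comments).

```typescript
// Sketch only: `layoutTokens` is a hypothetical helper, not a Langium API.
// Returns the trimmed line texts interleaved with 'INDENT' / 'DEDENT' markers.
function layoutTokens(source: string, ignoreInsideBrackets: boolean): string[] {
    const out: string[] = [];
    const indentStack: number[] = [0];
    let bracketDepth = 0; // number of currently open (, [ or {

    for (const line of source.split(/\r?\n/)) {
        if (line.trim() === '') continue;

        // Only look at indentation when we are not inside an open bracket
        // (or when the bracket rule is switched off).
        const insideBrackets = ignoreInsideBrackets && bracketDepth > 0;
        if (!insideBrackets) {
            const width = line.search(/\S/);
            if (width > indentStack[indentStack.length - 1]) {
                indentStack.push(width);
                out.push('INDENT');
            }
            while (width < indentStack[indentStack.length - 1]) {
                indentStack.pop();
                out.push('DEDENT');
            }
        }
        out.push(line.trim());

        // Naive depth tracking: assumes no brackets in strings or comments.
        for (const ch of line) {
            if ('([{'.indexOf(ch) >= 0) bracketDepth++;
            else if (')]}'.indexOf(ch) >= 0) bracketDepth = Math.max(0, bracketDepth - 1);
        }
    }

    while (indentStack.length > 1) {
        indentStack.pop();
        out.push('DEDENT');
    }
    return out;
}

const listExample = 'numbers = [\n  1,\n  2,\n]';

console.log(layoutTokens(listExample, false).join(' | '));
// prints "numbers = [ | INDENT | 1, | 2, | DEDENT | ]"

console.log(layoutTokens(listExample, true).join(' | '));
// prints "numbers = [ | 1, | 2, | ]"
```

The first call shows the unwanted `INDENT`/`DEDENT` pair around the list items; the second shows the stream a grammar for such a list would normally expect. A multi-mode lexer could reach the same result by switching to a mode without indentation handling while a bracket is open.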

Closes #1016.
Related to #608, #663, #782, and #1085.

@aabounegm
Member Author

Whoops, looks like I messed up the imports there. That's what I get for using github.dev without cloning locally, I guess 😅. I'll fix it in a sec.

@aabounegm
Member Author

@msujew Should hopefully be fine now 😄

Member

@msujew msujew left a comment

Thanks for the contribution @aabounegm!

Do you mind creating some test cases for the new lexer?

@aabounegm
Member Author

Sure, no problem, but that might take me some time (not sure if I will be able to manage it today).

@msujew
Member

msujew commented Jul 14, 2024

@aabounegm All good, take your time. I appreciate it!

Contributor

@spoenemann spoenemann left a comment

Thanks for the contribution! From a code organization standpoint, I'd prefer the two new service classes to be in a new file parser/indentation-aware.ts.

@aabounegm
Member Author

Thanks for your comments, I have now added unit tests and moved the new classes to a new file as suggested.
Please feel free to nitpick, or to edit directly if you prefer.

Member

@msujew msujew left a comment

Thanks @aabounegm, looks pretty good to me 👍

@msujew msujew requested a review from spoenemann July 17, 2024 14:19
Member Author

@aabounegm aabounegm left a comment

Thank you for your fixes!

Contributor

@spoenemann spoenemann left a comment

👍

Development

Successfully merging this pull request may close these issues.

Indentation tokenization based on terminal annotations
3 participants