Parse numbers in Alex's parser, not tokenizer #200

Closed
wants to merge 1 commit into from


Ericson2314
Collaborator

In different contexts within Alex's surface syntax, something like
"2340898" might be a string of characters or a number. The contexts are
only distinguished at the grammar level, not the token level, so, short
of some very layer-violating tricks, this more or less precludes lexing
entire number literals.

Instead of a number token, we have a digit token. We treat this as a
"sub-token", making a `DIGIT | CHAR` non-terminal that we use everywhere
we want to parse a character.
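A minimal Haskell sketch of the "sub-token" idea, assuming a hypothetical token type with separate `TDigit` and `TChar` constructors (illustrative names, not Alex's actual token type). `asChar` plays the role of the `DIGIT | CHAR` non-terminal: either token can be read as a character, so digits are never lost where characters are expected.

```haskell
import Data.Char (digitToInt, intToDigit, isDigit)

-- Hypothetical token type (not Alex's real one): a decimal digit is
-- lexed as its own "sub-token" rather than as part of a number literal.
data Token = TDigit Int | TChar Char
  deriving (Eq, Show)

-- Lex one character at a time; every decimal digit becomes a TDigit.
tokenize :: String -> [Token]
tokenize = map toTok
  where
    toTok c
      | isDigit c = TDigit (digitToInt c)
      | otherwise = TChar c

-- Analogue of the DIGIT | CHAR non-terminal: both kinds of token can
-- stand for a character, so no digit is lost in character contexts.
asChar :: Token -> Char
asChar (TDigit d) = intToDigit d
asChar (TChar c)  = c
```

For instance, mapping `asChar` over `tokenize "2340898"` gives back `"2340898"`, while the same `TDigit` tokens stay available to a number rule.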

For number literals, we just parse a non-empty string of digits, and
the left recursion makes the `* 10` elegant.

Fixes #197

@Ericson2314
Collaborator Author

If you could review this, @andreasabel, that would be great!

```
natnum :: { Int }
  : digit         { $1 }
  | natnum digit  { $2 * 10 + $1 }
```
@andreasabel (Member), Jan 23, 2022

Should be `$1 * 10 + $2`.
Is this what breaks #141? See https://github.com/simonmar/alex/runs/4911840593?check_suite_focus=true#step:22:116
Anyway, I fixed this in #201.
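A small Haskell model of how the left-recursive `natnum` rule evaluates, assuming the parser reduces `natnum digit` once per additional digit from left to right, which makes the rule a left fold over the digit values. The helpers `stepBuggy`, `stepFixed`, and `natnumWith` are hypothetical names for this illustration only.

```haskell
-- Semantic action of `natnum digit` as submitted: { $2 * 10 + $1 }.
-- $1 is the number parsed so far, $2 is the next digit.
stepBuggy :: Int -> Int -> Int
stepBuggy acc d = d * 10 + acc

-- Corrected action: { $1 * 10 + $2 }.
stepFixed :: Int -> Int -> Int
stepFixed acc d = acc * 10 + d

-- `natnum : digit | natnum digit` reduces the leftmost digit first and
-- then folds in each following digit, i.e. a left fold over the digits.
natnumWith :: (Int -> Int -> Int) -> [Int] -> Int
natnumWith _ [] = error "natnum: needs at least one digit"
natnumWith step (d : ds) = foldl step d ds
```

With the submitted action, the digits of `14` evaluate to 41 and those of `123` to 51; the corrected action gives 14 and 123.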

Collaborator Author

Ahh yes, thanks.

Member

Oh, I realized another problem with recognizing numbers in the parser: `natnum` happily accepts digits separated by spaces, so `1 4` in `r{1 4,1 4}` is now accepted as 14 repetitions of `r`.
I do not think we want to allow that.
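The problem can be reproduced with a simplified, hypothetical lexer (not Alex's real scanner) that discards whitespace and emits one token per digit: `1 4` and `14` produce the same token stream, so no grammar rule over these tokens can tell them apart.

```haskell
import Data.Char (digitToInt, isDigit, isSpace)

-- Hypothetical digit-as-token lexer, for illustration only.
data Token = TDigit Int | TChar Char
  deriving (Eq, Show)

-- Whitespace is dropped before tokenizing, so the parser never sees it.
tokenize :: String -> [Token]
tokenize = map toTok . filter (not . isSpace)
  where
    toTok c
      | isDigit c = TDigit (digitToInt c)
      | otherwise = TChar c
```

Here `tokenize "{1 4,1 4}"` and `tokenize "{14,14}"` are equal, which is exactly why the parser-level `natnum` cannot reject the spaced form.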
How about a middle ground: recognize numbers in the scanner, but store them as `String` rather than `Integer`, so we can turn them back into character sequences? I'll play with this solution in #199 a bit...
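A rough sketch of that middle ground, assuming a hypothetical `TNum String` token (not the actual change in #199): the scanner groups a run of digits into one token but keeps its spelling, so a number context can `read` it while a character context reuses the original characters. Because the numeral is now recognized in the scanner, `1 4` lexes as two separate `TNum` tokens, which a rule expecting a single numeral would reject.

```haskell
import Data.Char (isDigit, isSpace)

-- Hypothetical token type: a run of digits becomes a single token, but
-- its spelling is kept as a String instead of being converted to Integer.
data Token = TNum String | TChar Char
  deriving (Eq, Show)

tokenize :: String -> [Token]
tokenize [] = []
tokenize s@(c : cs)
  | isSpace c = tokenize cs -- whitespace only separates tokens
  | isDigit c = let (ds, rest) = span isDigit s in TNum ds : tokenize rest
  | otherwise = TChar c : tokenize cs

-- Where the grammar expects a number (e.g. a repetition bound) ...
asNumber :: Token -> Maybe Integer
asNumber (TNum ds) = Just (read ds)
asNumber _         = Nothing

-- ... and where it expects characters, the original spelling is reused.
asChars :: Token -> String
asChars (TNum ds) = ds
asChars (TChar c) = [c]
```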

Member

Maybe we can use lexer states to lex numerals only inside the multiplicity brackets `{nnn,mmm}`.
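Alex itself provides lexer states as start codes; the sketch below does not use that machinery and only models the idea with a hand-written Haskell lexer whose `Bool` argument stands in for a hypothetical "inside braces" state. Numerals are lexed whole between `{` and `}`, and digits are ordinary characters everywhere else.

```haskell
import Data.Char (isDigit, isSpace)

-- Hypothetical token type for this sketch.
data Token = TNum Int | TChar Char | TLBrace | TRBrace | TComma
  deriving (Eq, Show)

-- The Bool plays the role of a lexer state: True while inside {...}.
tokenize :: String -> [Token]
tokenize = go False
  where
    go _ [] = []
    go inBraces s@(c : cs)
      | isSpace c = go inBraces cs
      | c == '{' = TLBrace : go True cs
      | c == '}' = TRBrace : go False cs
      | inBraces && c == ',' = TComma : go inBraces cs
      | inBraces && isDigit c =
          let (ds, rest) = span isDigit s
          in TNum (read ds) : go inBraces rest
      | otherwise = TChar c : go inBraces cs
```

Here `tokenize "r{2,3}"` yields `[TChar 'r', TLBrace, TNum 2, TComma, TNum 3, TRBrace]`, while `tokenize "234"` yields three `TChar` tokens.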

Collaborator Author

Good point, and both of those look interesting to me.

Development

Successfully merging this pull request may close these issues.

alex 3.2.7 fails to parse definitions which can pass alex 3.2.6
2 participants