Parse numbers in Alex's parser, not tokenizer #200
If you could review this, @andreasabel, that would be great!
In different contexts within Alex's surface syntax, something like "2340898" might be a string of characters or a number. The contexts are only distinguished at the grammar level, not the token level, so this more or less (short of very layer-violation-y tricks) precludes lexing entire number literals. Instead of a number token, we have a digit token. We treat this as a "sub-token", making a `DIGIT | CHAR` non-terminal that we use everywhere we want to parse a character. For number literals, we just parse a non-empty string of digits, and the left recursion makes the `* 10` elegant. Fixes #197
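A minimal Haskell sketch of the idea, assuming a simplified token type: `Token`, `TDigit`, `TChar`, `charOf`, `digitOf` and `natnum` are hypothetical names for illustration, not Alex's actual definitions. A digit is a sub-token that can stand in for a character, and a number literal is a non-empty run of digits folded from the left, which is what a left-recursive rule with the action `$1 * 10 + $2` computes.

```haskell
import Data.Char (chr, ord)

-- Hypothetical token type: digits get their own token,
-- every other character is an ordinary character token.
data Token = TDigit Int | TChar Char
  deriving (Eq, Show)

-- Plays the role of the `DIGIT | CHAR` non-terminal: wherever a character
-- is expected, a digit token is accepted too and turned back into its character.
charOf :: Token -> Char
charOf (TDigit d) = chr (ord '0' + d)
charOf (TChar c)  = c

digitOf :: Token -> Maybe Int
digitOf (TDigit d) = Just d
digitOf (TChar _)  = Nothing

-- Plays the role of a left-recursive number rule:
--   natnum : digit        { $1 }
--          | natnum digit { $1 * 10 + $2 }
-- Each new digit multiplies the value parsed so far by 10 (a left fold).
-- Returns Nothing for an empty run or when a non-digit token appears.
natnum :: [Token] -> Maybe Int
natnum [] = Nothing
natnum ts = foldl step (Just 0) ts
  where
    step acc tok = do
      n <- acc
      d <- digitOf tok
      pure (n * 10 + d)
```

For example, `natnum [TDigit 1, TDigit 2, TDigit 3]` yields `Just 123`, while `map charOf [TDigit 4, TChar 'x']` yields `"4x"`, i.e. the same digit token serves both contexts.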
```
natnum :: { Int }
  : digit        { $1 }
  | natnum digit { $2 * 10 + $1 }
```
Should be `$1 * 10 + $2`.
Is this what breaks #141 (https://github.com/simonmar/alex/runs/4911840593?check_suite_focus=true#step:22:116)?
Anyway, fixed this in #201.
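To see why the operand order matters, here is a small Haskell model of how a left-recursive rule reduces a run of digits. `reduceWith`, `buggy` and `fixed` are hypothetical helpers, not part of Alex or Happy: in `natnum : natnum digit`, `$1` is the value parsed so far and `$2` is the newest digit, so the reduction of "123" proceeds as ((1) 2) 3.

```haskell
-- Model of reducing   natnum : digit        { $1 }
--                            | natnum digit { action $1 $2 }
-- over a non-empty run of digits: the first digit is the base case,
-- and each further digit is combined with the value parsed so far.
reduceWith :: (Int -> Int -> Int) -> Int -> [Int] -> Int
reduceWith action firstDigit rest = foldl action firstDigit rest

-- The action as originally written in the diff: $2 * 10 + $1
buggy :: Int -> Int -> Int
buggy soFar d = d * 10 + soFar

-- The corrected action: $1 * 10 + $2
fixed :: Int -> Int -> Int
fixed soFar d = soFar * 10 + d
```

With these, `reduceWith fixed 1 [2, 3]` gives 123, but `reduceWith buggy 1 [2, 3]` gives 51 and `reduceWith buggy 1 [4]` gives 41. Single-digit numbers come out the same either way, so only multi-digit repetition bounds expose the mistake.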
Ahh yes, thanks.
Oh, I realized another problem with recognizing numbers in the parser: `natnum` happily accepts digits separated by spaces, so `1 4` in `r{1 4,1 4}` is now accepted as 14 repetitions of `r`.
I do not think we want to allow that.
How about a middle ground: recognizing numbers in the scanner, but storing them as `String` rather than `Integer`, so we can turn them back into character sequences? I'll play with this solution in #199 a bit...
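A rough Haskell sketch of that middle ground, under stated assumptions: `Token`, `TNumeral`, `scan`, `bound` and `chars` are hypothetical names, and the toy scanner simply skips whitespace between tokens as a simplification. Because digits are grouped greedily into one numeral token, `1 4` becomes two numerals and can be rejected where a single number is required, while keeping the numeral as a `String` lets the parser expand it back into characters elsewhere.

```haskell
import Data.Char (isDigit, isSpace)

-- Hypothetical tokens: a maximal run of digits becomes one numeral token,
-- kept as a String rather than an Integer.
data Token = TNumeral String | TChar Char
  deriving (Eq, Show)

-- Toy scanner: whitespace between tokens is skipped (simplifying
-- assumption); digits are grouped greedily into numerals.
scan :: String -> [Token]
scan [] = []
scan s@(c : cs)
  | isSpace c = scan cs
  | isDigit c = let (ds, rest) = span isDigit s in TNumeral ds : scan rest
  | otherwise = TChar c : scan cs

-- Where the grammar expects a number (e.g. a repetition bound), exactly one
-- numeral token is required, so "1 4" (two numerals) is rejected.
bound :: [Token] -> Maybe Integer
bound [TNumeral ds] = Just (read ds)
bound _             = Nothing

-- Where the grammar expects characters, a numeral is expanded back into its
-- characters, which is why keeping the String matters.
chars :: [Token] -> String
chars = concatMap toChars
  where
    toChars (TNumeral ds) = ds
    toChars (TChar c)     = [c]
```

Here `bound (scan "14")` is `Just 14`, `bound (scan "1 4")` is `Nothing`, and `chars (scan "a12b")` round-trips to `"a12b"`.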
Maybe we can use lexer states to only lex numerals inside the multiplicity brackets `{nnn,mmm}`.
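A hand-written Haskell sketch of the lexer-state idea, assuming a much simplified input language: in an Alex specification this would presumably be expressed with start codes, whereas here the state is just a `Bool` threaded by hand, and `Token`, `scan` and the constructor names are hypothetical. It also ignores the other uses `{` has in real Alex syntax. Outside braces every digit stays a plain character; inside a multiplicity, digit runs are lexed as whole numerals, so `r{1 4}` yields two separate numbers that a grammar expecting one can reject.

```haskell
import Data.Char (isDigit, isSpace)

-- Hypothetical tokens: outside repetition braces every character (digits
-- included) is a plain character; inside braces digit runs are numerals.
data Token = TChar Char | TLBrace | TRBrace | TComma | TNum Integer
  deriving (Eq, Show)

-- Two lexer states: False = ordinary regex text,
-- True = inside a multiplicity {n,m}. Whitespace is skipped in both.
scan :: String -> [Token]
scan = go False
  where
    go _ [] = []
    go inBraces s@(c : cs)
      | isSpace c            = go inBraces cs
      | c == '{'             = TLBrace : go True cs
      | c == '}'             = TRBrace : go False cs
      | inBraces, c == ','   = TComma : go True cs
      | inBraces, isDigit c  =
          let (ds, rest) = span isDigit s
          in TNum (read ds) : go True rest
      | otherwise            = TChar c : go inBraces cs
```

For instance, `scan "r{14,15}"` gives `[TChar 'r', TLBrace, TNum 14, TComma, TNum 15, TRBrace]`, while `scan "a12"` keeps the digits as characters.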
Good point, and both of those look interesting to me.