Parse numbers in Alex's parser, not tokenizer #200

Ericson2314 · 2022-01-23T09:04:10Z

In different contexts within Alex's surface syntax, something like
"2340898" might be a string of characters or a number. The contexts are
only distinguished at the grammar level, not the token level, so short of
some very layer-violating tricks, this more or less precludes lexing
entire number literals.

Instead of a number token, we have a digit token. We treat it as a
"sub-token": a `DIGIT | CHAR` non-terminal is used everywhere we want to
parse a character.

For number literals, we just parse a non-empty string of digits, and the
left recursion makes the `* 10` elegant.

Fixes #197
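The two ideas (a digit token usable as a character, and a left-recursive rule that builds the number) can be modelled in plain Haskell. This is a minimal sketch, not Alex's actual code: `Token`, `tokenChar`, and `natnumValue` are hypothetical names, and digits are assumed to arrive as `Int` values from 0 to 9. The fold uses the action `$1 * 10 + $2`, where `$1` is the number parsed so far and `$2` is the new digit.

```haskell
import Data.Char (intToDigit)

-- Hypothetical token type: a digit gets its own token, but the grammar
-- accepts it wherever a character is expected (the DIGIT | CHAR idea).
data Token = TDigit Int | TChar Char
  deriving (Eq, Show)

-- Where the grammar expects a character, a digit token is read back
-- as the character it was written as.
tokenChar :: Token -> Char
tokenChar (TDigit d) = intToDigit d
tokenChar (TChar c)  = c

-- Model of the semantic actions of a left-recursive rule
--   natnum : digit          { $1 }
--          | natnum digit   { $1 * 10 + $2 }
-- Each reduction of the second alternative combines the value parsed
-- so far with one new digit, which is exactly a left fold.
natnumValue :: [Int] -> Int
natnumValue []       = error "natnum needs at least one digit"
natnumValue (d : ds) = foldl step d ds
  where
    -- Shift the accumulated value one decimal place and add the digit.
    step acc digit = acc * 10 + digit
```

Because the recursion is on the left, `natnumValue [2,3,4]` is computed as `(2 * 10 + 3) * 10 + 4`, which is 234.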

Ericson2314 · 2022-01-23T09:05:21Z

If you could review this, @andreasabel, that would be great!

andreasabel · 2022-01-23T18:14:49Z

src/Parser.y

+
+natnum	:: { Int }
+	: digit				{ $1 }
+	| natnum digit			{ $2 * 10 + $1 }


Should be `$1 * 10 + $2`.
Is this what breaks #141 (https://github.com/simonmar/alex/runs/4911840593?check_suite_focus=true#step:22:116)?
Anyway, I fixed this in #201.

Ahh yes, thanks.
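To see what the swapped operands in the diff do, the two actions can be compared as fold steps over a digit sequence. A hedged sketch: `buggyStep`, `fixedStep`, and `reduceWith` are hypothetical names that model the Happy actions; they are not taken from the PR.

```haskell
-- Reduction step for `natnum : natnum digit`, written two ways.
-- In the Happy action, $1 is the natnum parsed so far and $2 the new digit.
buggyStep, fixedStep :: Int -> Int -> Int
buggyStep natnum digit = digit * 10 + natnum  -- { $2 * 10 + $1 }
fixedStep natnum digit = natnum * 10 + digit  -- { $1 * 10 + $2 }

-- Apply the left-recursive reductions to a non-empty digit sequence:
-- the first digit is the base case `natnum : digit`, and every further
-- digit triggers one reduction of `natnum digit`.
reduceWith :: (Int -> Int -> Int) -> [Int] -> Int
reduceWith _    []       = error "natnum needs at least one digit"
reduceWith step (d : ds) = foldl step d ds
```

With `buggyStep`, `14` comes out as 41 and `123` as 51, while single-digit numbers are unaffected because the second alternative never fires.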

Oh, I realized another problem with recognizing numbers in the parser: `natnum` happily accepts digits separated by spaces. For example, `r{1 4,1 4}` is now accepted as 14 repetitions of `r`.
I do not think we want to allow that.
How about a middle ground: recognize numbers in the scanner, but store them as a `String` rather than an `Integer`, so we can turn them back into character sequences? I'll experiment with this solution in #199.
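A minimal sketch of this middle ground, assuming a hand-written scanner rather than Alex-generated code: `Token`, `TNumeral`, `scan`, `tokenChars`, and `numeralValue` are hypothetical names, and whitespace is assumed to be skipped between tokens. Keeping the numeral's spelling lets a character context recover the original characters, and since a scanned numeral cannot span whitespace, `1 4` becomes two tokens instead of 14.

```haskell
import Data.Char (isDigit, isSpace)

-- Hypothetical token type: the scanner keeps a numeral's spelling
-- instead of converting it to a number up front.
data Token = TNumeral String | TChar Char
  deriving (Eq, Show)

-- Scan a maximal run of digits as one numeral. Whitespace only
-- separates tokens, so "1 4" yields two numerals, not one.
scan :: String -> [Token]
scan [] = []
scan s@(c : cs)
  | isSpace c = scan cs
  | isDigit c = let (ds, rest) = span isDigit s in TNumeral ds : scan rest
  | otherwise = TChar c : scan cs

-- Where the grammar wants characters, recover the original spelling.
tokenChars :: Token -> String
tokenChars (TNumeral ds) = ds
tokenChars (TChar c)     = [c]

-- Where the grammar wants a count, interpret the spelling as a number.
numeralValue :: Token -> Maybe Int
numeralValue (TNumeral ds) = Just (read ds)
numeralValue (TChar _)     = Nothing
```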

Maybe we can use lexer states to lex numerals only inside the multiplicity brackets `{nnn,mmm}`.
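In Alex this would presumably be done with start codes. The sketch below only models the idea in plain Haskell with an explicit state parameter; `LexState`, `lexWith`, and the token constructors are hypothetical, and every `{` is assumed to open a multiplicity, which real Alex syntax does not guarantee.

```haskell
import Data.Char (isDigit, isSpace)

-- Hypothetical lexer states, loosely modelled on Alex start codes.
data LexState = Outside | InMultiplicity
  deriving (Eq, Show)

data Token = TChar Char | TNum Int | TLBrace | TRBrace | TComma
  deriving (Eq, Show)

lexWith :: LexState -> String -> [Token]
lexWith _ [] = []
-- Whitespace is skipped in both states; if the guard fails, matching
-- falls through to the state-specific equations below.
lexWith st (c : cs)
  | isSpace c = lexWith st cs
lexWith Outside (c : cs)
  | c == '{'  = TLBrace : lexWith InMultiplicity cs
  | otherwise = TChar c : lexWith Outside cs  -- digits stay characters here
lexWith InMultiplicity s@(c : cs)
  | c == '}'  = TRBrace : lexWith Outside cs
  | c == ','  = TComma : lexWith InMultiplicity cs
  | isDigit c =
      let (ds, rest) = span isDigit s
      in TNum (read ds) : lexWith InMultiplicity rest
  | otherwise = TChar c : lexWith InMultiplicity cs
```

Outside the braces `2340898` stays seven character tokens, and inside them `{1 4}` yields two separate numbers, so a parser expecting `{n}` or `{n,m}` would reject it.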

Good point, and both of those look interesting to me.

Ericson2314 force-pushed the fix-197 branch from 1f3d20a to ee4daec January 23, 2022 09:08

This was referenced Jan 23, 2022

WIP #197: add rules to interpret number literals as character sequences #199

Closed

Parse numbers in Alex's parser, not tokenizer #201

Closed

andreasabel reviewed Jan 23, 2022

Ericson2314 closed this Jan 23, 2022

andreasabel mentioned this pull request Jan 23, 2022

Fix #197 by only lexing numeric literals in multiplicity expressions. #202

Merged

Ericson2314 deleted the fix-197 branch January 23, 2022 22:25