Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Joop Leo's optimization? #24

Open
orlp opened this issue Dec 22, 2018 · 1 comment
Open

Joop Leo's optimization? #24

orlp opened this issue Dec 22, 2018 · 1 comment

Comments

@orlp
Copy link

orlp commented Dec 22, 2018

It is stated that YAEP doesn't use Joop Leo's optimization, and that right recursion is fast anyways. And because of the lookahead I do believe this to be true for some grammars, but not all LR(k) grammars. For example, the following LR(2) grammar:

S -> A 'a' 'b'
A -> 'a' A
A ->

Generates increasingly large set cores when parsing "aaa...aaab", leading to (at least) quadratic behavior.

Perhaps an optimization similar to Joop Leo's could be made, making YAEP linear time for all LR(k) grammars?

@vnmakarov
Copy link
Owner

Thank you for your feedback.

Yes, in brief, I consider implementation of Joop Leo's optimization
or something similar.

In fact, I believe Joop Leo's work is more important that one
described in Practical Earley Parser article. Saying that I don't
know when I actually will start to work on it and when it could be
ready. It is not a high priority project for me.

I tried to implement an Earley Parser which could be useful for
programming languages implementation. Therefore I spend more efforts
on implementing simple direct translation and a good error recovery. I
also don't believe that LR(k) parsers (where k > 1) are useful for such
tasks. Usually, if a programming language grammar is not LR(1), it is
also not LR(k) k > 2 (e.g. C or C++).

But still it would be nice to parse LR(k > 1)-language by Earley
parser in linear time.

Although I wanted to implement my Earley parser as a swiss knife for
parser implementations, I found that its usage requires a lot of
expertise for non-trivial parsers. It is easy to write an ambiguous
grammar (e.g. during modification of an original grammar) when you
have not intention to do this.

By the way, I have also yacc-like parser MSTA
(https://github.com/dino-lang/dino/tree/master/MSTA) which can
generates effective LR(k), k > 1 parsers. This compiler-compiler does
a lot optimizations, like extracting LALR and regular parts of grammar
and implements parsing of them more effectively. Deterministic parser
generators are more safe for the problem I mentioned above. So I'd
prefer to use MSTA than my Earley parser for a typical programming
language implementation.

But even more I prefer to write manual parsers with backtracking if
necessary because I like to use original grammars from the programming
language definition. Other approaches require a lot of grammar
modifications (e.g. original Javascript grammar is actually a grammar
generator or can be considered as context sensitive one).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants