Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of error recovery? #2

Open
thisiscam opened this issue Mar 19, 2016 · 3 comments
Open

Implementation of error recovery? #2

thisiscam opened this issue Mar 19, 2016 · 3 comments

Comments

@thisiscam
Copy link

Hi,

This project looks very interesting to me as I'm trying to build earley parsers with different strategies. @vnmakarov I was wondering if it's possible for you to briefly mention how the error handling algorithm works? Or a few pointers to where related code relies will also be useful!

Great project!

Thanks and Best!

@vnmakarov
Copy link
Owner

Sorry for the delay with the answer.

Core of YAEP source code is more than 10 years old. It is hard for me to remember the details.

In brief, it is a gready search technique with removing searches which are already known to result in worst cost recovery. One search step is the following

  1. goto next parse list state with . error in it. Shift error getting a new state
  2. Try to find N tokens which are accepted after the new state
  3. Calculate the recovery cost which includes # of tokens when we go to back to the state with . error
    and # tokens rejected before we accept N tokens

As I remember the algorithm does permit several error shifts during one search step but it does not add tokens (supposedly missed in the input) only rejecting existing ones.

Building abstract tree and error recovery are the most complicated parts of YAEP.

Actually I wanted to write an article about the algorithms when I wrote YAEP but I am not in a research business and hate to be rejected by some conference. If that time we had arxiv.org, I defintely worte the article for it. Still I am considering it but I have no time for this as it also requires a deep analysis what I wrote in YAEP.

@thisiscam
Copy link
Author

@vnmakarov Many thanks for your response!

A few question regarding some details just to make sure I understood them correctly:

  1. By "goto next parse list state", you meant backtrack to the closest earley sets that has an earley item with an . error ... rule am I right?
  2. "Try to find N tokens" is achieved by basically attempting to continue the parse and see if the parser cursor can move ahead N tokens? If it fails, we basically revert the entire parser and find another ". error" rule right?
  3. By "does not permit multiple error shift", do you mean if a "parse list state" contains multiple . error ...s, you only shift on one of them at a time?

@vnmakarov
Copy link
Owner

A few question regarding some details just to make sure I understood them correctly:

By "goto next parse list state", you meant backtrack to the closest earley sets that has an earley item with an . error ... rule am I right?

Yes. But as we can have nested error recovery steps, we go back to the closest state but not further the state with .error of the parent error recovery step

"Try to find N tokens" is achieved by basically attempting to continue the parse and see if the parser cursor can move ahead N tokens? If it fails, we basically revert the entire parser and find another ". error" rule right?

Successful matching N tokens means the end of error recovery. Shifting less than N tokens during an error recovery step we can have new states with .error. So in this case we can start nested error recovery steps inside of given error recovery step.

By "does not permit multiple error shift", do you mean if a "parse list state" contains multiple . error ...s, you only shift on one of them at a time?

Sorry, I wrote the opposite "does permit multiple error shift". As I wrote during one error recovery step we can have nested error recovery steps. It means possibility of multiple shifts by error in finally found error recovery

I should acknowledge the algorithm is complicated but it produces minimal cost error recovery when we permits more one shift by error during error recovery but does not permit inserting new tokens (probably missed in an input).

Inserting new tokens in some cases might result in a better error recovery (with smaller error recovery cost) but my algorithm does not do it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants