Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Left-recursion allowed in circumstances that will always lead to a stack overflow. #54

Open
otac0n opened this issue Sep 21, 2014 · 2 comments

Comments

@otac0n
Copy link
Owner

otac0n commented Sep 21, 2014

expr -memoize
  = #STATE{ state["any"] = 1; } expr "a"
  / "b"

Since this changes the state of the cursor just before recursing into expr, and since memoization is used as an implementation detail of left-recursion, this alters the state of the left-recursion algorithm.

That's bad.

Basically, any left recursive call that can be preceded by a state change is a no-no.

Need to add a detector and a compiler error for this.

@otac0n otac0n added the bug label Sep 21, 2014
@otac0n otac0n self-assigned this Sep 21, 2014
@otac0n otac0n added this to the 3.1.1 milestone Sep 21, 2014
@otac0n
Copy link
Owner Author

otac0n commented Oct 12, 2014

I'm not sure if this is strictly computable, but my hunch is that it is.

Here's my half-baked algorithm for figuring this out:

  1. Build a control flow graph of the entire grammar.
  2. Find all nodes relating to #STATE expressions in the graph.
  3. Find all succeeding nodes from those state nodes.
  4. Mark these nodes as being reached by the aforementioned #STATE expression.
  5. Repeat the following process until no reachable nodes can be added:
    1. Find any zero width node that is reached by a #STATE expression
    2. Find the succeeding nodes from those nodes
    3. Mark these nodes as being reached by the #STATE expressions
  6. For each left-recursive rule
    1. Find all paths* through the control flow graph that reach a name node referring to the rule.
    2. Filter these paths to just those that contain only zero-width expressions
    3. Filter these paths to just those that end with a node that is reachable by any #STATE expression
    4. The remaining paths are guaranteed to stack overflow (unless the programmer adds code expressions to prevent it).

The definition of *all paths above is dubious, since there could easily be infinitely many paths through the graph. It may be sufficient (since we are talking about zero-width expressions) to use all strongly connected zero-width paths that touch a rule starting node and touch a name expression for that same rule.

In any case, this should only ever be a compiler warning, since code expressions can technically avoid it, but the workbench needs to be hardened against stack overflows.

@otac0n
Copy link
Owner Author

otac0n commented Oct 12, 2014

Oh yeah, and here is possibly the most mind-bending error case:

expr -memoize
  = expr expr "foo"
  / #STATE{ state["bar"] = 1; }

@otac0n otac0n modified the milestones: 3.1.1, 3.1.2 Mar 1, 2015
@otac0n otac0n removed this from the 3.1.2 milestone Aug 27, 2015
@otac0n otac0n removed their assignment Sep 4, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant