Implementation of a modified Earley parser in C# inspired by the Marpa Parser project.
Pliant is a table driven parser that implements the Earley algorithm. Two optimizations are added to handle issues with the original Earley implementation:
- Optimization from Joop Leo to efficiently handle right recursions.
- Bug fix from Aycock and Horspool to handle nullable predictions
Add a reference to the Pliant libarary using NuGet
PM> Install-Package Pliant
using Pliant.Builders;
using Pliant.Builders.Expressions;
using Pliant.Grammars;
using Pliant.Automata;
public static int Main(string[] args)
{
var digits = CreateDigitLexerRule();
var whitespace = CreateWhitespaceLexerRule();
ProductionExpression
Calculator = "Calculator",
Factor = "Factor",
Term = "Term",
Expression = "Expression",
Number = "Number";
Calculator.Rule
= Expression ;
Expression.Rule
= Expression + '+' + Term
| Term ;
Term.Rule
= Term + '*' + Factor
| Factor ;
Factor.Rule
= Number ;
Number.Rule
= digits;
var grammar = new GrammarExpression(
Calculator,
new []{ Calculator, Factor, Term, Expression, Number },
new []{ new LexerRuleModel(whitespace) })
.ToGrammar();
// TODO: Use the grammar in a parse.
}
private static BaseLexerRule CreateDigitLexerRule()
{
var startState = new DfaState();
var finalState = new DfaState(true);
var digitTransition = new DfaTransition(new DigitTerminal(), finalState);
startState.AddTransition(digitTransition);
finalState.AddTransition(digitTransition);
return new DfaLexerRule(startState, "[\\d]+");
}
private static ILexerRule CreateWhitespaceLexerRule()
{
var whitespaceTerminal = new WhitespaceTerminal();
var startState = new DfaState();
var finalState = new DfaState(true);
var whitespaceTransition = new DfaTransition(whitespaceTerminal, finalState);
startState.AddTransition(whitespaceTransition);
finalState.AddTransition(whitespaceTransition);
return new DfaLexerRule(startState, new TokenType("[\\s]+"));
}
calculator.pdl
Calculator
= Expression;
Expression
= Expression '+' Term
| Term;
Term
= Term '*' Factor
| Factor;
Factor
= Number ;
Number
= Digits;
Digits ~ /[0-9]+/ ;
Whitespace ~ /[\s]+/ ;
:start = Calculator;
:ignore = Whitespace;
Program.cs
using Pliant.Languages.Pdl;
using Pliant.Runtime;
using System;
using System.IO;
namespace PdlExample
{
public static int Main (string[] args)
{
// load the grammar definition file (make sure to copy it to output directory if debugging)
var path = Path.Combine(Directory.GetCurrentDirectory(), "calculator.pdl");
var content = File.ReadAllText(path);
// parse the grammar definition file
var pdlParser = new PdlParser();
var definition = pdlParser.Parse(content);
// create the grammar, parser and scanner for our calculator language
var grammar = new PdlGrammarGenerator().Generate(definition);
var parser = new ParseEngine(grammar);
var calculatorInput = "5+30 * 2 + 1";
var scanner = new ParseRunner(parser, calculatorInput);
// run the scanner
if(!scanner.RunToEnd())
{
Console.WriteLine($"error parsing at line {scanner.Line + 1} column {scanner.Column}");
}
// parse the calculator input
var rootNode = parser.GetParseForestRootNode();
// TODO: use the parse node in the calculator interpreter
}
}
Using the calculator grammar from above, we can parse input by constructing a parse engine and parse runner instance.
var input = "1 + 1 * 3 + 2";
// use the calculator grammar from above
var parseEngine = new ParseEngine(grammar);
// use the parse runner to query the parse engine for state
// and use that state to select lexer rules.
var parseRunner = new ParseRunner(parseEngine, input);
// when a parse is recognized, the parse engine is allowed to move
// forward and continues to accept symbols.
var recognized = false;
var errorPosition = 0;
while(!parseRunner.EndOfStream())
{
recognized = parseRunner.Read();
if(!recognized)
{
errorPosition = parseRunner.Position;
break;
}
}
// For a parse to be accepted, all parse rules are completed and a trace
// has been made back to the parse root.
// A parse must be recognized in order for acceptance to have meaning.
var accepted = false;
if(recognized)
{
accepted = parseRunner.ParseEngine.IsAccepted();
if(!accepted)
errorPosition = parseRunner.Position;
}
Console.WriteLine($"Recognized: {recognized}, Accepted: {accepted}");
if(!recognized || !accepted)
Console.Error.WriteLine($"Error at position {errorPosition}");
The process for creating a parse tree is the same as recognizing input. In fact, when running the ParseEngine, a Sparsley Packed Parse Forest (SPPF) is created in the background. The parse forest is presented in a specialized format to promote density and allow for computational complexity similar to that of running the recognizer alone.
The easiest way to use the parse forest is use a internal node tree visitor on the parse forest root with a SelectFirstChildDisambiguationAlgorithm instance controling disambiguation of thet forest.
If the parse is ambiguous, you may want to supply a custom IForestDisambiguationAlgorithm.
// get the parse forest root from the parse engine
var parseForestRoot = parseEngine.GetParseForestRootNode();
// create a internal tree node and supply the disambiguation algorithm for tree traversal.
var parseTree = new InternalTreeNode(
parseForestRoot,
new SelectFirstChildDisambiguationAlgorithm());
- berkeley earley cs164
- washington edu ling571
- unt cse earley
- wikipedia
- optimizing right recursion
- marpa parser
- joop leo - optimizing right recursions
- parsing techniques - a practical guide
- practical earley parsing
- detailed description of leo optimizations and internals of marpa
- theory of Marpa Algorithm
- parse tree forest creation
- cs theory stackexchange, leo optimization parse tree creation
- insights on lexer creation
- incremental reparsing
- An extension of Earley's Algorithm for extended grammars
- Finding nullable productions in a grammar
- Directly-Executable Earley Parsing (Aycock and Horspool, 2001)
- Context Senitive Earley Parsing
- stack overflow question on parse forests
- stack overflow question on nullable non terminals