Parsing with Zippers

Parsing with Zippers (PwZ) is an extension of Parsing with Derivatives (Might et al. 2011) using McBride derivatives (McBride 2001) and zippers (Huet 1997).

This repository is an implementation of two extensions to the algorithm presented in our paper, namely:

A global worklist
One-token downward lookahead

Building

The code is written in OCaml and was tested with OCaml 4.06.1.

There are two executables that can be built: worklist and lookahead which correspond to the algorithms implemented in pwZ_Worklist.ml and pwZ_WorklistWithLookahead.ml, respectively.

To build both, simply run make. They can also be built independently using make worklist or make lookahead. (Note that lookahead is simply a short name for the algorith which uses both a worklist and lookahead.)

To clean, run make clean or (if you wish to remove the executables) make clean-all.

Running

Once the executables are built, they can be run from the command line. They each expect a single string representing a sequence of tokens for a simple arithmetic grammar (detailed below). For example:

$ ./lookahead "6 * 9"

The program will print the resulting parse tree if the parse is successful or a blank line if there was not a successful parse.

Arithmetic Grammar

The included grammar is very straightforward. It is:

NUM     ::= <INT>
PAREN   ::= '(' EXPR ')'
ATOM    ::= NUM
          | PAREN

MULT    ::= TERM '*' TERM
T_ATOM  ::= ATOM
TERM    ::= MULT
          | T_ATOM

E_TERM  ::= TERM
ADD     ::= EXPR '+' EXPR
EXPR    ::= E_TERM
          | ADD

(<INT> represents any positive integer (a terminal), and items in single quotes are terminals.)

Some examples of valid expressions:

42
1 + 3
4 * 3 + 2
(3 + 4) * 7

This grammar is written in two files: arithGrammar_Worklist.ml and arithGrammar_WorklistWithLookahead.ml. The latter file is identical to the first except that each expression is also given a first array, where each element corresponds to a token tag in the grammar and determines whether that token is in the FIRST set of the respective expression.

Note that token tag 0 is the EOF, which is reserved for use by the parse function that wraps the algorithm. It should not be used by user-implemented tokens.

The token tags are defined in arithTags.ml.

Name		Name	Last commit message	Last commit date
Latest commit History 25 Commits
.gitignore		.gitignore
LICENSE		LICENSE
Makefile		Makefile
README.md		README.md
arithGrammar_Worklist.ml		arithGrammar_Worklist.ml
arithGrammar_WorklistWithLookahead.ml		arithGrammar_WorklistWithLookahead.ml
arithTags.ml		arithTags.ml
arithTokenizer.ml		arithTokenizer.ml
lookahead.ml		lookahead.ml
pwZ_Worklist.ml		pwZ_Worklist.ml
pwZ_WorklistWithLookahead.ml		pwZ_WorklistWithLookahead.ml
pwZ_WorklistWithLookahead_Help.ml		pwZ_WorklistWithLookahead_Help.ml
pwZ_Worklist_Help.ml		pwZ_Worklist_Help.ml
worklist.ml		worklist.ml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Parsing with Zippers

Building

Running

Arithmetic Grammar

About

Releases

Packages

Languages

License

Ucombinator/parsing-with-zippers

Folders and files

Latest commit

History

Repository files navigation

Parsing with Zippers

Building

Running

Arithmetic Grammar

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages