-
Notifications
You must be signed in to change notification settings - Fork 12
From PEG to rsec
Luikore edited this page Jul 5, 2017
·
6 revisions
PEG stands for Parsing Expression Grammar.
PEG meaning rsec
ε the empty string parser.eof
a terminal (a ∈ Σ) 'a'.r
A non-terminal (A ∈ N) parser
e1 e2 sequence seq(e1, e2)
e1 / e2 prioritized choice e1 | e2
e? optional e._? [alias e.maybe]
e* zero-or-more e.star
e+ one-or-more e * (1..-1)
&e lookahead predicate &e
!e negative lookahead ^e
- Treetop allows you to define an arbitrary number of functions associated with a rule.
Rules in Treetop only have a value when a function is defined for them. Rsec can insert function
definitions into its result array (https://github.com/luikore/rsec/wiki/Tricks), but it has a default
return value for its rules:
- A string for a string terminal symbol
- A number for a prim terminal symbol
- A string for the parent non-terminals of sequences of terminal symbols.
- A list for a seq or a join operation
- A list for a repeat operator (star, maybe, *)
- This means that any repeat operators on characters define lists of characters, not strings; to obtain
strings, append
.map(&:join)
.
- The default return value is overridden through the map function. The map function is implicit if following a seq operator.
- The negative lookahead term in Treetop precedes its term; the negative lookahead term in Rsec follows its
term. So
(!x y)
in Treetop corresponds toseq('' ^ x , y)
in Rsec. - Treetop has a text_value operator. Rsec defaults to text_value not only for its string terminals, but for
non-terminals that are not sequences or repetitions. So a Treetop text_value operator, such as
{def content return elements[0].text_value end}
, is redundant in Rsec. - Treetop has an elements array defined for each rule. Rsec only requires an element count occasionally,
for sequence parsers and repetitions, and can access elements through the [] operator; e.g.
seq(''.r ^ x , y )[1]
extracts the y value after a negative lookahead (the negative lookahead term itself is empty). The block idiomatically already contains a list of arguments corresponding to the elements of the sequence parser; e.g.seq('BEGIN:'.r , ianaToken , /[\r\n]/) {|begintoken, token, crlf| ... }
- Treetop can merge nonterminals with a single child with the child, even if the single child is one of a
set of options. Rsec on the other hand expects any definitions of term results to be interspersed for choices.
So where Treetop would have
(x / y z / a b c) 1..1 { def content ....}
, with rules to detect which choice has been applied, Rsec will havex.map { ... } | seq(y, z) { ...} | seq(a, b, c) { ... }
- Treetop is indifferent about the order in which rules are defined. Rsec expects that rules will refer to
rules that have already been defined; any forward references or recursive references need to be enclosed as
lazy{}
. - Treetop is uniformly greedy. Rsec is usually greedy, but the
x.until
operator is equivalent to a non-greedy.*? x
. - Terminal symbols need to have an .r operator appended in order to be treated as Rsec elements, e.g. to have
their value propagated to non-terminals, or to have other operators such as
.star
applied to them.