Skip to content
Luikore edited this page Jul 5, 2017 · 6 revisions

PEG stands for Parsing Expression Grammar.

PEG         meaning	                        rsec
ε           the empty string                parser.eof
a           terminal (a ∈ Σ)               'a'.r
A           non-terminal (A ∈ N)           parser
e1 e2       sequence                        seq(e1, e2)
e1 / e2     prioritized choice              e1 | e2
e?          optional                        e._? [alias e.maybe]
e*          zero-or-more                    e.star
e+          one-or-more                     e * (1..-1)
&e          lookahead predicate             &e
!e          negative lookahead              ^e

From Treetop to rsec

  • Treetop allows you to define an arbitrary number of functions associated with a rule. Rules in Treetop only have a value when a function is defined for them. Rsec can insert function definitions into its result array (https://github.com/luikore/rsec/wiki/Tricks), but it has a default return value for its rules:
    • A string for a string terminal symbol
    • A number for a prim terminal symbol
    • A string for the parent non-terminals of sequences of terminal symbols.
    • A list for a seq or a join operation
    • A list for a repeat operator (star, maybe, *)
    • This means that any repeat operators on characters define lists of characters, not strings; to obtain strings, append .map(&:join).
  • The default return value is overridden through the map function. The map function is implicit if following a seq operator.
  • The negative lookahead term in Treetop precedes its term; the negative lookahead term in Rsec follows its term. So (!x y) in Treetop corresponds to seq('' ^ x , y) in Rsec.
  • Treetop has a text_value operator. Rsec defaults to text_value not only for its string terminals, but for non-terminals that are not sequences or repetitions. So a Treetop text_value operator, such as {def content return elements[0].text_value end}, is redundant in Rsec.
  • Treetop has an elements array defined for each rule. Rsec only requires an element count occasionally, for sequence parsers and repetitions, and can access elements through the [] operator; e.g. seq(''.r ^ x , y )[1] extracts the y value after a negative lookahead (the negative lookahead term itself is empty). The block idiomatically already contains a list of arguments corresponding to the elements of the sequence parser; e.g. seq('BEGIN:'.r , ianaToken , /[\r\n]/) {|begintoken, token, crlf| ... }
  • Treetop can merge nonterminals with a single child with the child, even if the single child is one of a set of options. Rsec on the other hand expects any definitions of term results to be interspersed for choices. So where Treetop would have (x / y z / a b c) 1..1 { def content ....}, with rules to detect which choice has been applied, Rsec will have x.map { ... } | seq(y, z) { ...} | seq(a, b, c) { ... }
  • Treetop is indifferent about the order in which rules are defined. Rsec expects that rules will refer to rules that have already been defined; any forward references or recursive references need to be enclosed as lazy{}.
  • Treetop is uniformly greedy. Rsec is usually greedy, but the x.until operator is equivalent to a non-greedy .*? x.
  • Terminal symbols need to have an .r operator appended in order to be treated as Rsec elements, e.g. to have their value propagated to non-terminals, or to have other operators such as .star applied to them.
Clone this wiki locally