The goal of this document is to explain how different parsers fit together. When will it backtrack? When will it not?
Say we have keyword "import"
:
String | Result |
---|---|
"import" |
OK{false} |
"imp" |
ERR{true} |
"export" |
ERR{true} |
In our OK{false}
notation, we are indicating:
- Did the parser succeed?
OK
if yes.ERR
if not. - Is it possible to backtrack? So when
keyword
succeeds, backtracking is not allowed anymore. You must continue along that path.
Say we have map func parser
:
parser |
Result |
---|---|
OK{b} |
OK{b} |
ERR{b} |
ERR{b} |
So result of map func parser
is always the same as the result of the parser
itself.
Say we have map2 func parserA parserB
:
parserA |
parserB |
Result |
---|---|---|
OK{b} |
OK{b'} |
OK{b && b'} |
OK{b} |
ERR{b'} |
ERR{b && b'} |
ERR{b} |
ERR{b} |
If parserA
succeeds, we try parserB
. If they are both backtrackable, the combined result is backtrackable.
If parserA
fails, that is our result.
This is used to define our pipeline operators like this:
(|.) a b = map2 (\keep ignore -> keep) a b
(|=) a b = map2 (\func arg -> func arg) a b
Say we have either parserA parserB
:
parserA |
parserB |
Result |
---|---|---|
OK{b} |
OK{b} |
|
ERR{true} |
OK{b} |
OK{b} |
ERR{true} |
ERR{b} |
ERR{b} |
ERR{false} |
ERR{false} |
The 4th case is very important! If parserA
is not backtrackable, you do not even try parserB
.
The either
function does not appear in the public API, but I used it here because it makes the rules a bit easier to read. In the public API, we have oneOf
instead. You can think of oneOf
as trying either
the head of the list, or oneOf
the parsers in the tail of the list.
Say we have andThen callback parserA
where callback a
produces parserB
:
parserA |
parserB |
Result |
---|---|---|
ERR{b} |
ERR{b} |
|
OK{b} |
OK{b'} |
OK{b && b'} |
OK{b} |
ERR{b'} |
ERR{b && b'} |
If both parts are backtrackable, the overall result is backtrackable.
Say we have backtrackable parser
:
parser |
Result |
---|---|
OK{b} |
OK{true} |
ERR{b} |
ERR{true} |
No matter how parser
was defined, it is backtrackable now. This becomes very interesting when paired with oneOf
. You can have one of the options start with a backtrackable
segment, so even if you do start down that path, you can still try the next parser if something fails. This has important yet subtle implications on performance, so definitely read on!
This parser is intended to give you very precise control over backtracking behavior, and I think that is best explained through examples.
Say we have map2 func (backtrackable spaces) (symbol ",")
which can eat a bunch of spaces followed by a comma. Here is how it would work on different strings:
String | Result |
---|---|
" ," |
OK{false} |
" :" |
ERR{true} |
"abc" |
ERR{true} |
Remember how map2
is backtrackable only if both parsers are backtrackable. So in the first case, the overall result is not backtrackable because symbol ","
succeeded.
This becomes useful when paired with either
!
Say we have the following parser
definition:
parser : Parser (Maybe Int)
parser =
oneOf
[ succeed Just
|. backtrackable spaces
|. symbol ","
|. spaces
|= int
, succeed Nothing
|. spaces
|. symbol "]"
]
Here is how it would work on different strings:
String | Result |
---|---|
" , 4" |
OK{false} |
" ," |
ERR{false} |
" , a" |
ERR{false} |
" ]" |
OK{false} |
" a" |
ERR{false} |
"abc" |
ERR{true} |
Some of these cases are tricky, so let's look at them in more depth:
" , a"
—backtrackable spaces
,symbol ","
, andspaces
all succeed. At that point we haveOK{false}
. Theint
parser then fails ona
, so we finish withERR{false}
. That meansoneOf
will NOT try the second possibility." ]"
—backtrackable spaces
succeeds, butsymbol ","
fails. At that point we haveERR{true}
, sooneOf
tries the second possibility. After backtracking,spaces
andsymbol "]"
succeed withOK{false}
." a"
—backtrackable spaces
succeeds, butsymbol ","
fails. At that point we haveERR{true}
, sooneOf
tries the second possibility. After backtracking,spaces
succeeds withOK{false}
andsymbol "]"
fails resulting inERR{false}
.
Notice that in the previous example, we parsed spaces
twice in some cases. This is inefficient, especially in large files with lots of whitespace. Backtracking is very inefficient in general though, so if you are interested in performance, it is worthwhile to try to eliminate as many uses of backtrackable
as possible.
So we can rewrite that last example to never backtrack:
parser : Parser (Maybe Int)
parser =
succeed identity
|. spaces
|= oneOf
[ succeed Just
|. symbol ","
|. spaces
|= int
, succeed Nothing
|. symbol "]"
]
Now we are guaranteed to consume the spaces only one time. After that, we decide if we are looking at a ,
or ]
, so we never backtrack and reparse things.
If you are strategic in shuffling parsers around, you can write parsers that do not need backtrackable
at all. The resulting parsers are quite fast. They are essentially the same as LR(k) parsers, but more pleasant to write. I did this in Elm compiler for parsing Elm code, and it was very significantly faster.