Prol's grammar is straightforward in BNF:
Clause ::= Struct "."
| Struct ":-" Terms "."
;
Term ::= Atom
| Var
| Struct
;
Atom ::= lower ident*
| digit+
| symbol+
| "'" (char - "'" | "'" "'") "'"
;
Var ::= (upper | "_") ident* ;
Struct ::= Atom "(" Terms? ")" ;
Terms ::= Term ("," Term)* ;
lower ::= [a-z] ;
upper ::= [A-Z] ;
digit ::= [0-9] ;
symbol ::= [!@#$%&*=+{}[\]^~/\|<>.;-] ;
ident ::= lower | upper | digit | "_" ;
(Whitespace handling omitted to improve readability)
Let's see some example elements:
- Var:
X
,A1
,Elem
,_
,_list_1
; - Atom:
- starting with lowercase:
a
,b51
,anAtom
,a_10
; - only digits:
123
; - only symbols:
.
,<->
,[]
,\==
; - quoted:
''
,'a b'
,'I won''t fail!'
;
- starting with lowercase:
- Struct:
f()
,'f'()
,.(Head, Tail)
,f(g(a), h(t(b)))
- Clause:
vowel(a).
,consonant(X) :- letter(X), not(vowel(X)).
Given a list of atoms representing text, we want to check that it matches the grammar specification. We need to compose rules into others, which can be done in a simple way using difference lists. We can build a check that the L0-L4 difference list contains a clause by composing it with checks for struct and terms:
clause(L0, L4) :-
struct(L0, L1),
L1 = .(':', .('-', L2)),
terms(L2, L3),
L3 = .('.', L4).
We can also inline the definitions for L1
and L3
, getting the more
idiomatic option
clause(L0, L4) :-
struct(L0, .(':', .('-', L2))),
terms(L2, .('.', L4)).
To check for vars, we check that the first char is an uppercase char,
and the remaining are all identifier chars with idents/2
.
var(.(Ch, L0), L1) :-
upper(Ch),
idents(L1, L2).
idents(L, L).
idents(.(Ch, L0), L1) :-
ident(Ch),
idents(L0, L1).
ident(Ch) :- lower(Ch).
ident(Ch) :- upper(Ch).
ident(Ch) :- digit(Ch).
ident('_').
Finally, we need to list all facts about chars:
% upper/1
upper('A').
upper('B').
..
upper('Z').
% lower/1
lower(a).
lower(b).
..
lower(z).
% digit/1
digit(0).
..
digit(9).
% symbol/1
symbol(!).
symbol(@).
..
symbol(-).
The form we wrote for the grammar allows us only checking that a given list of atoms matches the grammar. We can also build a tree of the parsed elements while parsing, to represent the program structure -- what is called an abstract syntax tree (AST).
First, an example of what we want. Given the text f(x):-gh(a,X).
we
build a list of atoms with their chars.
The predicate clause/3
receives a new argument Tree
where the AST is built:
?- Chars = .(f, .('(', .('X', .(')', .(':', .('-', .(g, .(h, .('(', .(a, .(',', .('X', .(')', .('.', [])))))))))))))),
clause(Tree, Chars, []).
Tree = clause(
struct(.(f, []),
.(var(.('X', [])),
[])),
.(struct(.(g, .(h, [])),
.(atom(.(a, [])),
.(var(.('X', [])),
[]))),
[])).
Or, more visually:
Tree = clause(.,.)
| '-----------------[ struct(.,.)
struct(.,.) | '--[ atom(.)
| '--[ var(.) ] "gh" |
"f" | "a"
"X" , var(.)
|
"X"
]
]
That is, a clause
has two elements: the head and the body (a list of terms).
A struct
has two terms: its name and its args (also a list of terms).
atom
and var
both have only their names.
Names are represented above within double quotes, but in the AST you see that
they are actually atom lists.
The clause
predicate becomes:
clause(Tree, L0, L2) :-
struct(Head, L0, .(':', .('-', L1))),
terms(Body, L1, .('.', L2)),
Tree = clause(Head, Body).
and var
becomes:
var(Tree, .(Ch, L0), L1) :-
upper(Ch),
idents(Idents, L1, L2),
Tree = var(.(Ch, Idents)).
idents([], L, L).
idents(Idents, .(Ch, L0), L1) :-
ident(Ch),
idents(Chars, L0, L1),
Idents = .(Ch, Chars).
With an AST for the code, we parse the text and build data structures from it at the decode_*
functions in grammar.py.
We are able to build the exact same structures as the grammar built to parse it! Cool, huh?