-
Notifications
You must be signed in to change notification settings - Fork 0
GF crash course
A GF grammar consists of one Abstract syntax and 1+ concrete syntaxes.
- The abstract syntax defines a set of possible abstract syntax trees (ASTs).
- The concrete syntax (for language L), defines a mapping from ASTs to "linearisation structures".
- These linearisations can be strings, but also tuples, records, finite types and functions over finite types.
The abstract syntax defines the available types (called "categories"
-- cat
), and the functions (fun
) with which we can create
ASTs. Each function has a type signature just as in Haskell.
abstract Mini = {
cat S; VP; NP; Verb; Det; Noun;
fun
mkS : NP -> VP -> S;
mkVP : Verb -> NP -> VP;
mkNP : Det -> Noun -> NP;
love, hate : Verb;
a, all : Det;
elk, deer : Noun;
}
With this grammar, the following are valid ASTs:
deer : Noun
(mkVP love (mkNP a deer)) : VP
(mkS (mkNP all elk) (mkVP love (mkNP a deer))) : S
Notes:
- Higher-order function types are allowed, but not very useful in practise (because of restrictions in the concrete syntax).
- Dependent types are possible, and sometimes useful. But nevertheless not very much used (because of restrictions in the concrete syntax).
- There are some more keywords, but they are for the GF gurus.
There can be several concrete syntaxes for every abstract syntax.
Every concrete syntax (for language L), defines a mapping from ASTs to
"linearisation structures". Linearisations (lin
) are typed
(lincat
), and can be strings, but also tuples, records, finite
parameters and functions over finite parameters (called "tables").
Parameter types have to be defined in the concrete syntax, and they
are similar to Haskell data
definitions, but restricted:
param Number = Sg | Pl;
param Gender = Neu | Utr;
param Agreement = Agr Number Gender | NoAgr;
The restriction is that recursive parameter types are not allowed, which means that there are always a finite number of parameters of every parameter type.
Strings (Str
) are really lists of tokens. They are concatenated with
++
- the string literal
"one two three"
is really a string with three tokens "one two three" == "one" ++ "two" ++ "three"
- i.e., it's like running Haskell
words
on the literal
Structure | linearisation type | construction | introspection |
---|---|---|---|
strings | Str |
"this is a string" and concatenation s1 ++ s2
|
(not possible) |
parameters | Agreement |
Agr Sg Utr |
case-expressions |
tuples | Str * Str * Number |
<"first", "second", Sg> |
projections: tpl.p1 , tpl.p2 , tpl.p3
|
records | {s:Str; n:Agreement} |
{s:"hello world"; n=NoAgr} |
projections: rec.s , rec.n
|
tables | Number=>Str |
table {Sg=>"elk"; Pl=>"elks"} |
selection: tbl!Sg , tbl!Pl , tbl!(rec.n)
|
Syntactic sugar for tables:
- wildcards:
table {Agr _ _ => "yes"; NoAgr => "no"}
- variable binding:
table {Agr num _ => "number" ++ table{Sg=>"one";Pl=>"two"} ! num; NoAgr => "nope"}
- lambda syntax:
\\x => ...x...
is the same astable{x => ...x...}
Case expressions:
case x of {NoAgr => "no"; Agr n g => ...n...g...}
- case expressions are compiled away
concrete MiniEng of Mini = {
param
Number = Sg | Pl;
lincat
S = {s : Str};
NP, Det = {s : Str; num : Number};
VP, Verb, Noun = {s : Number => Str};
lin
mkS np vp = {s = np.s ++ vp.s!np.num};
mkVP verb np = {s = \\num => verb.s!num ++ np.s};
mkNP det noun = {s = det.s ++ noun.s!det.num; num = det.num};
love = {s = table {Sg => "loves"; Pl => "love"}};
hate = {s = table {Sg => "hates"; Pl => "hate"}};
a = {s = "a"; num = Sg};
all = {s = "all"; num = Pl};
elk = {s = table {Sg => "elk"; Pl => "elks"}};
deer = {s = table {_ => "deer"}};
}
concrete MiniSwe of Mini = {
param
Number = Sg | Pl;
Gender = Neu | Utr;
lincat
S, NP, VP, Verb = {s : Str};
Det = {s : Gender => Str; num : Number};
Noun = {s : Number => Str; gen : Gender};
lin
mkS np vp = {s = np.s ++ vp.s};
mkVP verb np = {s = verb.s ++ np.s};
mkNP det noun = {s = det.s!noun.gen ++ noun.s!det.num};
love = {s = "älskar"};
hate = {s = "hatar"};
a = {s = table {Neu => "ett"; Utr => "en"}; num = Sg};
all = {s = \\_ => "alla"; num = Pl};
elk = {s = table {Sg => "älg"; Pl => "älgar"}; gen = Utr};
deer = {s = \\_ => "rådjur"; gen = Neu};
}
All the restrictions on concrete syntax means that parsing is always possible. Furthermore, GF (without dependent or higher-order types) is equivalent to Parallell Multiple Context-Free Grammar, meaning that parsing has polynomial time complexity.
(See more about this in my PhD thesis)