Skip to content

GF crash course

Frederik Hanghøj Iversen edited this page Oct 22, 2018 · 2 revisions

GF crash course for Haskell programmers

A GF grammar consists of one Abstract syntax and 1+ concrete syntaxes.

  • The abstract syntax defines a set of possible abstract syntax trees (ASTs).
  • The concrete syntax (for language L), defines a mapping from ASTs to "linearisation structures".
  • These linearisations can be strings, but also tuples, records, finite types and functions over finite types.

Abstract syntax

The abstract syntax defines the available types (called "categories" -- cat), and the functions (fun) with which we can create ASTs. Each function has a type signature just as in Haskell.

abstract Mini = {
cat S; VP; NP; Verb; Det; Noun;
fun
  mkS : NP -> VP -> S;
  mkVP : Verb -> NP -> VP;
  mkNP : Det -> Noun -> NP;
  love, hate : Verb;
  a, all : Det;
  elk, deer : Noun;
}

With this grammar, the following are valid ASTs:

  • deer : Noun
  • (mkVP love (mkNP a deer)) : VP
  • (mkS (mkNP all elk) (mkVP love (mkNP a deer))) : S

Notes:

  • Higher-order function types are allowed, but not very useful in practise (because of restrictions in the concrete syntax).
  • Dependent types are possible, and sometimes useful. But nevertheless not very much used (because of restrictions in the concrete syntax).
  • There are some more keywords, but they are for the GF gurus.

Concrete syntax

There can be several concrete syntaxes for every abstract syntax. Every concrete syntax (for language L), defines a mapping from ASTs to "linearisation structures". Linearisations (lin) are typed (lincat), and can be strings, but also tuples, records, finite parameters and functions over finite parameters (called "tables").

Parameter types

Parameter types have to be defined in the concrete syntax, and they are similar to Haskell data definitions, but restricted:

param Number = Sg | Pl;
param Gender = Neu | Utr;
param Agreement = Agr Number Gender | NoAgr;

The restriction is that recursive parameter types are not allowed, which means that there are always a finite number of parameters of every parameter type.

Strings

Strings (Str) are really lists of tokens. They are concatenated with ++

  • the string literal "one two three" is really a string with three tokens
  • "one two three" == "one" ++ "two" ++ "three"
  • i.e., it's like running Haskell words on the literal

Linearisation structures

Structure linearisation type construction introspection
strings Str "this is a string" and concatenation s1 ++ s2 (not possible)
parameters Agreement Agr Sg Utr case-expressions
tuples Str * Str * Number <"first", "second", Sg> projections: tpl.p1, tpl.p2, tpl.p3
records {s:Str; n:Agreement} {s:"hello world"; n=NoAgr} projections: rec.s, rec.n
tables Number=>Str table {Sg=>"elk"; Pl=>"elks"} selection: tbl!Sg, tbl!Pl, tbl!(rec.n)

Syntactic sugar for tables:

  • wildcards: table {Agr _ _ => "yes"; NoAgr => "no"}
  • variable binding: table {Agr num _ => "number" ++ table{Sg=>"one";Pl=>"two"} ! num; NoAgr => "nope"}
  • lambda syntax: \\x => ...x... is the same as table{x => ...x...}

Case expressions:

  • case x of {NoAgr => "no"; Agr n g => ...n...g...}
  • case expressions are compiled away

Example English concrete syntax

concrete MiniEng of Mini = {
param
  Number = Sg | Pl;
lincat
  S = {s : Str};
  NP, Det = {s : Str; num : Number};
  VP, Verb, Noun = {s : Number => Str};
lin
  mkS np vp = {s = np.s ++ vp.s!np.num};
  mkVP verb np = {s = \\num => verb.s!num ++ np.s};
  mkNP det noun = {s = det.s ++ noun.s!det.num; num = det.num};
  love = {s = table {Sg => "loves"; Pl => "love"}};
  hate = {s = table {Sg => "hates"; Pl => "hate"}};
  a = {s = "a"; num = Sg};
  all = {s = "all"; num = Pl};
  elk = {s = table {Sg => "elk"; Pl => "elks"}};
  deer = {s = table {_ => "deer"}};
}

Example Swedish concrete syntax

concrete MiniSwe of Mini = {
param
  Number = Sg | Pl;
  Gender = Neu | Utr;
lincat
  S, NP, VP, Verb = {s : Str};
  Det = {s : Gender => Str; num : Number};
  Noun = {s : Number => Str; gen : Gender};
lin
  mkS np vp = {s = np.s ++ vp.s};
  mkVP verb np = {s = verb.s ++ np.s};
  mkNP det noun = {s = det.s!noun.gen ++ noun.s!det.num};
  love = {s = "älskar"};
  hate = {s = "hatar"};
  a = {s = table {Neu => "ett"; Utr => "en"}; num = Sg};
  all = {s = \\_ => "alla"; num = Pl};
  elk = {s = table {Sg => "älg"; Pl => "älgar"}; gen = Utr};
  deer = {s = \\_ => "rådjur"; gen = Neu};
}

Parsing

All the restrictions on concrete syntax means that parsing is always possible. Furthermore, GF (without dependent or higher-order types) is equivalent to Parallell Multiple Context-Free Grammar, meaning that parsing has polynomial time complexity.

(See more about this in my PhD thesis)