Transforming a GF grammar

Frederik Hanghøj Iversen edited this page Oct 22, 2018 · 7 revisions

Transforming a GF grammar so that it recognises more sentences

This is an explanation of my thoughts behind issue #4.

There is also a GF crash course for Haskell programmers.

Making a GF grammar recognise ungrammatical sentences

An example grammar in English and Swedish is given later in this document. Both languages inflect for Number (and Swedish also for Gender). Now I want to make this grammar recognise sentences with disagreement in Number (but keep the Gender agreement). Here's how I imagine an automatic transformation could be done.

Transformation

  • Input: a multilingual GF grammar; a parameter whose values are to be merged
  • Output: a GF grammar where all values of that parameter are conflated to Any
  1. (Manually) Decide which parameter values should be merged. In this example we want to merge Sg and Pl...

     param Number = Sg | Pl;
    

    ...into a single parameter value:

     param Number = Any;
    
  2. All functions with a linearisation table involving Sg or Pl...

     -- English concrete syntax
     love = {s = table {Sg => "loves"; Pl => "love"}};
     elk = {s = table {Sg => "elk"; Pl => "elks"}};
     deer = {s = table {_ => "deer"}};
    
     -- Swedish concrete syntax
     love = {s = "älskar"};
     elk = {s = table {Sg => "älg"; Pl => "älgar"}; gen = Utr};
     deer = {s = table {_ => "rådjur"}; gen = Neu};
    

    ...should be split into several functions, one each for Sg and Pl:

     -- English concrete syntax
     love_Sg = {s = table {Sg => "loves"}};
     love_Pl = {s = table {Pl => "love"}};
     elk_Sg = {s = table {Sg => "elk"}};
     elk_Pl = {s = table {Pl => "elks"}};
     deer = {s = table {_ => "deer"}};
    
     -- Swedish concrete syntax
     love = {s = "älskar"};
     elk_Sg = {s = table {Sg => "älg"}; gen = Utr};
     elk_Pl = {s = table {Pl => "älgar"}; gen = Utr};
     deer = {s = table {_ => "rådjur"}; gen = Neu};
    
  3. The corresponding functions in the abstract GF grammar should also be split.

     -- Abstract syntax
     love_Sg, love_Pl : Verb;
     elk_Sg, elk_Pl : Noun;
    

    Some functions also have to be split in one language merely because they are split in the other. E.g., love has to be split in Swedish because it was split in English:

     -- Swedish concrete syntax
     love_Sg = {s = "älskar"};
     love_Pl = {s = "älskar"};
    

    Note that not all nouns in the example have to be split. E.g., deer does not inflect for Number in either English or Swedish, so it's not necessary to split it.

  4. Change all occurrences of Sg and Pl into Any:

     -- English concrete syntax
     a = {s = "a"; num = Any};
     love_Sg = {s = table {Any => "loves"}};
     love_Pl = {s = table {Any => "love"}};
     elk_Sg = {s = table {Any => "elk"}};
     elk_Pl = {s = table {Any => "elks"}};
    
     -- Swedish concrete syntax
     a = {s = table {Neu => "ett"; Utr => "en"}; num = Any};
     elk_Sg = {s = table {Any => "älg"}; gen = Utr};
     elk_Pl = {s = table {Any => "älgar"}; gen = Utr};
    

With these transformation steps, the grammars should be able to recognise things like "all deer loves a elks" == "alla rådjur älskar en älgar".
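
Steps 2 to 4 can be sketched in Python on a toy representation. This is only an illustration under assumptions that are not part of GF: each lexicon is a dict from function names to tables keyed by "Sg" and "Pl", constant linearisations such as Swedish love are expanded to tables with identical entries, the Gender fields are left out, and the names inflects and split_and_conflate are made up for this sketch.

```python
# Toy model (an assumption, not GF's internal format):
# language -> {function name -> {Number value -> string}}

def inflects(table):
    """True if the table has different forms for different Number values."""
    return len(set(table.values())) > 1

def split_and_conflate(lexicons, merged="Any"):
    """Split inflecting functions per value, then conflate all values to `merged`.

    A function is split in every language if it inflects in at least one,
    so that the abstract syntax stays shared between the languages.
    """
    funs = set().union(*(lex.keys() for lex in lexicons.values()))
    to_split = {f for f in funs
                if any(f in lex and inflects(lex[f]) for lex in lexicons.values())}
    result = {}
    for lang, lex in lexicons.items():
        out = {}
        for fun, table in lex.items():
            if fun in to_split:
                # Steps 2 and 3: one function per original value;
                # step 4: its only table entry is keyed by the merged value
                for value, form in table.items():
                    out[f"{fun}_{value}"] = {merged: form}
            else:
                # All forms are identical, so any one of them will do
                out[fun] = {merged: next(iter(table.values()))}
        result[lang] = out
    return result

english = {
    "love": {"Sg": "loves", "Pl": "love"},
    "elk": {"Sg": "elk", "Pl": "elks"},
    "deer": {"Sg": "deer", "Pl": "deer"},
}
swedish = {
    "love": {"Sg": "älskar", "Pl": "älskar"},
    "elk": {"Sg": "älg", "Pl": "älgar"},
    "deer": {"Sg": "rådjur", "Pl": "rådjur"},
}

transformed = split_and_conflate({"Eng": english, "Swe": swedish})
```

In this sketch love is split in Swedish only because it inflects in English, and deer stays unsplit in both languages, matching the notes in step 3.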

Doing this automatically

So far, this is only my idea. Issue #4 is about implementing this grammar transformation automatically, as an add-on to GF.

Important notes:

  1. GF grammars in general make use of grammar libraries, such as the RGL, so the linearisation types are much more complex than in this example. But that should not change the basic idea of the transformation.

  2. GF abstract syntax also allows dependent types, but we can assume that the grammar does not contain any dependent types.

  3. For simplicity we can assume that both languages have exactly the same Number parameters. I.e., we disallow this for now:

     -- English concrete syntax
     param Number = Sg | Pl;
    
     -- Russian concrete syntax
     param Number = Sg | Pl | Dual;
    
  4. GF concrete syntax has a lot of "sugaring" constructions, such as operations, lambdas, let-constructs. All these are compiled away somewhere by the GF compiler, into a "canonical" form. Unfortunately, the GF source code is not very well commented...

    Here is the information I've got so far:

    What you need to do is to insert a new phase between the typechecker and the backend code generation. If you look at GF.CompileOne.compileSourceModule, then there is this sequence:

    generateGFO <=< ifComplete (backend <=< middle) <=< frontend
    

    Your new phase seems to fit between middle and backend (the composition runs right to left, so middle runs before backend).
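
As a reading aid for the composition above: Haskell's `<=<` composes right to left, so frontend runs first and generateGFO last. The sketch below imitates that order with plain Python functions. It is not GF code: it ignores the monadic effects and ifComplete, every phase is a stand-in that only records its name, and conflate is a hypothetical name for the new phase.

```python
from functools import reduce

def compose_right_to_left(*phases):
    """Plain-function analogue of chaining with `<=<`: the rightmost phase runs first."""
    return lambda x: reduce(lambda acc, phase: phase(acc), reversed(phases), x)

def make_phase(name):
    # Each stand-in phase just appends its name to the trace
    return lambda trace: trace + [name]

frontend, middle, backend, generate_gfo = map(
    make_phase, ["frontend", "middle", "backend", "generateGFO"])
conflate = make_phase("conflate")  # hypothetical new phase

# Same left-to-right spelling as the original sequence, with the new phase added
pipeline = compose_right_to_left(generate_gfo, backend, conflate, middle, frontend)
order = pipeline([])
print(order)
```

The printed trace lists the phases in execution order, with conflate running after middle and before backend.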

Example GF grammar

Abstract syntax

abstract Mini = {
cat S; VP; NP; Verb; Det; Noun;
fun
  mkS : NP -> VP -> S;
  mkVP : Verb -> NP -> VP;
  mkNP : Det -> Noun -> NP;
  love, hate : Verb;
  a, all : Det;
  elk, deer : Noun;
}

English concrete syntax

concrete MiniEng of Mini = {
param
  Number = Sg | Pl;
lincat
  S = {s : Str};
  NP, Det = {s : Str; num : Number};
  VP, Verb, Noun = {s : Number => Str};
lin
  mkS np vp = {s = np.s ++ vp.s!np.num};
  mkVP verb np = {s = \\num => verb.s!num ++ np.s};
  mkNP det noun = {s = det.s ++ noun.s!det.num; num = det.num};
  love = {s = table {Sg => "loves"; Pl => "love"}};
  hate = {s = table {Sg => "hates"; Pl => "hate"}};
  a = {s = "a"; num = Sg};
  all = {s = "all"; num = Pl};
  elk = {s = table {Sg => "elk"; Pl => "elks"}};
  deer = {s = table {Sg => "deer"; Pl => "deer"}};
}

Swedish concrete syntax

concrete MiniSwe of Mini = {
param
  Number = Sg | Pl;
  Gender = Neu | Utr;
lincat
  S, NP, VP, Verb = {s : Str};
  Det = {s : Gender => Str; num : Number};
  Noun = {s : Number => Str; gen : Gender};
lin
  mkS np vp = {s = np.s ++ vp.s};
  mkVP verb np = {s = verb.s ++ np.s};
  mkNP det noun = {s = det.s!noun.gen ++ noun.s!det.num};
  love = {s = "älskar"};
  hate = {s = "hatar"};
  a = {s = table {Neu => "ett"; Utr => "en"}; num = Sg};
  all = {s = \\_ => "alla"; num = Pl};
  elk = {s = table {Sg => "älg"; Pl => "älgar"}; gen = Utr};
  deer = {s = \\_ => "rådjur"; gen = Neu};
}
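
To see what the conflation does to the example sentence, the lin rules of MiniEng can be mirrored in Python. This is a rough sketch under stated assumptions: records and tables become dicts, `++` becomes joining with a space, mkS returns a plain string instead of a record, and the conflated lexicon entries are written out by hand following step 4. It only shows linearisation; it relies on the fact that GF uses the same grammar for parsing, so a string that some tree linearises to is also one the grammar recognises.

```python
def mkNP(det, noun):
    # mkNP det noun = {s = det.s ++ noun.s!det.num; num = det.num}
    return {"s": det["s"] + " " + noun["s"][det["num"]], "num": det["num"]}

def mkVP(verb, np):
    # mkVP verb np = {s = \\num => verb.s!num ++ np.s}
    return {"s": {num: form + " " + np["s"] for num, form in verb["s"].items()}}

def mkS(np, vp):
    # mkS np vp = {s = np.s ++ vp.s!np.num}
    return np["s"] + " " + vp["s"][np["num"]]

# Original MiniEng lexicon: Number agreement is enforced through the tables
a = {"s": "a", "num": "Sg"}
all_ = {"s": "all", "num": "Pl"}
love = {"s": {"Sg": "loves", "Pl": "love"}}
elk = {"s": {"Sg": "elk", "Pl": "elks"}}
deer = {"s": {"Sg": "deer", "Pl": "deer"}}

agreeing = mkS(mkNP(all_, deer), mkVP(love, mkNP(a, elk)))

# Conflated lexicon: every table has the single key "Any", so the choice
# between forms moves from the parameter into the function name
a_any = {"s": "a", "num": "Any"}
all_any = {"s": "all", "num": "Any"}
love_Sg = {"s": {"Any": "loves"}}
elk_Pl = {"s": {"Any": "elks"}}
deer_any = {"s": {"Any": "deer"}}

disagreeing = mkS(mkNP(all_any, deer_any), mkVP(love_Sg, mkNP(a_any, elk_Pl)))

print(agreeing)     # all deer love a elk
print(disagreeing)  # all deer loves a elks
```

The lin rules for mkS, mkVP and mkNP are the same in both cases; only the lexical entries change.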