Æthel
++ Æthel is a semantic compositionality dataset for Dutch. It + consists of a lexicon of supertags for about 900k words in + context, together with validated derivations for some 70k sample + sentences, associating them with programs (lambda terms) for + their meaning composition. Æthel’s types and derivations are + obtained by means of an extraction algorithm applied to the + syntactic analyses of LASSY Small, the gold standard corpus of + written Dutch. +
++ More info can be found under + About and + References. The + notations are explained under + Notation. +
++ You can use the interface below to search for a word or lemma. + Once you have retrieved a sample, you can inspect it to isolate + and look into a word, a type, or a word-type pair. You can then + look for other words that occur with the same type, or other + occurrences of the same word-type pair. +
-@if (status$ | async) { - -} -@else { -- The Æthel dataset is temporarily unavailable. -
-} - -@if (submitted | async) { -Loading samples...
- } @else { --
- @for (sample of samples; track $index) {
+
-
-
+
{{ $index + 1 }}
+@for (phrase of sample.phrases; track phrase.index) { {{ phrase.display }} @@ -14,7 +12,7 @@
Go to sample @@ -22,5 +20,16 @@
}
-
+ @for (sample of visibleSamples(); track $index) {
Loading samples...
+ } + @else if (!allSamplesLoaded()) { + }Welcome to ParsePort
-Here are some introductory words.
\ No newline at end of file +ParsePort
+ ++ Welcome to ParsePort, a growing collection of NLP-related + parsers and parsed corpora developed at Utrecht University. +
+References
+References
+-
+
- + Kogkalidis, K. (2023). + + Dependency as Modality, Parsing as Permutation . Netherlands Graduate School of Linguistics + (E. W. Beth Dissertation Prize, FoLLI). + -
- + Kogkalidis, K., Moortgat, M., & Moot, R. (2023). + SPINDLE: Spinning Raw Text into Lambda Terms with Graph + Attention. + In D. Croce & L. Soldaini (Eds.), + Proceedings of the 17th Conference of the European + Chapter of the Association for Computational + Linguistics: System Demonstrations + (pp. 128–135). Association for Computational Linguistics. + -
- + Kogkalidis, K., Moortgat, M., & Moot, R. (2020). + Neural Proof Nets. + In R. Fernández & T. Linzen (Eds.), + Proceedings of the 24th Conference on Computational + Natural Language Learning + (pp. 26–40). Association for Computational Linguistics. + -
- + Kogkalidis, K., Moortgat, M., & Moot, R. (2020). + ÆTHEL: Automatically Extracted Typelogical Derivations + for Dutch. + In N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, + T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, + H. Mazo, A. Moreno, J. Odijk, & S. Piperidis (Eds.), + Proceedings of the Twelfth Language Resources and + Evaluation Conference + (pp. 5257–5266). European Language Resources Association. + -
- + Kogkalidis, K., & Moortgat, M. (2023). + Geometry-Aware Supertagging with Heterogeneous Dynamic + Convolutions. + In E. Breitholtz, S. Lappin, S. Loaiciga, N. Ilinykh, & + S. Dobnik (Eds.), + Proceedings of the 2023 CLASP Conference on Learning + with Small Data (LSD) + (pp. 107–119). Association for Computational Linguistics. + +
- + Van Noord, G. et al. (2013). + Large Scale Syntactic Annotation of Written Dutch: + Lassy. In P. Spyns & J. Odijk (Eds.), + Essential Speech and Language Technology for Dutch, + Results by the STEVIN-programme, Theory and Applications + of Natural Language Processing + (pp. 147–164). Springer. + +
Spindle
-- Kogkalidis, K. (2023). Dependency as Modality, Parsing as Permutation. Netherlands Graduate School of Linguistics. -
+- Kogkalidis, K., Moortgat, M., & Moot, R. (2023). SPINDLE: Spinning Raw Text into Lambda Terms with Graph Attention. In D. Croce & L. Soldaini (Eds.), Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations (pp. 128–135). Association for Computational Linguistics. -
+- Kogkalidis, K., Moortgat, M., & Moot, R. (2020). Neural Proof Nets. In R. Fernández & T. Linzen (Eds.), Proceedings of the 24th Conference on Computational Natural Language Learning (pp. 26–40). Association for Computational Linguistics. -
+- Kogkalidis, K., Moortgat, M., & Moot, R. (2020). ÆTHEL: Automatically Extracted Typelogical Derivations for Dutch. In N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, A. Moreno, J. Odijk, & S. Piperidis (Eds.), Proceedings of the Twelfth Language Resources and Evaluation Conference (pp. 5257–5266). European Language Resources Association. -
- -- Kogkalidis, K., & Moortgat, M. (2023). Geometry-Aware Supertagging with Heterogeneous Dynamic Convolutions. In E. Breitholtz, S. Lappin, S. Loaiciga, N. Ilinykh, & S. Dobnik (Eds.), Proceedings of the 2023 CLASP Conference on Learning with Small Data (LSD) (pp. 107–119). Association for Computational Linguistics. -
-Source code
+-
+
- + Æthel (GitHub): backend code for the definition and representation of + proof-derivations, and their extraction from Lassy- and + Alpino-style parses. + +
- + Spindle (GitHub): backend code for the neural proof search engine. + +
- + ParsePort (GitHub): frontend and backend code for this web application. + +
Sample sentence
+{{ sample.sentence }}
-@if (sampleResult$ | async; as sample;) { -Sample sentence
-{{ sample.sentence }}
-# | +Phrase | +Type | +Search in Æthel | +
---|---|---|---|
+ c{{ i }} + | ++ {{ item.word }} + + | ++ + | ++ @if (showButtons(phrase.items)) { + | + + } +
# | -Phrase | -Type | -Search in Æthel | -
---|---|---|---|
- c{{ i }} - | -- {{ item.word }} - | -- - | -- @if (showButtons(phrase.items)) { - - - - } - | -
Export results
+About
- Spindle (“Spindle Parses Into Dependency Enhanced Lambda Expressions”) is a neurosymbolic parser for Dutch. +About Spindle and Æthel
++ Spindle (“Spindle Parses Into Dependency Enhanced Lambda + Expressions”) is a neurosymbolic parser for Dutch. Spindle is + trained on Æthel (“Automatically Extracted Theorems from Lassy”), a + large dataset of machine-verified derivations extracted from LASSY + Small, the gold standard treebank of written Dutch. +
-What am I seeing?
-- Spindle outputs are abstract syntactic derivations in the implication-only fragment of intuitionistic linear logic (LP), extended with a family of residuated pairs of modal operators (LP◇,□). -
+What am I seeing?
++ Spindle outputs and Æthel samples are abstract syntactic derivations + in the implication-only fragment of intuitionistic linear logic + (LP), extended with a family of residuated pairs of modal operators + (LP◇,□). +
-- This might sound scary, but it's really not that bad. - In practice, you give Spindle a Dutch sentence, and it tells you how it has been composed, explicating: -
--
-
- The (possibly higher-order) function-argument relations between constituent words and phrases -
- The dependencies between a "head" an the elemens dependent on it, which can be either complements or adjuncts. -
+ This might sound scary, but it's really not that bad. In practice, + you give Spindle a Dutch sentence, and it tells you how it has been + composed, explicating: +
+-
+
- + The (possibly higher-order) function-argument relations between + constituent words and phrases + +
- + The dependencies between a "head" an the elemens dependent on + it, which can be either complements or adjuncts. + +
- Which phrases can combine and how is fully determined by the formulas (aka types) assigned to the words that make them up. Logical implications in these formulas denote functions, the modalities (indexed with dependency labels) encode grammatical roles. Words combine to form larger phrases by virtue of a fixed set of inference rules for the type-forming operations. -
-- A parse of a phrase is then nothing more and nothing less than a proof: a sequence of valid derivation steps leading from axioms—the formulas/types assigned to the elementary building blocks, i.e. words—to the well-formed complex expression that constitutes the eventual phrase. - You might have encountered this idea in the literature as the parsing as deduction paradigm. -
++ Which phrases can combine and how is fully determined by the + formulas (aka types) assigned to the words that make them up. + Logical implications in these formulas denote functions, the + modalities (indexed with dependency labels) encode grammatical + roles. Words combine to form larger phrases by virtue of a fixed set + of inference rules for the type-forming operations. +
++ A parse of a phrase is then nothing more and nothing less + than a proof: a sequence of valid derivation steps leading + from axioms—the formulas/types assigned to the elementary building + blocks, i.e. words—to the well-formed complex expression that + constitutes the eventual phrase. You might have encountered this + idea in the literature as the + parsing as deduction paradigm. +
-- The interesting bit is that intuitionistic proofs also are compositional meaning instructions (or functional programs), courtesy of the Curry-Howard-de Bruijn correspondence. - What this means is that the proofs produced by Spindle are also meaning recipes (more formally λ-terms); these you can actually execute provided you have the basic ingredients—meanings for the constants/words—and a concrete interpretation for the recipe instructions, i.e. the inference steps. -
++ The interesting bit is that intuitionistic proofs also are + compositional meaning instructions (or functional programs), + courtesy of the Curry-Howard-de Bruijn correspondence. What this + means is that the proofs produced by Spindle are also + meaning recipes (more formally λ-terms); these you can + actually execute provided you have the basic ingredients—meanings + for the constants/words—and a concrete interpretation for the recipe + instructions, i.e. the inference steps. +
-Does this mean that Spindle is never mistaken?
+Does this mean that Spindle is never mistaken?
++ No. Spindle can be (and frequently is) wrong. Sometimes it can't + come up with a proof, in which case it knows it's wrong and openly + admits it. Other times it does come up with a proof, but the proof + does not capture the sentence's actual structure (either because + some word got the wrong formula, or because a logically correct but + linguistically implausible inference path has been followed). + Fortunately, every so often, Spindle is also correct: it returns a + proof that should appease both linguists and logicians. +
-- No. Spindle can be (and frequently is) wrong. Sometimes it can't come up with a proof, in which case it knows it's wrong and openly admits it. Other times it does come up with a proof, but the proof does not capture the sentence's actual structure (either because some word got the wrong formula, or because a logically correct but linguistically implausible inference path has been followed). Fortunately, every so often, Spindle is also correct: it returns a proof that should appease both linguists and logicians. -
+How does this even work?
++ This works thanks to a sophisticated interplay between a logical + core (a miniature type checker) and two machine learning components. +
++ Machine learning component (1) is a supertagger: it takes a Dutch + text (a sequence of words) as input and converts it into a sequence + of formulas of LP◇,□; +
++ Machine learning component (2) is a greedy theorem prover that uses + the extracted formulas and their representations in order to resolve + the entire proof in a single step (without ever backtracking). The + output of the machine learning components is an (unverified) + candidate proof; it is passed to the type checker which + either accepts it (in which case you get to see it), or rejects it + (in which case you do not). +
-How does this even work?
-- This works thanks to a sophisticated interplay between a logical core (a miniature type checker) and two machine learning components. -
-- Machine learning component (1) is a supertagger: it takes a Dutch text (a sequence of words) as input and converts it into a sequence of formulas of LP◇,□; -
-- Machine learning component (2) is a greedy theorem prover that uses the extracted formulas and their representations in order to resolve the entire proof in a single step (without ever backtracking). - The output of the machine learning components is an (unverified) candidate proof; it is passed to the type checker which either accepts it (in which case you get to see it), or rejects it (in which case you do not). -
+Ok, how does this really work?
++ Please check out the literature at the relevant links section. +
-Ok, how does this really work?
- -- Please check out the literature at the relevant links section. -
- - -Credits
-- Spindle has been designed and developed as a part of the PhD thesis of Konstantinos Kogkalidis. -
-- The online interface is due to the Research Software Lab, Centre of Digital Humanities at Utrecht University. -
-- Funding was provided by the NWO project “A composition calculus for vector-based semantic modelling with a localization for Dutch” (grant nr. 360-89-070). -
- -Contact
-- For comments/complaints you can contact konstantinos(funny-a-symbol)riseup(sentence-end-marker)net -
+Credits
++ Spindle has been designed and developed as a part of the PhD thesis + of Konstantinos Kogkalidis. +
++ The online interface is due to the Research Software Lab, Centre of + Digital Humanities at Utrecht University. +
++ Funding was provided by the NWO project “A composition calculus for + vector-based semantic modelling with a localization for Dutch” + (grant nr. 360-89-070). +
+Contact
++ For comments/complaints you can contact + konstantinos(funny-a-symbol)riseup(sentence-end-marker)net +
+Notation
--
-
- - Types - -
- atomic: T; complex: given types A, B we have
-
-
-- -A ⊸ B linear implication -- -♢d A diamond with dependency label d -- -□d A box with dependency label d -
-
-
-
- Terms -
- atomic: c constants (i.e. words), x variables; complex: given terms M, N we have
-
-
-- -M N ⊸ elimination, function application -- -λx.M ⊸ introduction, abstraction -- -▽d M diamond elimination -- -△d M diamond introduction -- -▼d M box elimination -- -▲d M box introduction -
-
Notation
+-
+
- Types +
-
+ atomic: T; complex: given types
+ A, B we have
+
+
++ +A ⊸ B +linear implication ++ ++ ♢d A + +diamond with dependency label d ++ ++ □d A + +box with dependency label d +
+
-
+
- Terms +
-
+ atomic: c constants (i.e. words),
+ x variables; complex: given terms M,
+ N we have
+
+
++ +M N +⊸ elimination, function application ++ +λx.M +⊸ introduction, abstraction ++ ++ ▽d M + +diamond elimination ++ ++ △d M + +diamond introduction ++ ++ ▼d M + +box elimination ++ ++ ▲d M + +box introduction +
+
Spindle
+Spindle
-Spindle is a neurosymbolic typelogical parser for Dutch.
-- Upon entering a Dutch phrase, Spindle returns its analysis in the - form of a lambda term that records the steps of its derivation. - Derivations are driven by the type formulas assigned to words. These - formulas determine how words combine into larger phrases. -
-- More info and can be found under - About and - References. The notations used are - explained under Notation. -
- @if (spindleReady$ | async) { +Spindle is a neurosymbolic typelogical parser for Dutch.
+ ++ Upon entering a Dutch phrase, Spindle returns its analysis in the + form of a lambda term that records the steps of its derivation. + Derivations are driven by the type formulas assigned to words. These + formulas determine how words combine into larger phrases. +
+ ++ More info and can be found under + About and + References. The notations used are + explained under Notation. +
+ + @if (spindleReady$ | async) {Spindle
class="input" [class.is-danger]=" spindleInput.touched && spindleInput.invalid - " + " type="text" [formControl]="spindleInput" (keydown.enter)="$event.preventDefault(); parse()" @@ -40,107 +44,49 @@Spindle
- Spindle is temporarily unavailable. -
-} + } -Term:
- -- - c{{ i }} - - | -- {{ item.word }} - | -- |
Spindle is temporarily unavailable.
+ } -Export results
- +Term:
+ ++ + c{{ i }} + + | ++ {{ item.word }} + | ++ |