Combinatory Categorial Grammar (CCG; Steedman, 2000) is a highly lexicalized formalism. The standard parsing model of Clark and Curran (2007) uses over 400 lexical categories (or supertags), compared to the roughly 50 part-of-speech tags used by typical parsers.
Example:
Vinken | , | 61 | years | old |
---|---|---|---|---|
N | , | N/N | N | (S[adj]\NP)\NP |
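To make the combinatory side of this concrete, here is a minimal sketch of forward and backward application over CCGBank-style category strings. The representation and function names are illustrative, not taken from any cited system, and the category splitter only handles one outer level of parentheses; in CCGBank, the `N` of "Vinken" would first be promoted to `NP` by a unary rule before combining.

```python
def strip_parens(cat):
    """Remove one pair of enclosing parentheses if they match as a pair."""
    if cat.startswith('(') and cat.endswith(')'):
        depth = 0
        for i, c in enumerate(cat):
            if c == '(':
                depth += 1
            elif c == ')':
                depth -= 1
                if depth == 0 and i < len(cat) - 1:
                    return cat  # parens close early; not an enclosing pair
        return cat[1:-1]
    return cat

def split_functor(cat):
    """Split a functor category into (result, slash, argument).

    Scans right to left for the outermost slash, e.g.
    '(S[adj]\\NP)\\NP' -> ('S[adj]\\NP', '\\', 'NP').
    Returns None for atomic categories like 'NP'.
    """
    depth = 0
    for i in range(len(cat) - 1, -1, -1):
        c = cat[i]
        if c == ')':
            depth += 1
        elif c == '(':
            depth -= 1
        elif depth == 0 and c in '\\/':
            return strip_parens(cat[:i]), c, strip_parens(cat[i + 1:])
    return None

def forward_apply(left, right):
    """X/Y  Y  =>  X  (forward application, '>')."""
    parts = split_functor(left)
    if parts and parts[1] == '/' and parts[2] == right:
        return parts[0]
    return None

def backward_apply(left, right):
    """X  Y\\X  =>  Y  (backward application, '<')."""
    parts = split_functor(right)
    if parts and parts[1] == '\\' and parts[2] == left:
        return parts[0]
    return None

# "61 years": N/N applied forward to N yields N.
print(forward_apply('N/N', 'N'))                     # -> N
# "61 years old" applied backward to the NP "Vinken":
print(backward_apply('NP', r'(S[adj]\NP)\NP'))       # -> S[adj]\NP
print(backward_apply('NP', r'S[adj]\NP'))            # -> S[adj]
```

Because all combinatory information lives in the categories themselves, the rules above are entirely generic; this is what makes supertagging ("almost parsing") so decisive for CCG.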
CCGBank is a corpus of CCG derivations and dependency structures extracted from the Penn Treebank by Hockenmaier and Steedman (2007). Sections 02-21 are used for training, section 00 for development, and section 23 as the in-domain test set. Some work also evaluates out-of-domain performance on Wikipedia and biomedical text. Results on the CCGBank test set (section 23):
Model | Accuracy | Paper / Source |
---|---|---|
Vaswani et al. (2016) | 88.32 | Supertagging with LSTMs |
Lewis et al. (2016) | 88.1 | LSTM CCG Parsing |
Xu et al. (2015) | 87.04 | CCG Supertagging with a Recurrent Neural Network |
Kummerfeld et al. (2010), with additional unlabeled data | 85.95 | Faster Parsing by Supertagger Adaptation |
Clark and Curran (2007) | 85.45 | Wide-Coverage Efficient Statistical Parsing with CCG and Log-Linear Models |
Results on Wikipedia text (out-of-domain):

Model | Accuracy | Paper / Source |
---|---|---|
Xu et al. (2015) | 82.49 | CCG Supertagging with a Recurrent Neural Network |
Kummerfeld et al. (2010), with additional unlabeled data | 81.7 | Faster Parsing by Supertagger Adaptation |
Results on biomedical text (out-of-domain):

Model | Bio-specific taggers? | Accuracy | Paper / Source |
---|---|---|---|
Kummerfeld et al. (2010), with additional unlabeled data | Yes | 82.3 | Faster Parsing by Supertagger Adaptation |
Rimell and Clark (2008) | Yes | 81.5 | Adapting a Lexicalized-Grammar Parser to Contrasting Domains |
Xu et al. (2015) | No | 77.74 | CCG Supertagging with a Recurrent Neural Network |
Kummerfeld et al. (2010), with additional unlabeled data | No | 76.1 | Faster Parsing by Supertagger Adaptation |
Rimell and Clark (2008) | No | 76.0 | Adapting a Lexicalized-Grammar Parser to Contrasting Domains |
For supertagging evaluation on CCGBank, token-level accuracy is computed only over the 425 most frequent lexical categories.
Model | Accuracy | Paper / Source |
---|---|---|
Clark et al. (2018) | 96.1 | Semi-Supervised Sequence Modeling with Cross-View Training |
Lewis et al. (2016) | 94.7 | LSTM CCG Parsing |
Vaswani et al. (2016) | 94.24 | Supertagging with LSTMs |
Low supervision (Søgaard and Goldberg, 2016) | 93.26 | Deep multi-task learning with low level tasks supervised at lower layers |
Xu et al. (2015) | 93.00 | CCG Supertagging with a Recurrent Neural Network |
Clark and Curran (2004) | 92.00 | The Importance of Supertagging for Wide-Coverage CCG Parsing (result from Lewis et al. (2016)) |
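The restricted-label accuracy used above can be sketched as follows. This is a minimal illustration, not any paper's evaluation script: the function name is an assumption, and published results fix the 425-category set from the training sections, whereas tie-breaking at the frequency cutoff may differ between implementations.

```python
from collections import Counter

def supertag_accuracy(gold, pred, train_tags, n_frequent=425):
    """Token-level supertag accuracy, scored only on tokens whose
    gold category is among the n_frequent most frequent categories
    observed in the training data."""
    frequent = {t for t, _ in Counter(train_tags).most_common(n_frequent)}
    scored = [(g, p) for g, p in zip(gold, pred) if g in frequent]
    correct = sum(g == p for g, p in scored)
    return correct / len(scored)

# Toy example with a cutoff of 2: only 'N' and 'NP' are frequent
# enough to be scored, so the token tagged 'S' is ignored.
train = ['N', 'N', 'N', 'NP', 'NP', 'S', 'PP']
gold = ['N', 'NP', 'S', 'N']
pred = ['N', 'N', 'S', 'N']
print(supertag_accuracy(gold, pred, train, n_frequent=2))  # -> 0.666...
```

Restricting scoring to frequent categories avoids penalizing models on the long tail of rare supertags that no tagger can reliably predict from limited training data.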
Since CCGBank is derived from the Penn Treebank, there has been interest in converting CCG derivations back to phrase structure parses for direct comparison with phrase structure parsers.
Model | Accuracy | Paper / Source |
---|---|---|
Kummerfeld et al. (2012) | 96.30 | Robust Conversion of CCG Derivations to Phrase Structure Trees |
Zhang et al. (2012) | 95.71 | A Machine Learning Approach to Convert CCGbank to Penn Treebank |
Clark and Curran (2009) | 94.64 | Comparing the Accuracy of CCG and Penn Treebank Parsers |