Bayesian Tree Substitution Grammars as a Usage-based Approach

Language and Speech

Published online on

Abstract

Tree substitution grammar (TSG) is a generalization of context-free grammar (CFG) that permits non-terminals to rewrite as fragments of arbitrary size, instead of just depth-one productions. We discuss connections between the TSG framework and the larger family of usage-based approaches to language, showing how TSG allows us to make some of the claims of these approaches sufficiently concrete for computational modeling.
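To make the contrast between depth-one productions and larger fragments concrete, here is a minimal sketch in Python. The toy tree, the nested-tuple encoding, and the function names are illustrative assumptions, not the authors' representation. A bare word is a string, and a one-element tuple such as `("VP",)` marks a non-terminal left open as a substitution site.

```python
# Toy parse tree as nested tuples: (label, child, child, ...).
# A bare string is a word. This encoding is an assumption for illustration.
tree = ("S",
        ("NP", ("DT", "the"), ("NN", "dog")),
        ("VP", ("VBD", "barked")))


def label(node):
    """Return the label of a node (the word itself for a terminal)."""
    return node if isinstance(node, str) else node[0]


def cfg_productions(node):
    """Collect the depth-one productions of a tree as (lhs, rhs) pairs."""
    if isinstance(node, str):
        return []
    rules = [(node[0], tuple(label(child) for child in node[1:]))]
    for child in node[1:]:
        rules.extend(cfg_productions(child))
    return rules


def depth(node):
    """Depth of a tree or fragment; words and substitution sites count as 0."""
    if isinstance(node, str) or len(node) == 1:
        return 0
    return 1 + max(depth(child) for child in node[1:])


# A hypothetical TSG fragment: it spans several CFG rules at once, fixes the
# word "the", and leaves NN and VP open as substitution sites.
fragment = ("S", ("NP", ("DT", "the"), ("NN",)), ("VP",))

rules = cfg_productions(tree)
```

Under this encoding, every CFG rule is itself a fragment of depth one, so a CFG is the special case of a TSG in which no fragment is deeper than that.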

A fundamental difficulty in defining a TSG is determining the set of fragments for the grammar, because the number of possible fragments is exponential in the size of the parse trees from which TSGs are typically learned. We describe a model-based approach that learns a TSG using Gibbs sampling with a non-parametric prior to control fragment size, yielding grammars that contain mostly small fragments but include larger ones where the data permit. We evaluate these grammars on two tasks (parsing accuracy and grammaticality classification) and find that the Bayesian TSGs achieve excellent performance on both relative to a set of heuristically extracted TSGs spanning the spectrum of representations, from a standard depth-one context-free Treebank grammar to explicit approximations of the Data-Oriented Parsing model.
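The growth in the number of candidate fragments can be checked on a small example. The sketch below is an illustration under assumed conventions, not the authors' code: a fragment rooted at a node must include all of that node's children, and each non-terminal child is either cut (left as a substitution site) or expanded by one of the fragments rooted at it. The toy tree and the function names are hypothetical.

```python
# Toy parse tree as nested tuples: (label, child, child, ...); strings are words.
tree = ("S",
        ("NP", ("DT", "the"), ("NN", "dog")),
        ("VP", ("VBD", "barked")))


def fragments_rooted_at(node):
    """Count the fragments whose root is this internal node.

    Each non-terminal child contributes (1 + its own count): one option for
    cutting it off as a substitution site, plus one per way of expanding it.
    A word child offers no choice, so it contributes a factor of 1.
    """
    count = 1
    for child in node[1:]:
        if not isinstance(child, str):
            count *= 1 + fragments_rooted_at(child)
    return count


def total_fragments(node):
    """Sum the rooted-fragment counts over every internal node of the tree."""
    if isinstance(node, str):
        return 0
    return fragments_rooted_at(node) + sum(total_fragments(c) for c in node[1:])


def internal_nodes(node):
    """Number of internal nodes, i.e. the number of depth-one rule tokens."""
    if isinstance(node, str):
        return 0
    return 1 + sum(internal_nodes(c) for c in node[1:])


rooted_at_s = fragments_rooted_at(tree)
all_fragments = total_fragments(tree)
cfg_rule_tokens = internal_nodes(tree)
```

Because the count at each node is a product over its children, it multiplies as trees get larger, which is why enumerating every fragment quickly becomes impractical and some mechanism for selecting fragments is needed.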