A Minimum Encoding Inference Approach to Theoretical Syntax

Chomsky (1986) has influentially argued that syntactic theories should be psychological theories of a person's knowledge of language, and that as such they should be able to account for how children acquire language. Chomsky has proposed that the structure of all languages is determined by an innate universal grammar, and that variation between the syntactic systems of different languages can be accounted for by the setting of a small number of parameters determining syntactic structure. Syntactic theory is then mainly concerned with searching for abstract ways of describing languages, in order that underlying similarities can be observed and parametric differences identified.

However, this view of the core of languages as consisting of rigid innate and universal structures does not seem in accord with much of the available evidence. Languages change gradually over time, children learn languages gradually over a period of several years, and there is a wide range of attested structures in the world's languages, so it seems that learning may be a much more important component in language acquisition than most linguists assume.

The key argument for universal grammar is that children do not receive enough information about the structure of language to determine its form. Gold (1967) proved that languages are not 'learnable in the limit' when the learner has access only to positive examples of the language, unless the range of possible grammars which the learner considers is greatly restricted by some system such as universal grammar.

Pinker (1989) has illustrated one aspect of this learnability problem with a discussion of verb argument structures. He argues that a child who hears constructions such as (1a) and (1b) would form a rule such that, on observing (1c), they would incorrectly assume that (1d) was grammatical. As children do not generally receive explicit correction as to which constructions are ungrammatical, they would have no way of recovering from this kind of error.

(1) a. John gave a dish to Sam.
    b. John gave Sam a dish.
    c. John donated a painting to the museum.
    d. *John donated the museum a painting.

However, Dowman (1998) created a system which was able to learn simple Context Free Phrase Structure Grammars for subsets of a number of languages, despite such arguments as to their learnability. If grammars are assumed to be fundamentally statistical, then it is possible to make inferences about which constructions are unlikely to be absent simply due to chance, and so which postulated constructions are probably incorrect. However, simply using statistical grammars is not enough: a child learning a language must know at what level to make generalizations, since it is possible to create any number of ad hoc grammars to describe any corpus of data. Simply applying expectation maximization, for instance, ultimately produces a grammar which allows the observed sentences and no others as grammatical, as long as the system has no limit on the complexity of grammars. This problem was solved by using Minimum Coding Length as a metric of the desirability of alternative grammars. Minimum Coding Length (Ellison, 1992) is similar to Minimum Message Length, and consists of finding the shortest encoding of a grammar, specified in terms of the individual symbols which it contains (in this case syntactic categories and words), followed by the data, specified in terms of that grammar.
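The trade-off behind a coding-length metric can be illustrated with a toy calculation based on the sentences in (1). The sketch below is not the system of Dowman (1998); it is a minimal illustration under several assumptions. The symbol alphabet, the two candidate grammars, the uniform code of 3 bits per symbol, and the fixed frame probabilities are all hypothetical, and the cost of encoding the probabilities themselves is ignored. The general grammar treats 'gave' and 'donated' as one verb class which allows both frames; the specific grammar uses extra symbols to restrict 'donated' to the prepositional frame.

```python
import math

# Hypothetical toy alphabet of category and word symbols (an assumption).
ALPHABET = ["VP", "V", "V1", "V2", "NP", "PP", "gave", "donated"]
BITS_PER_SYMBOL = math.log2(len(ALPHABET))  # uniform code over 8 symbols

# General grammar: a single verb class licensing both frames.
GENERAL_RULES = [
    ("VP", ("V", "NP", "PP")),
    ("VP", ("V", "NP", "NP")),
    ("V", ("gave",)),
    ("V", ("donated",)),
]
GENERAL_FRAMES = {
    "gave": {"NP PP": 0.5, "NP NP": 0.5},
    "donated": {"NP PP": 0.5, "NP NP": 0.5},
}

# Specific grammar: 'donated' (class V1) only takes the prepositional frame.
SPECIFIC_RULES = [
    ("VP", ("V1", "NP", "PP")),
    ("VP", ("V2", "NP", "PP")),
    ("VP", ("V2", "NP", "NP")),
    ("V1", ("donated",)),
    ("V2", ("gave",)),
]
SPECIFIC_FRAMES = {
    "gave": {"NP PP": 0.5, "NP NP": 0.5},
    "donated": {"NP PP": 1.0},
}


def grammar_bits(rules):
    """Bits needed to write out every symbol of every rule."""
    n_symbols = sum(1 + len(rhs) for _lhs, rhs in rules)
    return n_symbols * BITS_PER_SYMBOL


def data_bits(observations, frames):
    """Bits needed to encode each observed frame choice given its verb."""
    return sum(-math.log2(frames[verb][frame]) for verb, frame in observations)


def total_bits(rules, frames, observations):
    """Two-part coding length: the grammar, then the data given the grammar."""
    return grammar_bits(rules) + data_bits(observations, frames)


def make_corpus(n_gave, n_donated):
    """n_gave sentences of each 'gave' frame, n_donated prepositional 'donated'."""
    return ([("gave", "NP PP")] * n_gave
            + [("gave", "NP NP")] * n_gave
            + [("donated", "NP PP")] * n_donated)


for n_donated in (5, 20):
    corpus = make_corpus(5, n_donated)
    general = total_bits(GENERAL_RULES, GENERAL_FRAMES, corpus)
    specific = total_bits(SPECIFIC_RULES, SPECIFIC_FRAMES, corpus)
    print(n_donated, general, specific)
```

In this toy setting the specific grammar costs more to state, but every observation of 'donated' is cheaper to encode under it. After only a few such observations the general grammar gives the shorter total encoding, while after many of them the continued absence of (1d) is unlikely to be due to chance and the specific grammar becomes the shorter one.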

Programs such as this, and similar grammatical inference systems, such as Stolcke (1994), demonstrate that languages containing many of the features of natural languages, such as gender agreement, verb subcategorization, and infinite recursion, can be learned without a universal grammar. Instead the systems need only a learning bias as to the general form which grammars are expected to take, which in the case of Dowman (1998) consisted of a requirement that languages be described using binary branching and non-branching phrase structure rules. In the light of this evidence, Gold's concept of learnability in the limit, which requires that a learner be certain of the correct grammar, does not seem interesting from a psychological point of view, where probabilistic inferences may be more useful and robust.

However, the key implication of this research is that it enables us to take a very different perspective on syntactic theory. Instead of always searching for more abstract grammars in order to find underlying universals, it is now possible simply to learn rules specifying the structures present in given languages. It may be that the best syntactic theory is that which assumes the lowest level of abstraction necessary to account for productivity, since it is more plausible that less abstract grammars could be used in production and comprehension at the high speeds necessary during spoken conversation. Hurford (1987) has argued that not all regularities in language need be explained within an ontogenetic account. Some constructions may result from historical processes, and their internal structure need not be analyzed by individual speakers of a language. This is easy to incorporate into a Minimum Encoding Inference Model of Language, but causes problems for Chomskyan theories.

By adopting a minimum encoding inference approach to syntactic theory it is possible to gain a new perspective on the problem of explaining acquisition, and one which radically changes our criteria for determining what form a syntactic theory should take. Whilst most work on the machine learning of language has been aimed at producing language technology systems, it seems that machine learning methodologies are essential to making real progress in syntactic theory. While analyses of natural language have so far concentrated on fairly restricted aspects of structure, there is great potential for applying machine learning techniques to all areas of syntactic theory. Syntactic theories which more closely model the language knowledge of individuals can be applied in explaining social variation and historical change, and in clinical linguistics, as well as providing a basis for natural language processing systems.

References

Chomsky, N. (1986). Knowledge of Language: Its Nature, Origin, and Use. New York: Praeger.

Dowman, M. (1998). A Cross-linguistic Computational Investigation of the Learnability of Syntactic, Morpho-syntactic, and Phonological Structure. Research Report, University of Edinburgh, Center for Cognitive Science.

Ellison, T. M. (1992). The Machine Learning of Phonological Structure. Doctor of Philosophy Thesis, University of Western Australia.

Gold, E. M. (1967). Language Identification in the Limit. Information and Control, 10:447-474.

Hurford, J. (1987). Language and Number: The Emergence of a Cognitive System. Oxford: Basil Blackwell.

Pinker, S. (1989). Learnability and Cognition: The Acquisition of Argument Structure. Cambridge, Massachusetts: MIT Press.

Stolcke, A. (1994). Bayesian Learning of Probabilistic Language Models. PhD dissertation, University of California Berkeley.