Here a very different approach to syntactic acquisition is considered, one that makes use of statistical inference. Feldman et al. (1969) proved that so long as a grammatical system is statistical, so that utterances are produced with frequencies related to the grammar, languages are learnable. They note that proofs such as Gold's rely on the possibility of encountering unrepresentative 'pathological' (p. 52) sequences of utterances. Horning's proof used Bayes' theorem, which implies that the most probable grammar with respect to a corpus is the one that maximises the product of the a priori probability of the grammar and the conditional probability of the corpus given the grammar. If grammars are statistical then they assign probabilities to possible sentences, and so the probability of the corpus can be determined. Minimum Description Length can be used to evaluate the a priori probability of grammars, assigning greater probabilities to simpler grammars than to more complex ones. Brent (1999) has applied this approach as a model of how children learn to segment unbroken speech into individual words, Goldsmith (to appear) has shown how it can be used to learn morphological analyses of words in several Indo-European languages, and Dowman (to appear) has demonstrated how it can account for the acquisition of verb subcategorizations.
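The Bayesian criterion described above can be illustrated with a minimal sketch. It is not the model reported here: each candidate "grammar" is reduced to a table of sentence probabilities, and the grammar sizes in bits are invented for illustration. The prior is assumed to be 2 to the power of minus the grammar's description length, so minimising the grammar's length plus the corpus's length under that grammar is equivalent to maximising the product of prior and conditional probability.

```python
import math

def corpus_cost_bits(sentence_probs, corpus):
    """Bits needed to encode the corpus given the grammar: -log2 P(corpus | grammar)."""
    total = 0.0
    for sentence in corpus:
        p = sentence_probs.get(sentence, 0.0)
        if p == 0.0:
            # The grammar cannot generate this sentence, so P(corpus | grammar) = 0.
            return math.inf
        total += -math.log2(p)
    return total

def total_cost_bits(grammar_bits, sentence_probs, corpus):
    """Description length of the grammar plus that of the corpus given the grammar.

    With the assumed prior P(grammar) = 2 ** -grammar_bits, this equals
    -log2 of (P(grammar) * P(corpus | grammar)).
    """
    return grammar_bits + corpus_cost_bits(sentence_probs, corpus)

def best_grammar(candidates, corpus):
    """Name of the candidate with the lowest total cost (highest posterior)."""
    return min(
        candidates,
        key=lambda name: total_cost_bits(
            candidates[name]["bits"], candidates[name]["probs"], corpus
        ),
    )

# Hypothetical toy data: the sentences, probabilities and bit counts are invented.
corpus = ["a b", "a b", "a c", "a b"]
candidates = {
    "simple": {"bits": 10.0, "probs": {"a b": 0.5, "a c": 0.5}},
    "detailed": {"bits": 40.0, "probs": {"a b": 0.75, "a c": 0.25}},
    "too_small": {"bits": 5.0, "probs": {"a b": 1.0}},
}

if __name__ == "__main__":
    for name, g in candidates.items():
        print(name, total_cost_bits(g["bits"], g["probs"], corpus))
    print("selected:", best_grammar(candidates, corpus))
```

In this toy case the "detailed" candidate fits the corpus slightly better but pays more for its own size, while "too_small" is cheapest to state but cannot generate every observed sentence, so it receives zero probability. With a much larger corpus the balance would shift towards grammars that fit the data more closely.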
Research is reported here which investigated whether, in practice, syntactic systems could be inferred without a priori constraints on possible structures. Simple grammars were created, representing subsets of the syntactic systems of several languages, including such features as gender agreement and recursion in complement clauses. A computational model, whose only a priori knowledge of language was a simple phrase structure grammar formalism, was then able to learn the original grammars using the Bayesian principle. This result suggests that, so long as language is learned statistically, there is enough information implicit in utterances to enable the underlying grammar to be determined. Syntactic acquisition may therefore involve a much greater component of learning, and less innate structure, than is often assumed.
Chomsky, N. (1986). Knowledge of Language: Its Nature, Origin, and Use. New York: Praeger.
Dowman, M. (to appear). Addressing the Learnability of Verb Subcategorizations with Bayesian Inference.
Feldman, J. A., Gips, J., Horning, J. J., & Reder, S. (1969). Grammatical Complexity and Inference (Tech. Rep. CS 125). Stanford, CA: Stanford University, Computer Science Department.
Gold, E. M. (1967). Language Identification in the Limit. Information and Control, 10: 447-474.
Goldsmith, J. (to appear). Unsupervised Learning of the Morphology of a Natural Language.