Simulating Language 2015, lecture 11 pre-reading

The reading for lecture 11 is Culbertson, Smolensky and Legendre (2011). This is a short and sweet version of a much longer (probably too long) paper, but it should give you a flavor of the basic question, method, and results. We'll go more in depth in the lecture and lab. If you'd like to read the longer paper (actually published in a journal rather than the proceedings of a conference), there's a link to it in the references section below. Like the paper of Kenny's you read a few weeks back, this is one of the first papers I ever wrote, so please forgive the amateurish writing!

Anyway, this paper and the accompanying model (which we'll talk about in the lecture and lab) build on a lot of the topics you've already covered in the course so far. The general idea is that the structure of language systems reflects the biases of learners. In this particular case, we hypothesized these biases on the basis of language typology: if you look at a large survey of languages, you can see that the order of noun modifiers (adjectives and number words) relative to nouns does not appear to vary freely. Some possible patterns are very common (e.g., both modifiers after the noun), others are very rare (e.g., adjectives before the noun, but numerals after). In the paper, we used an artificial language learning (ALL) task to show that people's biases parallel those frequency differences. The ALL task capitalizes on regularization by giving learners noisy, variable input and measuring the extent to which they regularize (i.e., get rid of that variation). It turns out that the extent to which learners regularize depends on the ordering pattern they're trying to learn. Patterns they find easy to learn get regularized; patterns they find harder to learn don't.

We can straightforwardly use Bayesian inference to model the underlying biases learners bring to this experiment that influence the grammars they actually acquire (as evidenced by the phrases they produce when they are tested). You've already seen how the regularization bias can be modeled as a Bayesian prior (using the beta distribution). We'll just need to add another kind of prior to the mix, one that prefers some patterns over others. The form of the likelihood will also be familiar: it's the binomial. This is the same likelihood you'd use to determine the probability of a certain number of heads when you're tossing a coin, or a certain number of utterances that use one word instead of an alternative to describe some object.
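As a rough illustration of the beta-prior-plus-binomial-likelihood idea (this is a minimal sketch, not the paper's actual model, and all the numbers in it are hypothetical), here is how a regularizing beta prior changes a learner's estimate of the probability of the majority word order, using the standard conjugate beta-binomial update:

```python
# Minimal sketch: regularization bias as a Beta(a, b) prior over the
# probability of the majority word order, updated with a binomial
# likelihood. The training proportions below are hypothetical.

def posterior_mode(a, b, k, n):
    """MAP estimate under a Beta(a, b) prior after observing k
    majority-order phrases out of n (valid when a + k > 1 and
    b + n - k > 1)."""
    return (a + k - 1) / (a + b + n - 2)

# Variable training input: 7 of 10 phrases use the majority order (70%).
k, n = 7, 10

# Flat prior (a = b = 1): the learner simply matches the input proportion.
match = posterior_mode(1, 1, k, n)            # 0.7

# Regularizing prior (a = b < 1): probability mass piled near 0 and 1
# pulls the estimate past the input proportion, toward the extreme --
# i.e., the learner regularizes.
regularized = posterior_mode(0.2, 0.2, k, n)  # ≈ 0.738
```

The pattern-preference prior mentioned above would then be layered on top of this, so that how strongly a learner regularizes can differ across ordering patterns.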

References

Culbertson, J., Smolensky, P., and Legendre, G. (2011). Testing Greenberg’s Universal 18 using the Mixture Shift Paradigm for artificial language learning. In Proceedings of NELS 40, pages 133–146. GLSA, Amherst, MA.

Culbertson, J., Smolensky, P., and Legendre, G. (2012). Learning biases predict a word order universal. Cognition, 122:306–329.

Culbertson, J. and Smolensky, P. (2012). A Bayesian model of biases in artificial language learning: The case of a word-order universal. Cognitive Science, 36(8):1468–1498.