The reading for this lecture is Kirby, Dowman & Griffiths (2007), where we use a Bayesian iterated learning model to explore how the prior bias of learners and the transmission bottleneck influence the kinds of languages we can expect to see. The paper also attempts to explain how this links to the debate on language innateness and language universals; we also mention (in the abstract and conclusions) the implications this might have for the evolution of the language faculty, which is something we will return to in lecture 14.
You are already pretty well equipped to handle the technical aspects of this paper – remember that it’s just Bayesian iterated learning: you know what iterated learning is, and you know how Bayesian inference works. The paper is quite compact though, so there are a couple of things that could stand a little extra explanation.
- On page 5242 we say “It is well known that the stationary distribution over states in the Markov chain is proportional to the first eigenvector of the transition matrix, providing the Markov chain is ergodic. (That is, so long as each state is reachable from every other state in a number of steps that has no fixed period.) Normalizing the first eigenvector so that it totals one thus reveals the probability of a learner speaking any particular language once iterated learning has converged on a stationary distribution; essentially, the expected distribution of languages emerging from cultural evolution.” That’s a bit of a mouthful, but the very last bit is the key message: the “stationary distribution” is “the expected distribution of languages emerging from cultural evolution”. In other words, if you ran an iterated learning model for a while and counted up the various language types you got over the course of the simulation, that would give you a distribution over language types (3 of type 1, 7 of type 2, 123 of type 3, etc.). If you ran the iterated learning simulation for long enough (potentially a *really really* long time), you would discover that this distribution over language types settled down – the language might still change from time to time, but the various language types would always be represented in roughly the same proportions (3% type 1, 7% type 2, 53% type 3, etc.). That is the stationary distribution – the product of a long process of cultural evolution. The business with eigenvectors is a little bit of mathematical jiggery-pokery which lets you skip the process of running long simulations, and calculate the stationary distribution just by looking at the probability of one language changing into another at each episode of learning (the transition matrix) – see the eigenvector sketch after this list.
- We use a slightly fancier prior than any of the ones we have used so far, but intuitively it’s rather simple. As we explain in the text, we model a language as a set of meanings associated with signal classes – the example we give in the text is that the meanings might be different verb stems, and the signal classes might be different ways of forming the past tense for each verb. A regular language uses the same signal class for every verb, whereas an irregular language might use a different signal class for some verbs, or even a different signal class for every verb. We write the languages as a sequence of letters: e.g. aaaa is a language where there are 4 meanings (e.g. verbs) and every one is associated with signal class a (e.g. forms its past tense using ending a); abcd is a highly irregular language where every meaning is expressed using a distinct signal class. Our prior assigns higher prior probability to regular languages; there is a parameter (called alpha – isn’t it always?) that controls the strength of this preference, with low alpha creating a very strong preference in favour of regular languages (top panel of Fig 3) and high alpha creating a very weak preference for regular languages (a virtually, but not actually, flat prior, as in the bottom panel of Fig 3). The prior sketch after this list shows one way of building a prior that behaves like this.
- m in Fig 3 refers to the number of data points each learner learns their language from, i.e. the bottleneck on transmission: m=3 means that each learner only gets 3 meaning-signal class pairs, and m=10 means they get 10.
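Here is a minimal sketch of the eigenvector trick, using numpy and a made-up 3×3 transition matrix (the numbers are purely illustrative, not taken from the paper). Because each row of the transition matrix gives the probabilities of moving *from* one language *to* each of the others, the stationary distribution is the eigenvector of the matrix’s transpose with eigenvalue 1; normalising it to sum to one gives the expected distribution over language types.

```python
import numpy as np

# A toy transition matrix over three language types (made-up numbers, not
# from the paper): T[i, j] = P(next learner ends up with language j | the
# current learner has language i). Each row sums to 1.
T = np.array([
    [0.90, 0.05, 0.05],
    [0.10, 0.80, 0.10],
    [0.20, 0.20, 0.60],
])

# The stationary distribution pi satisfies pi @ T = pi, so it is the
# eigenvector of T's transpose whose eigenvalue is 1.
eigvals, eigvecs = np.linalg.eig(T.T)
leading = eigvecs[:, np.argmin(np.abs(eigvals - 1.0))].real

# Normalise so it sums to one: this is the expected distribution over
# language types once iterated learning has converged.
stationary = leading / leading.sum()
print("stationary distribution:", stationary)

# Sanity check: the same distribution falls out of simply running the
# chain for many generations of learners.
print("after 1000 generations: ", np.linalg.matrix_power(T, 1000)[0])
```

Whichever language the first learner starts with, after enough generations the probabilities of the three types settle to the same values – that is what convergence to the stationary distribution means.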
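And here is a sketch of one way to get a regularity-favouring prior with a strength parameter alpha: a symmetric Dirichlet-multinomial over the assignment of meanings to signal classes. I’m not claiming this is line-for-line the prior in the paper, but it behaves in the way described in the bullet above – tiny alpha piles almost all of the prior probability onto regular languages like aaaa, while large alpha gives a virtually (but not actually) flat prior.

```python
from math import lgamma, exp

def log_prior(language: str, alpha: float, num_classes: int = 4) -> float:
    """Log prior probability of a language written as a string of
    signal-class labels, e.g. 'aaaa' (fully regular) or 'abcd' (fully
    irregular).

    A symmetric Dirichlet-multinomial over class assignments: a sketch of
    a prior with the behaviour described in the text, not necessarily the
    exact prior used in the paper."""
    n = len(language)                                    # number of meanings
    counts = [language.count(c) for c in set(language)]  # meanings per class
    logp = lgamma(alpha) - lgamma(n + alpha)
    for n_k in counts:
        logp += lgamma(n_k + alpha / num_classes) - lgamma(alpha / num_classes)
    return logp

for alpha in (0.05, 100.0):   # strong vs. very weak preference for regularity
    print(f"alpha = {alpha}")
    for lang in ("aaaa", "aabb", "abcd"):
        print(f"  P({lang}) = {exp(log_prior(lang, alpha)):.3g}")
```

With alpha = 0.05, aaaa gets a prior probability of about 0.23 (so the four fully regular languages between them soak up nearly all of the prior mass) while abcd gets essentially nothing; with alpha = 100 the three languages get almost, but not exactly, the same prior probability.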
About the authors
Mike Dowman was a post-doc here in Edinburgh back in the mid 2000s.
Tom Griffiths is a Prof at Berkeley, and one of the leading figures in the Bayesian revolution in the cognitive sciences – what Tom doesn’t know about Bayesian inference probably isn’t worth knowing. He is responsible for introducing Bayesian learning to iterated learning models, which I think has been an important development for the field, improving the clarity and rigour of our models.
Once you have done the reading, have a go at the post-reading quiz and look at my comments.
References
Kirby, S., Dowman, M. and Griffiths, T. (2007) Innateness and culture in the evolution of language. Proceedings of the National Academy of Sciences, 104, 5241-5245.
