Commentary on Michael Arbib's target article in Behavioral and Brain Sciences, "From monkey-like action recognition to human language: An evolutionary framework for neurolinguistics"



James R Hurford
Language Evolution and Computation Research Unit
School of Philosophy, Psychology and Language Sciences
University of Edinburgh
Adam Ferguson Building, George Square, Edinburgh, EH8 9LL
+44 131 650 3959


Two points incidental to the target article's themes are picked up, relating in different ways to the distinction between actions and objects. (1) Actions and objects are mentally represented, not as different types or categories, but with varying degrees of involvement of motor and perceptual activation. (2) The meanings signalled by the earliest forms of language were not necessarily action-object dyads.

There is much to agree with in the target article (henceforth TA). The broad speculative scenario for language evolution is quite plausible, in the same way as Jackendoff's (2002) scenario. Leaving the TA's arguments about specific neural structures to the neuroscientists, I discuss alternatives to two incidental details of the broad evolutionary story which are not essential to its central thread. The main themes could survive, and indeed be made more plausible, if these two incidental aspects were jettisoned. Both points to be discussed relate, in different ways, to the TA's assumption of a basic categorical distinction between actions and objects.

The TA states that it disagrees "with Hurford's suggestion that there is a mirror system for all concepts -- actions, objects, and more -- which links the perception and action related to each concept." This is not an accurate representation of Hurford's stated views, and I don't believe there is actually any disagreement here. Hurford's actual view is quoted here:

"Mirror neurons are, by definition, only involved in the representations of actions, such as grasping and walking. Therefore, adhering to a narrow definition of mirror neuron, we cannot claim that the mental representations of objects, such as apples and screwdrivers, involve mirror neurons. Apples and screwdrivers are not actions. But it seems likely that representations of objects involve some congruence between motor and sensory neurons, similar to that found in the representations of actions." (Hurford, 2004)

Gallese, in an excellent article on the neural representation of concepts, summarizes evidence for a view consistent with Hurford's, and, probably, Arbib's:

"Several brain-imaging experiments have indeed shown that observation, silent naming and imaging the use of man-made objects leads to the activation of the ventral premotor cortex (Perani et al. 1995; Grafton et al. 1997; Martin et al. 1996; Chao & Martin 2000), a brain cortical region normally considered to be involved in the control of action and not in the representation of objects. The properties of these objects, that is, their relational specifications (how they are supposed to be handled, manipulated and used), appear to comprise a substantial part of their representational content." (Gallese, 2003: 1235)

At the level of generality of the present discourse, Gallese's, Arbib's and Hurford's views of the neural instantiation of concepts seem consistent. The idea of an object's affordances, which Arbib uses, focusses on the meeting-point of perceptual and motor processes. What is relevant to a human about an apple is not merely its abstract near-spherical shape but its concomitant graspability. I can agree with Arbib's view of the neural instantiation of a concept as "a graded set of activations of the schema network", where this network relates both perceptual and motor schemata. I would emphasize the fuzziness implicit in the term 'graded'. Different basic schemata will be activated to different degrees in the representation of different concepts. For instance, bringing to mind the concept of a doorknob would be expected to involve a certain grasping motor schema more strongly than the concept of a soccer ball, although a basic perceptual schema for spherical object is involved in both.
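The notion of graded activation can be made concrete with a toy sketch. It is offered only as an illustration of the idea, not as a model from the TA or from schema theory: the schema names, the 'motor:kick' dimension and every numerical value below are invented assumptions, and cosine similarity is simply one convenient way of comparing two activation patterns.

```python
import math

# Hypothetical activation levels (0 to 1) of basic schemata. The numbers are
# invented for illustration only; they are not measurements.
DOORKNOB = {"percept:spherical": 0.8, "motor:grasp": 0.9, "motor:kick": 0.0}
SOCCER_BALL = {"percept:spherical": 0.9, "motor:grasp": 0.2, "motor:kick": 0.9}


def overlap(c1, c2):
    """Cosine similarity between two graded activation patterns."""
    keys = set(c1) | set(c2)
    dot = sum(c1.get(k, 0.0) * c2.get(k, 0.0) for k in keys)
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (norm1 * norm2)


# Both concepts engage the same perceptual schema, but to different degrees
# they engage different motor schemata, so the patterns overlap only in part.
similarity = overlap(DOORKNOB, SOCCER_BALL)
```

On this picture there is no categorical boundary between 'object' and 'action' concepts, only patterns that weight perceptual and motor schemata differently.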

Unfortunately, researchers from widely different disciplines approach the notion of a 'concept' with radically different baggage. I think I start from a position similar to Arbib's and Gallese's, certainly not wanting to make symbolization by words an essential component of concepthood. Many concepts are higher-level entities than the basic schemata of schema theory. A concept may be seen as a (perhaps hyper-bell-shaped) distribution of activation in a highly multidimensional space involving both premotor and perceptual brain regions. The possibility of both motor and perceptual components of object concepts seems implicit in Arbib's assertion that "You can pantomime an object either by miming a typical action by or with the object or by tracing out the characteristic shape of the object."

Arbib concedes that "only rarely (as in the case of certain basic actions such as grasp or run or certain expressions of emotion) will the perceptual and motor schemas be integrated into a 'mirror schema'." But how rare is rare? And how basic is basic? If GRASP and RUN are allowed this special status, then why not EAT, DRINK, BREATHE, EXCRETE, URINATE, HOLD, LIFT, DROP, PUNCH, KICK, SIT, STEP, STOOP, KNEEL, LIE, TURN, WRIGGLE, WAVE, and so on? Surely our understanding of all these involves both sensory information about what they look like when others do them and motor information about how to do them ourselves.

Finally on this point, the TA refers to "Hurford's suggestion that there is a mirror system for all concepts". On the contrary, my skepticism about the natural unity of such an idea as 'mirror system' may be seen from such statements as "a wide range of animal behaviours probably involve arrangements more or less like mirror neurons, depending on how far one is prepared to stretch the term. ... There are prototypical, clear central cases of mirror-neuron-like arrangements, and there are cases partially resembling them in relevant ways" (Hurford, 2004), and "mirror neurons occupy one corner of a continuous, extremely diverse, space of possible neuronal arrangements" (Hurford, 2004). Again, despite his assertion of disagreement, it is possible that this view is consistent with Arbib's, as he also refers to "some not-quite-mirror neurons in the region of STSa in the superior temporal sulcus."

The next point of contention with the TA concerns its adoption of an analytic evolutionary route to compositional syntax. Both the analytic and the synthetic routes to compositionality end up with a system comprising atomic terms and rules for combining them. Rules of combination are synthetic, in that they put atomic terms together to make larger expressions. Syntax is essentially synthetic by definition. The issue is whether this synthetic state of affairs was arrived at, in the primeval origins of languages, by a period in which the reverse 'analytic' process (taking things apart, fractionation) happened.

The synthetic account is the simpler account. The analytic story postulates original holistic expressions, then proceeds via fractionation to atomic terms. The synthetic story simply postulates original atomic terms. Both accounts require the appearance of syntactic rules of combination. The TA argues 'Imagine that a tribe has two unitary utterances concerning fire which, by chance, contain similar substrings which become regularized so that for the first time there is a sign for "fire".' This is obviously less simple than the alternative account: imagine that a tribe has a sign for 'fire'. Wray (2000) has argued that, on the contrary, considerations of simplicity favour the analytic account, because holistic expressions are still very common in modern languages, a carry-over, it is argued, from holistic pre-modern communication. But this neglects the fact that modern holistic expressions, whose meanings are admittedly not functions of the meanings of their parts, are nevertheless composed of recognizable subelements, such as the three separate recognizable words in 'kick the bucket'. The holistic expressions of modern languages arose through a process of historical fossilization of previously productive constructions. This is not to deny, of course, that modern humans retain a strong disposition to mix holistic and compositional expressions in their language.
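The difference in simplicity between the two routes can be illustrated with a toy sketch. It is only a sketch under stated assumptions: the utterance strings 'bamfiku' and 'fikulot' and their glosses are invented, not attested or taken from the TA, and the longest shared substring (found here with Python's standard difflib) merely stands in for whatever process would 'regularize' a chance resemblance.

```python
from difflib import SequenceMatcher


def shared_substring(a: str, b: str) -> str:
    """Return the longest substring common to two holistic utterances."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[m.a:m.a + m.size]


# Analytic route: start from unanalysed wholes (invented strings and glosses),
# then fractionate a chance resemblance into an atomic sign.
holistic = {
    "bamfiku": "the fire is burning",
    "fikulot": "the fire cooks the meat",
}
first, second = holistic  # the two holistic utterances
analytic_lexicon = {shared_substring(first, second): "fire"}

# Synthetic route: simply postulate the atomic sign from the outset.
synthetic_lexicon = {"fiku": "fire"}
```

Both routes arrive at the same one-entry lexicon, but the analytic route needs the prior holistic utterances, a chance overlap between them, and a fractionation step, whereas the synthetic route needs only the sign itself.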

What were the first meanings? Wray (2000) asserts "... there is no place in such a protolanguage for individual words with a referential or descriptive function" (294). The TA at best only partly agrees: a property of protolanguage is said to be "The ability to associate symbols with an open class of episodes, objects, or actions". This sounds like the possibility of protolanguage symbols with a referential function. Given this assertion, one might expect there to be elementary symbols for such simple meanings as MAN, FIRE, TIGER, COLD, RAIN. I suspect that the penchant for the more complex story comes from a tacit acceptance of the schoolbook idea of defining sentences as expressing whole meanings. The idea of a 'whole meaning' is never satisfactorily defined independently of grammatical subject-predicate structure, which makes its use in the definition of sentence hopelessly circular. But the premise lurks that a 'whole meaning' must have two parts, corresponding to the two parts of a subject-predicate sentence. There seems also to be a further assumption that these two semantic entities are significantly contentful, bearing some descriptive meaning, as opposed, say, to being purely deictic. The TA suggests "... the immediate hominid precursors of Homo sapiens would have been able to perceive a large variety of action-object frames". This gives unmerited priority to a particular kind of meaning, involving two substantial elements, a nameable action and a nameable object.

I accept that a simple whole meaning, of the kind destined to find eventual linguistic expression, has two parts, a predicate and an argument, formally PREDICATE(x). This is argued at length in Hurford (2003), where it is also argued that these two separate semantic elements can be identified with separate neural pathways. But note that the formal element which is the argument of the predicate is a variable, not a constant. Thus, I claim that basic meanings are such as FIRE(x), paraphrasable as "There's a fire", DEAD(x) "It is dead", DADDY(x) "It is Daddy", COME(x) "It is coming this way". The deictic elements 'there' and 'it' in these paraphrases can be glossed for modern purposes as 'the place/thing/animal we are jointly attending to'. In simple communication, any object of joint attention need not be mentioned explicitly, as it is provided by the context. A large vocabulary of 'one-word' descriptive symbols could have been very useful, even without, or before, the advent of syntactic rules for combining them into an even more useful recursive communication system.
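The claim that the argument is a variable supplied by context, rather than a named constant, can be sketched as follows. This is a minimal illustration only: the names Proposition and interpret, and the example referent, are hypothetical and are not drawn from Hurford (2003).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Proposition:
    predicate: str  # the descriptive content, e.g. "FIRE"
    argument: str   # the value bound to x, supplied by context


def interpret(symbol: str, attended: Optional[str]) -> Optional[Proposition]:
    """Read a one-word utterance as PREDICATE(x), binding x by joint attention."""
    if attended is None:
        return None  # no shared focus of attention: x stays unbound
    return Proposition(symbol.upper(), attended)


# The speaker utters a single descriptive symbol; the hearer fills in x from
# whatever the two are jointly attending to (a hypothetical referent here).
meaning = interpret("fire", attended="the thing on the hillside")
```

The utterance itself carries only the predicate; nothing in the signal names the argument, which is why a vocabulary of one-word symbols could be useful before any rules of combination existed.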


Chao, L.L. & Martin, A. (2000) Representation of manipulable man-made objects in the dorsal stream. Neuroimage 12:478-484.

Gallese, Vittorio (2003) A neuroscientific grasp of concepts: from control to representation. Philosophical Transactions of the Royal Society of London B 358:1231-1240.

Grafton, S.T., Fadiga, L., Arbib, M.A. & Rizzolatti, G. (1997) Premotor cortex activation during observation and naming of familiar tools. Neuroimage 6:231-236.

Hurford, James R. (2003) The neural basis of predicate-argument structure. Behavioral and Brain Sciences 26(3):261-283.

Hurford, James R. (2004) Language beyond our grasp: what mirror neurons can, and cannot, do for language evolution. In Evolution of communication systems: a comparative approach, eds. Kimbrough Oller, Ulrike Griebel and Kim Plunkett, MIT Press.

Jackendoff, Ray (2002) Foundations of language: brain, meaning, grammar, evolution. Oxford University Press.

Martin, A., Wiggs, C.L., Ungerleider, L.G. & Haxby, J.V. (1996) Neural correlates of category-specific knowledge. Nature 379:649-652.

Perani, D., Cappa, S.F., Bettinardi, V., Bressi, S., Gorno-Tempini, M., Matarrese, M., & Fazio, F. (1995) Different neural systems for the recognition of animals and man-made tools. Neuroreport 6:1637-1641.

Wray, Alison (2000) Holistic utterances in protolanguage: the link from primates to humans. In The evolutionary emergence of language: social function and the origins of linguistic form, eds. Chris Knight, Michael Studdert-Kennedy and James R. Hurford, Cambridge University Press.