James R Hurford

(Last revised, May 2001. To appear in The Transition to Language, edited by Alison Wray, Oxford University Press.
Note: This HTML version may differ slightly from the printed version; the printed version is the `authorized' version.)


1 Introduction

`We should search for the ancestry of language not in prior systems of animal communication but in prior representational systems.' (Bickerton, 1990:23 [emphasis added, JRH])

This quotation makes a negative point and a positive point, given added emphasis above. The idea that language, and by implication much of its current complex structure, arose from pre-linguistic representational systems has attracted attention and not much criticism. A goal of evolutionary linguistics is to explain the origins of the structure found in language. It can be agreed that little of the distinctively complex structure of modern languages can be attributed to ancestry in animal communication systems1. But how much of the complex structure of modern languages can be attributed to ancestry in pre-linguistic representational systems? Sampson (1997) expressed a view opposed to Bickerton's.

` ... it is not plausible that our internal representation of statements, which we use in order to reason and draw inferences in other modes, will map in a simple element-by-element fashion into the words with which we express those statements in speech. ... Nobody really has the least idea what is physically going on in the head when we reason, but I agree that whatever goes on is likely to relate in a fairly abstract way to the words of spoken utterances, which are adapted to the necessary linearity of speech and to the fact that speaker and hearer are working with separate models of reality.' (Sampson, 1997:100)

Pinker and Bloom (1990) make a similar point:

`It is occasionally suggested that language evolved as a medium of internal knowledge representation for use in the computations underlying reasoning. But although there may be a languagelike representational medium -- ``the language of thought,'' or ``mentalese'' (Fodor 1975) -- it clearly cannot be English, Japanese, and so on. Natural languages are hopeless for this function: They are needlessly serial, rife with ambiguity (usually harmless in conversational contexts, but unsuited for long-term knowledge representation), complicated by alternations that are relevant only to discourse (e.g. topicalization), and cluttered with devices (such as phonology and much of morphology) that make no contribution to reasoning.' (Pinker & Bloom, 1990:714)

This chapter will provide extended illustration of these views briefly expressed by Sampson and Pinker & Bloom. The chapter is intended as a counterblast to the view that language has more to do with mental representation than with communication, whether now, as emphasized by Chomsky (e.g. Chomsky 1980) or in its origins, as emphasized by Bickerton (e.g. Bickerton 1990, 1998). All the facts that I mention are extremely well known to linguists, but I hope this will be useful in drawing attention to what cannot be explained by way of mental representation, as well as reminding non-linguists that syntax is not all just common sense. Thus, in this paper, I argue for the two following related propositions:

A corollary of these propositions, not pursued in detail here, is:

In the next section, to start this argument, independent characterizations of non-linguistic mental representations and the structure of language are set out. The following sections conduct a survey of the central layers of the structure of any language, its phonology, morphology and syntax, arguing in all cases that the structuring concerned plays no role in the representation of thought, but defines, or constitutes, the mapping of thoughts onto linguistic expressions.


2 Mental Representation

In a polemic passage, Chomsky (1980:229-230) disparages the idea of communication as the essential function of language, preferring to see language as enabling the expression of thought. I will not quibble over the term `essential' here; I will use `communication' and `expression of thought' interchangeably in this paper, but the latter term has the virtue of highlighting a clear separation between language and thought. Linguistic form, in this view, is something different from thought itself, which is `expressed' in language. Thought which remains unexpressed does not take linguistic form. Much of our thought is of this unexpressed kind, i.e. not in language. Yet unexpressed thought is not formless or contentless, and so one can speak meaningfully of it as a kind of representation.

It is assumed here that the existence of nonlinguistic representations is unproblematic, contrary to the views of a few philosophers (e.g. Stich (1983), Judge (1985), Schiffer (1989), Horst (1996)). Beyond the assumption of their existence, no particularly strong further assumptions are made here about mental representations. For example, the view of nonlinguistic representations taken here is compatible with, but not dependent on, distributed connectionist views of how to code the input to the expression of thought. But the argument pursued here will naturally emphasize dissimilarities between language structure and the structure of nonlinguistic mental representation.

Nonlinguistic mental representations are possessed by animals and prelinguistic infants for remembering and thinking about events in the world. They are derived from extero- and intero-perception, such as perceptions of light, heat, touch, sound, thirst and hunger. Nonlinguistic mental representations are often referred to as constituting the `language of thought' (as in Fodor, 1975) or `mentalese'. The language metaphor, implicit in both Fodor's title and the `-ese' suffix, is attractive because it alludes implicitly to the complex structure of thought. But the language metaphor is also misleading. Fodor's Language of Thought clearly does not have much of the structure of a public language, such as French or Swahili. Indeed, it is exactly the non-language-like features of nonlinguistic mental representations that are at the core of my argument here. The essential differences between an internal (cognitive) representation system and a communication system are as follows.

A communication system maps external forms (such as speech sounds or manual signs), via mental structures, to meanings (where many, if not all, meanings relate to external objects, events or situations). A communication system is typically public, shared by many individuals2.

A representation system lacks the mapping to external forms, and merely provides mental structures which relate to, or denote, external situations. There would be no practical advantage in having a representation system which was not in some way related to the world outside the mind possessing it.

Thus a communication system properly includes a representation system. There are elements in a communication system that are not part of the inherent representation system. Analogously, there are elements in a computer system which relate only to keyboard and screen functions and not to the core business of computation. Any aspects of a communication system which pertain only to the mapping between external forms (i.e. sounds or signs) and the internal cognitive representation system are not part of the representation system per se.

Nonlinguistic mental representations are non-temporal; all parts of the representation of a remembered event are simultaneously present to the mind. Nonlinguistic mental representations are multi-dimensional; for example, they are often diagrammed on paper as networks, with hierarchical relationships between the parts, and/or as composed of features (which can be seen as dimensions). Nonlinguistic mental representations do not exist in the same medium as the external forms to which they are mapped by the structure of a language; specifically, they are non-acoustic and non-manual. With nonlinguistic mental representations, no issue of ambiguity arises; they are what they are (although mental representations may be vague or general).

By contrast, utterances are temporal. Utterances in spoken language are acoustic events, and in sign language, manual events. The raw unprocessed speech signal which reaches the eardrum is a complex sound wave, no more than a temporal sequence of variations in air pressure. The variations are more or less strong surges and declines in pressure, with periods of stillness. At any instant in time, the only information immediately available in this signal is the relative strength of the change in air-pressure, a single (positive or negative) number. At bottom, the whole rich linguistic fabric of an utterance, from phonemic oppositions (e.g. what makes a `b' different from an `s') through syllables, morphemes, words and phrases to clauses and sentences, is signalled by this temporal sequence of air-pressure variations. Thus utterances are linear or one-dimensional sequences of events; any perceived imposition of further dimensions on the signal (e.g. by intonation) arises from knowledge of the mapping between utterances and the nonlinguistic mental representations of their meanings. The term `one-dimensional' emphasizes that the events or `landmarks' in the temporal sequence are distinguished by their values on a single parameter, that of relative pressure. (In sign language, admittedly, some degree of simultaneity is present in the manual signals.) Utterances are frequently ambiguous; as computational linguists know to their cost, ambiguity, especially local ambiguity, is rife in language. Ambiguity arises at all levels of linguistic structure. 
For instance, the utterance `I'm coming to get you' is ambiguous between a threat and a promise of help; the sentence Visiting relatives can be boring can be understood as describing at least two different situations; the word list, like many other English words, has many senses; phonetically, in English a plosive where voicing commences simultaneously with release can be interpreted as either `voiced', as in beer, or `voiceless', as in spear. These are examples contributing to the many-to-one mapping between nonlinguistic representations and linguistic strings. (In fact, given the existence of synonymy and paraphrase, the overall mapping is many-to-many.)
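The many-to-many character of this mapping can be made concrete with a small sketch in Python. The meaning labels below are invented purely for illustration and are not a proposal about the content of mental representations; the paraphrase string `Relatives who visit can be boring' is likewise an assumed example, added to show synonymy alongside ambiguity.

```python
from collections import defaultdict

# Hypothetical (meaning, string) pairs. The meaning labels are invented
# for this sketch; only the first two strings come from the text above.
PAIRS = [
    ("THREAT(speaker, hearer)", "I'm coming to get you"),
    ("PROMISE_OF_HELP(speaker, hearer)", "I'm coming to get you"),
    ("BORING(act-of-visiting-relatives)", "Visiting relatives can be boring"),
    ("BORING(relatives-who-visit)", "Visiting relatives can be boring"),
    ("BORING(relatives-who-visit)", "Relatives who visit can be boring"),
]


def build_mapping(pairs):
    """Index the pairs in both directions: string -> meanings, meaning -> strings."""
    meanings_of = defaultdict(set)
    strings_for = defaultdict(set)
    for meaning, string in pairs:
        meanings_of[string].add(meaning)
        strings_for[meaning].add(string)
    return meanings_of, strings_for


def ambiguous_strings(meanings_of):
    """Strings that express more than one meaning (ambiguity)."""
    return sorted(s for s, ms in meanings_of.items() if len(ms) > 1)


def paraphrasable_meanings(strings_for):
    """Meanings expressible by more than one string (synonymy, paraphrase)."""
    return sorted(m for m, ss in strings_for.items() if len(ss) > 1)


meanings_of, strings_for = build_mapping(PAIRS)
```

Ambiguity shows up as a string indexed to several meanings, and paraphrase as a meaning indexed to several strings; a mapping with both properties is many-to-many.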

The problem of expressing multi-dimensional mental representations as one-dimensional sequences of sounds is analogous to any problem involving dimension-squashing. Consider the two-dimensional picture in Figure 1:

Figure 1.

Now, is this a picture of a solid cube or of the inside of a room (showing the ceiling, far wall, and right-hand wall)? As you look at it, the interpretation will probably switch back and forth. The picture is ambiguous, because it has squashed three dimensions into two. Consider now how much further information is lost by trying to depict a cube (or the inside of a room, if you will) in just one dimension. The diagonal line in Figure 2 is an attempt to represent the solid body one-dimensionally, with the blobs on the line as distinguishable points purporting to correspond to the vertices in the two-dimensional picture.

Figure 2.

As a linguistic example of the dimension-squashing problem faced by the expression of thought, consider argument selection, as discussed by Dowty (1991). Noun phrases in some sentences reflect the Agent (or Patient) role more clearly than the noun phrases in other sentences. For instance, Jim is more clearly an active (i.e. agentive) participant in Jim kicked the editor than in Jim admires the editor. And likewise, the editor is more clearly affected (i.e. fulfils the Patient role) by the situation described by the first sentence than in the situation described by the second. This induces Dowty to postulate the `proto-roles' Proto-Agent and Proto-Patient. A participant in an event can conform to a greater or lesser degree to the set of criteria defining Proto-Agent or Proto-Patient. Another way of putting this is to say that the concepts of Agent and Patient are not atomic, but are clusters of values in a multi-dimensional space. Participants in mentally represented events which conform most closely to the Proto-Agent criteria are near the centre of the Agent region of this multi-dimensional space; likewise, participants which conform most closely to the Proto-Patient criteria are near the centre of the Patient region of this multi-dimensional space. Participants which do not meet many of the criteria are on the outskirts of the regions. Dowty's six criteria are as follows:

The 6-dimensional space of Agenthood and Patienthood

(after Dowty (1991))

proto-AGENT                 proto-PATIENT
Volitional involvement      Non-volitional involvement
Sentience or perception     No sentience/perception
Causing event               Causally affected
No change of state          Change of state
Moving                      Stationary
Independent existence       No (independent) existence

Figure 3

Agent and Patient are thus seen as complex multidimensional notions3. Let us grant, as proponents of the view I am opposing4 typically do, that our prelinguistic ancestors would have represented the events in their social and material environment with concepts hardly less complex than this. They would have had mental representations incorporating information along all these six dimensions, i.e. about who deliberately did what to whom, who felt what, what caused what to happen, what changed somehow, what moved, and what suddenly appeared. I agree that all such information would have been represented by prelinguistic creatures. The problem is to convey all this in speech. In basic cases, languages solve the problem by rules of argument selection, which map points in this six-dimensional space onto two grammatical polar points of a transitive clause, namely Subject and Object. In the vast majority of languages, if not all, the organization of basic transitive clauses has the Subject preceding the Object. Thus the word-string typically signals information compressed out of the six-dimensional Agent-Patient space by linear order of two major parts of the string.

Figure 4.

The syntactic relations Subject and Object are universally available, and probably universally used, in the grammars of languages5. They do not belong to nonlinguistic (or prelinguistic) mental representations, but are rather part of the solution to the problem of mapping `propositional structures onto a serial channel' (Pinker and Bloom, 1990:713).
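The compression performed by argument selection can be illustrated with a short sketch in Python. It assumes, as a deliberate simplification, that each participant is scored by counting how many of the six Proto-Agent criteria it meets, and that the higher-scoring participant is expressed as Subject. The feature names, the scoring function, and the feature values assigned to the participants of Jim kicked the editor are all illustrative assumptions, not part of Dowty's own formulation.

```python
# The six dimensions from the table above. True means the participant sits
# at the Proto-Agent pole of that dimension, False at the Proto-Patient pole.
# The short names are invented for this sketch.
DIMENSIONS = ("volitional", "sentient", "causing",
              "unchanged", "moving", "independent")


def proto_agent_score(participant):
    """Count how many Proto-Agent criteria the participant meets (0 to 6)."""
    return sum(1 for d in DIMENSIONS if participant[d])


def select_arguments(a, b):
    """Return (subject, object) names: the more agent-like participant
    becomes Subject. On a tie the input order is kept, because Python's
    sort is stable."""
    ranked = sorted([a, b], key=proto_agent_score, reverse=True)
    return ranked[0]["name"], ranked[1]["name"]


def linearize(subject, verb, obj):
    """Express the selection as linear order: Subject precedes Object."""
    return f"{subject} {verb} {obj}"


# Assumed feature values for the kicking event (illustrative only).
jim = {"name": "Jim", "volitional": True, "sentient": True, "causing": True,
       "unchanged": True, "moving": True, "independent": True}
editor = {"name": "the editor", "volitional": False, "sentient": True,
          "causing": False, "unchanged": False, "moving": False,
          "independent": True}

subject, obj = select_arguments(editor, jim)
sentence = linearize(subject, "kicked", obj)
```

Twelve feature values go in, and what comes out is a single decision about which noun phrase comes first. The string itself carries none of the individual values, which is the sense in which information is compressed out of the six-dimensional space.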

In the following sections, a survey will be presented of other such aspects of the structure of language which are parts of the solution to the expression problem, rather than aspects of the pre-existing mental representations. Languages are very complex, highly structured communication systems. The view that linguistic structure derives from representation systems existing prior to language can only be sustained to the extent that there is no structure that is only part of the communicative aspect of a language system. How much of language structure is purely representational, and how much of it is part of the mapping to external forms? One cannot quantify such questions, but the answer advocated here is that almost all of the complex structure of languages belongs to their expressive aspect, and very little to their purely representational aspect.


3 Language Structure 1: Phonology and morphology

Linguists know that each of the over 6000 languages in the world is an extremely complex system, with a great wealth of detailed structure at all levels. In an interdisciplinary book such as this, whose readers include anthropologists, psychologists, and biologists with an interest in the evolution of language, it is worth briefly rehearsing some of the details of what linguistic structure consists of.

A language is a system of mappings between meanings and external forms (usually sounds but also manual signs), as diagrammed in Figure 5.

Figure 5.

The bidirectional arrows indicate that the map between meanings and sounds can be followed in either direction --- from meanings to sounds when speaking, and from sounds to meanings when listening. The language structure itself (the map) is neutral between speaking and hearing.


3.1 Double Articulation

Note the so-called `double articulation' (or `duality of patterning') in the above figure, the separation into two distinct levels of organization, phonology and morphosyntax. What this means is that a language has two quite distinct kinds of rules for `putting things together', and these rules deal in quite different basic units. Phonology, or phonological rules, puts sounds (individual consonants and vowels) together to make syllables and basic units of meaning, called morphemes. Morphosyntactic rules take these elements and put them together to make sentences. The linguistic structure of a sentence is basically a two-tier structure, to put the matter at its simplest. This duality of patterning is a fundamental universal characteristic of languages. It has no motivation in a purely representational system, but plausible arguments can be advanced for its communicative adaptiveness.

It is important, in communication, to be able to locate the boundaries of the elements composing the one-dimensional speech string. In written English, spaces serve this function. In speech, syllables have characteristic structure, with beginnings and ends, facilitating the location of their boundaries by a hearer. The beginning-end, or onset-coda, structure of syllables, plus the requirement for syllables to be distinctive, is the basic framework on which the rich phonological systems of languages elaborate. Phonology is a syntax of sounds, without any concomitant semantics. Individual consonants and vowels have no meanings of their own, and hence have no counterparts in non-linguistic mental representations, but they play a solid part in an organizational level of language structure, namely phonology.


3.2 Phonological Structure

The business of phonetics is simply the physical description of sounds (in either articulatory or acoustic terms). The business of phonology, on the other hand, as outlined in the box of Figure 6 below, is to describe how different languages organize their sounds. Such organization includes: which sounds to use, out of the hundreds possible (e.g. English uses sounds not used in French and vice versa); how to combine sounds (e.g. some languages don't allow two consonants together); how sounds are modified in various contexts (e.g. English vowels are longer before voiced consonants than before voiceless ones). To give some impression of the complexity of the phonological aspects of linguistic structure, given below in Figure 6 is a summary of the table of contents of a textbook on phonology.

Phonological structure, what's in it?
A typical overview (from Katamba (1989))

The phoneme, including Distinctiveness: phonemes and allophones, and Phonological symmetry. [19 pages]
Distinctive features, including Major class features, Cavity features, Tongue body features, Tongue root features, Laryngeal features, Manner features, Prosodic features, Segment structure redundancy. [25 pages]
Phonological representations [19 pages]
Phonological processes, including Assimilation, Direction of assimilation, Assimilation processes, Palatalisation, Labialisation, Voice assimilation, Place of articulation assimilation, Manner of articulation assimilation, Nasalisation, Dissimilation. [19 pages]
Naturalness and Strength, including Natural segments, natural classes and natural processes. Phonological strength hierarchies. [19 pages]
Interaction between rules, including Introduction to rule formalisation and ordering, Linear rule ordering. [17 pages]
The abstractness of underlying representations. [19 pages]
The syllable, including The representation of syllable structure, the CV-tier, A generative CV-phonology model of syllable structure, Syllabification, Functions of the syllable, The syllable as the basic phonotactic unit, The syllable as the domain of phonological rules, The syllable and the structure of complex segments, Compensatory lengthening, The syllable as indispensable building block for higher phonological domains, Syllable weight, Abstract segments, Extrasyllabicity. [33 pages]
Multi-tiered phonology, including Tone languages, The representation of tone, Contour tones, Tone stability, Melody levels, Tone and intonation, Pitch accent, Vowel harmony, Nasalisation, Morphemic tier. [30 pages]
Stress and intonation, including Stress, Metrical phonology, Metrical trees and grids, Extrametricality, Quantity sensitivity, Intonation, Accentuation function, Intonation and illocutionary force, Grammatical function of intonation, Attitudinal functions of intonation, Discourse function of intonation. [33 pages]

Figure 6

This example is quite typical of phonology textbooks (although they all have their own characteristic differences of emphasis). Clearly, phonological structure is complex and requires a lot of explaining. The phonological component of a language comprises a very significant proportion of its structure. Phonological structure is (part of) the mapping between internal representations of meanings and their external expressive forms. A purely representational system has no mapping to external expressive form. Obviously, then, all of phonological structure belongs in the aspect of linguistic structure dealing with the (public) expression of thought.


3.3 Morphology versus syntax

On the morphosyntactic side of the duality of patterning, the universal distinction between morphology and syntax (however that is drawn) plays no role in non-communicative representation. This distinction rests on the discrimination by languages of a level of words, which are small-to-middle-sized units distinct from both semantically or grammatically atomic morphemes and higher level syntactic units such as phrases. Here is an example of somewhat complex morphology from Turkish:

sevildirememek `not to be able to cause to be loved'

sev      il        dir         e         me         mek
`love'   Passive   Causative   Ability   Negative   Infinitive

The Turkish expression is a single word consisting of six separate morphemes stuck together (`agglutinated'). The same meaning in English would be expressed by a phrase consisting of at least the same number of separate words. This common meaning, which can be diagrammed as some structured configuration of the elements `love', Passive, Causative, Ability, Negative, Infinitive is mapped onto linguistic strings quite differently in English and Turkish. The two languages segment the linguistic string into words in radically different ways. The criteria by which we can satisfy ourselves that we are dealing with a single word in Turkish, and not a lot of small ones, are empirically relatively clear. For example in Turkish, all the vowels within a word must harmonize with each other according to strict rules, and the placement of stress is calculated on the basis of word-level units. But word-level units differ in size and number of elements from one language to another. In uninflected languages (e.g. Vietnamese) there is a greater correspondence than in other languages between word-sized chunks and what might be basic units of meaning, or conceptual representation. Segmentation into words is not inherent in nonlinguistic representations. This is an aspect of linguistic structure which is part of the expressive apparatus for mapping nonlinguistic representations onto strings.
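The contrast between the two segmentations can be sketched in Python. The morphemes and glosses are taken from the example above; representing the shared meaning as a flat list of elements, and counting words by splitting on spaces, are simplifying assumptions made only for illustration.

```python
# Morphemes and glosses from the Turkish example above.
MORPHEMES = [
    ("sev", "love"),
    ("il", "Passive"),
    ("dir", "Causative"),
    ("e", "Ability"),
    ("me", "Negative"),
    ("mek", "Infinitive"),
]


def agglutinate(morphemes):
    """Turkish-style realization: stick the forms together into one word."""
    return "".join(form for form, _gloss in morphemes)


def count_words(expression):
    """Count orthographic words (a rough stand-in for word-level units)."""
    return len(expression.split())


turkish = agglutinate(MORPHEMES)
english = "not to be able to cause to be loved"  # translation given above
```

The same six meaning elements surface as one word in Turkish and as a string of several words in English. Nothing in the list of elements itself decides where the word boundaries fall.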

A morpheme is standardly defined as the `minimal unit of meaning or grammatical function' (Yule, 1985:60) in a language. This carefully disjunctive definition shows that even the basic building blocks of morphosyntax cannot all be taken to serve as plausible candidates for elements of nonlinguistic representation, existing before syntactic organization. Of the six morphemes in the Turkish example above, at least two, Passive and Infinitive, are clearly grammatical morphemes, rather than lexical (i.e. semantically contentful), and are thus part of the solution to the expression problem, rather than elements of nonlinguistic mental representation.

The volume of work on the morphological structure of language is not far short of the volume of work on phonological structure. Suffice it to say here that, within morphology, various structural features, such as the layering of inflectional morphemes outside derivational morphemes, and the inventory of structural devices used in word-formation (affixation, suppletion, fusion, cliticization, reduplication, compounding) also play no purely representational role. These processes, by which semantically and grammatically functional minimal elements are assembled into word-level units, and which vary from language to language, are not motivated by any structural characteristic which can plausibly be attributed to nonlinguistic (or prelinguistic) representations.


4 Language structure 2: Syntax

Syntax remains the central focal area of the structure of language. And it is in syntax that the most emphatic claims have been made for deriving modern linguistic structure from the pre-existing structure of mental representations.

`Events, Agents, Themes and Goals ... already formed part of the primate inventory of ``Things that there are in the world'' '. (Bickerton, 1998:351)

A creature's knowledge of events, agents, themes and goals belongs to what Bickerton, and linguists more generally, call a `theta-analysis component'. Descriptive linguists conceive of this component as an integral part of the system mapping meanings to their external expression in modern human languages. For Bickerton, this component also pre-existed the emergence of language, and the move from protolanguage to syntactic language came about as a result of new cerebral connections being established between this theta-analysis component and the mental apparatus representing the phonetic structure of words (which also already existed, as part of protolanguage).

`The creation of such connections would have enabled information to pass through the theta-analysis component before it reached phonetic representation. Information passing through this area would have been automatically sorted into units consisting of an action and its participants obligatorily represented -- exactly those clausal units that constitute the basic units of syntax.' (ibid:352)

` ... the linkage of theta-analysis with other elements involved in protolanguage would not merely have put in place the basic structure of syntax, but would also have led directly to a cascade of consequences that would, in one rapid and continuous sequence, have transformed protolanguage into language substantially as we know it today.' (ibid:353)

Two claims are made in these quotations. Firstly it is claimed that the pre-existing theta-analysis representations were clear enough to give rise unambiguously to clausal structures. The second claim is the sweeping one that once clausal structure had emerged, all the rest of what we now see as syntactic structure followed automatically. The second claim has been weakened in a recent publication:

`in the millennia that followed the birth of syntax [i.e. the appearance of compulsory argument structure], our ancestors must have been competing with one another to produce devices that would make that syntax more readily parsable, hence easier to understand automatically' (Calvin & Bickerton 2000:146)

Although the more recent account still leans heavily on non-linguistic predicate-argument structures to yield basic syntax, it does invoke some extra factors, such as a hierarchy of roles, a `procedure for joining [hence linearizing] meaningful units' and `a process of binary attachment' (Calvin & Bickerton 2000:215, 218-19). There is no space here for a full discussion of these hypothesized additional factors, but note that it remains to be explained how they originated and how much of modern syntactic structure they account for. I will argue that predicate-argument structure alone is not enough to guarantee the evolution of all the syntactic complexity of modern developed languages. In order to do that, we need to look at the relationship between internal representation and linguistic expression.


4.1 Nonlinguistic representation and lexical subcategorization

Let us grant, for the sake of this argument, that when a nonlinguistic creature witnesses an event, it can form a rich mental representation of this event or state of affairs in terms of categories such as Agent, Action, and Patient, roughly `who did what to whom'. In particular, let us grant that the action or state involved is clearly categorized as some kind of mental predicate, though not, at this prelinguistic stage, a predicate associated with any external word or sign. Given this much, it is for Bickerton, as seen above, a simple step to basic clause structure, apparently nothing more than clothing the internal predicate and its arguments, `units consisting of an action and its participants obligatorily represented', in phonetic form.

Syntactic theory takes the information about the obligatory arguments of a predicate as the basis for wellformedness at all levels of linguistic structure6. But the step from internal prelinguistic representation to lexical entry is not straightforward. Conceptual representation does not fully determine lexical subcategorization. I will argue this with a number of examples.

One of the most frequently cited facts about the subcategorization of verbs in English is that the verb put obligatorily takes a direct object and a locative phrase, as shown by these examples:

Morgan puts the ball on the ground

*Morgan puts the ball

In many contexts, the verb place is a synonym of put, so that Morgan places the ball on the ground is used to describe exactly the same event as the first sentence above. But it is common to hear, in rugby commentaries, sentences such as Morgan places the ball. In such cases, it is always understood that the player places the ball on the ground, but the locative phrase can be omitted. Here we have the same event, which must be represented by the same internal mental predicate, getting externalized in two different verbs, which have different obligatory arguments.

As another example, take the case of the verbs rob and steal. A sentence with rob always entails some corresponding sentence with steal, and vice versa. In this sense, they are synonyms. A robbing event is always a stealing event, and vice versa. If a prelinguistic creature mentally represents the event of X stealing food from Y, it also necessarily and simultaneously represents the event of X robbing Y of food. Yet rob and steal have different lexical subcategorizations. Rob takes an obligatory argument referring to the person from whom something is illegally taken; an argument referring to this person is not obligatory with steal. Conceptual representation does not fully determine lexical subcategorization.

Another example is the pair win and beat. I once saw two young children racing each other on a beach. The one who reached the goal first shouted triumphantly *I beat!, where he should have said I won! or I beat you!. The child had a mental representation of a winning/beating event, involving a competition with another person. But he chose the wrong verb, or omitted the obligatory argument of the verb which he did choose. The fact that one of these verbs requires an obligatory object, where the other does not, is an arbitrary fact about English verbs, and cannot be predicted from the nature of the event itself. If a prelinguistic creature mentally represents an event of X winning in a contest with Y, then it necessarily and simultaneously represents an event of X beating Y in a contest. They are one and the same event. But the different verbs which can be used to describe this event differ in the requirements they impose on syntactic structure.
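The point that subcategorization is stored with the verb, not with the event, can be sketched as a small lexicon in Python. Only the requirements stated in the examples above are encoded; the frame labels (`object', `locative', `victim') and the function names are invented for this sketch, and any further requirements these verbs may impose are deliberately left out.

```python
# Obligatory complements (beyond the subject) for the verbs discussed above.
# Pairs such as put/place or win/beat describe the same event but carry
# different requirements.
LEXICON = {
    "put": {"object", "locative"},   # *Morgan puts the ball
    "place": {"object"},             # Morgan places the ball
    "rob": {"victim"},               # the person robbed must be expressed
    "steal": set(),                  # the victim is not obligatory
    "win": set(),                    # I won!
    "beat": {"object"},              # *I beat!  vs.  I beat you!
}


def missing_arguments(verb, supplied):
    """Return the obligatory complements of `verb` absent from `supplied`."""
    return LEXICON[verb] - set(supplied)


def is_wellformed(verb, supplied):
    """A clause is wellformed here if no obligatory complement is missing."""
    return not missing_arguments(verb, supplied)
```

For example, `is_wellformed("place", ["object"])` holds while `is_wellformed("put", ["object"])` does not, although both calls describe the same event of Morgan setting the ball on the ground. The difference lies entirely in the lexical entries, not in any representation of the event.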


4.2 Hierarchical (tree) structure

Syntactic theories vary widely in flavour, but all agree that sentences have hierarchical `tree' structures, of which that diagrammed in Figure 7 is a commonplace, if somewhat old-fashioned, example (from Culicover, 1976:110).

Figure 7.

Such structures are justified by considerations such as the simplicity of the rule-sets generating them and all other well-formed sentences of the language. Note first the presence of syntactic node labels, such as NP, PN, AUX. These labels, although they can be systematically related, in a many-to-many mapping, to semantic or conceptual categories, are clearly not themselves semantic in nature. A completely nonlinguistic creature envisaging that John will visit Mary (say for the purpose of anticipating John's next move in the competition for Mary as a sexual partner) would have no use for any of these labels.

Note also the English organization, reflected in this tree structure, of tense. Clearly, the modal verb will signals future time, but it is the empty element `Pres' which receives the label TENSE. If the modal verb were absent, as in the sentence John visits Mary, the TENSE node would dominate the -s suffix, which does indicate present time (amongst other things). The simple conclusion is that the signalling in English of past and present time by verbal suffixes, and of future time by a modal verb, shows a mismatch between morphosyntactic structure and any a priori plausible representation of the timing of events. Such quirky mismatches between morphosyntactic structure and conceptual, or logical, structure, are common across languages.

Note next the segmentation of the sentence into higher-level constituents, in particular the `VP' constituent. Most syntacticians are convinced that the structure of clauses in most languages includes a VP constituent, which contains any grammatical objects of the clause, but does not include its subject. The main motivation for this is that it helps to explain how specific verbs have considerable influence on the form of their objects, but little or no influence on the form of their subjects. For instance, all English verbs take a subject: thus no specific rules are required stating which verbs take subjects and which do not. But verbs do fall into different categories with respect to the number and type of objects that they take: for example, English sleep has to be specified as taking no object (i.e. as intransitive), hit is specified as taking one object (i.e. as monotransitive), and give is specified as taking two objects (i.e. as ditransitive). Furthermore, verbs which take whole clauses as their objects differ in the form they allow such clauses to take: compare I want Mary to come with the illformed *I want that Mary will come, as opposed to *I hope Mary to come versus I hope that Mary will come. It seems unlikely that the structure of prelinguistic thought included a VP-like `constituent' which bracketed a 2-place predicate with just one of its arguments, but not the other. Analyses in standard predicate logic of 2-place relations treat both arguments of a predicate as elements at the same level, as in VISIT(john,mary). It seems likely that the asymmetric treatment of arguments in the typical morphosyntax of languages does not reflect any aspect of the prelinguistic structuring of thought, but stems from some consideration arising in the expression of thought in linear form7.
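The contrast between the flat logical formula and a tree containing a VP can be sketched as follows. The nested-tuple encoding and the function name depth_of are assumptions made for this illustration, and the tree is a simplified one that leaves out auxiliary and tense nodes.

```python
# Flat predicate-logic style: VISIT(john, mary). Both arguments are sisters.
flat = ("VISIT", "john", "mary")

# Simplified syntactic tree: the object sits inside VP, the subject outside it.
tree = ("S",
        ("NP", "john"),
        ("VP", ("V", "visit"), ("NP", "mary")))


def depth_of(node, leaf, depth=0):
    """Return the depth of `leaf` below the root, or None if it is absent.

    A node is either a string (a leaf) or a tuple (label, child, child, ...).
    """
    if isinstance(node, str):
        return depth if node == leaf else None
    _label, *children = node
    for child in children:
        found = depth_of(child, leaf, depth + 1)
        if found is not None:
            return found
    return None


for name, structure in (("flat", flat), ("tree", tree)):
    print(name, depth_of(structure, "john"), depth_of(structure, "mary"))
```

In the flat encoding the two arguments sit at the same depth; in the tree the object is embedded one level deeper than the subject, which is the asymmetry at issue.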

In several respects some pre-existing hierarchical structure of thought is directly reflected in the tree structures assigned to sentences by linguists. In many cases, the trees drawn by linguists over sentences are motivated by semantic considerations, unlike the case of the VP constituent. There are many familiar instances of structural ambiguity, such as those involving such things as attachment of modifiers (as in a list of teachers broken down by age and sex, or old men and women), and combinations of different conjunctions (e.g. John or Mary and Bill). In such cases, syntacticians draw alternative semantically motivated tree structures over expressions, reflecting their different readings.
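The two readings of John or Mary and Bill can be made concrete with a minimal sketch. It assumes, purely for illustration, that or and and are read truth-functionally and that each name stands for a proposition such as `John will come'; the function names are hypothetical.

```python
from itertools import product


def reading_left(john, mary, bill):
    """Bracketing [[John or Mary] and Bill]."""
    return (john or mary) and bill


def reading_right(john, mary, bill):
    """Bracketing [John or [Mary and Bill]]."""
    return john or (mary and bill)


# Collect the situations in which the two bracketings come apart.
differing = [
    values
    for values in product([True, False], repeat=3)
    if reading_left(*values) != reading_right(*values)
]

for john, mary, bill in differing:
    print(f"john={john}, mary={mary}, bill={bill}")
```

The same string of words is thus paired with two distinct groupings, and the choice between them is motivated by meaning alone.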

Such facts cannot be taken to show that pre-existing hierarchically organized conceptual structure gave rise to hierarchical syntactic structure. To argue that, one would have first to establish the logical independence of the two alleged sorts of hierarchical structure (semantic and syntactic) and then point to the parallelism between them. Genuinely syntactic hierarchical structure is discerned on the basis of the constituents that are posited in grammars designed to account for the distribution of words and morphemes in wellformed strings. But in fact, the tree structures drawn over sentences in such cases of ambiguity are in parts motivated only semantically. The tree structures are actually hybrid representations, showing both syntactic information (such as the VP constituent and the syntactic category labels like NP, etc.) and semantic information about the conceptual groupings of the elements of the expressed thought.

Non-semantic evidence for particular hierarchical structures could involve phonological phrasing or distributional facts about particular substrings, as in the case of the arguments for a VP constituent. Very often, phonological phrasing indicates a hierarchical organization of sentences which is at odds with their semantics. For example, if one says a moderately long sentence like This is the cat that caught the mouse that ate the cheese that lay in the house that Jack built, one will typically insert pauses or intonation boundaries after cat, mouse and cheese. But this phrasing cuts across the appropriate semantic analysis of the sentence, in which, for example the cheese that lay in the house that Jack built and the mouse that ate the cheese that lay in the house that Jack built are whole referring expressions, one identifying a particular piece of cheese, and the other identifying a particular mouse.

The aspect in which independently motivated hierarchical syntactic structure seems most closely to align with plausible pre-existing hierarchical semantic structure is in the nesting of subordinate clauses within main clauses (and within `higher' subordinate clauses). Syntax is essentially clause-based. If one were given the task of analyzing a large corpus of sentences, without information about their meanings, it is likely that the analysis would recognize units at the level of the clause, purely on distributional grounds, with hierarchical nesting of subordinate clauses.


4.3 Functional syntactic categories

The specific grammatical markers which identify clause boundaries, and affect the internal form of clauses, are functional elements such as the complementizer that, as in I know that you're here, or the infinitive marker to, as in I want to go. These provide distributional evidence, independent of semantic considerations, for hierarchical phrase structure which often happens to mirror semantically motivated hierarchical structure. However, these indicators of syntactic structure (e.g. that and to) are not themselves plausible elements of pre-existing mental representations. Rather it seems most likely that they exist just because of the dimension-squashing problem thrown up by the expression of thought. An example in an everyday practical context is:

`If we are told that a pipe has burst in the loft and we start talking about how to deal with the burst pipe, it does not seem likely that our reasoning machinery will contain little bits representing ``a'' and ``the''.' (Sampson, 1997:100)

The prevalence of grammatically functional elements (as opposed to content words) is a hallmark of fully grammatical human language. In most languages, no sentence is grammatically complete without at least one grammatical element, signalling the structure of the sentence. It is hard to make a completely rigid distinction between grammatically functional elements and `pure content' words, because many words (e.g. conjunctions, pronouns and prepositions) combine grammatical function with conveying some content. But at a rough estimate, functional elements in English account for about 40% of a typical text. The following is a list of the most frequent word forms from the 100-million-word British National Corpus, accounting for over 40% of the word forms in modern English text.

the, is/was/be/are/'s/were/been/being/'re/'m/am, of, and, a/an, in/inside (preposition), to (infinitive verb marker), have/has/'ve/'s/had/having/'d, he/him/his, it/its, I/me/my, to (preposition), they/them/their, not/n't/no (interjection), for, you/your, she/her, with, on, that (conjunction), this/these, that (demonstrative)/those, do/did/does/done/doing, we/us/our, by, at, but (conjunction), 's (possessive), from, as, which, or, will/'ll, said/say/says/saying, would, what, there (existential), if, can, all, who/whose, so (adverb/conjunction), go/went/gone/goes, more, other/another, one (numeral).

For most of these word forms, it seems unlikely that prelinguistic mental representations would contain anything corresponding closely to them. Yet these word forms are the major workhorses of English grammar. Similar observations apply to any language.
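A rough estimate of the share of functional elements in a text could be computed along the following lines. This is only a sketch under stated assumptions: the set below is a partial selection from the list above, whitespace tokenization ignores the distinctions of word class drawn in that list (for instance between the two uses of to), and the sample is the single nursery-rhyme sentence quoted in section 4.2, so the resulting figure says nothing about English text in general.

```python
# Partial, hypothetical selection from the frequency list above.
FUNCTION_WORDS = {
    "the", "is", "was", "be", "are", "of", "and", "a", "an", "in", "to",
    "have", "has", "had", "he", "him", "his", "it", "its", "i", "me", "my",
    "they", "them", "their", "not", "for", "you", "your", "she", "her",
    "with", "on", "that", "this", "these", "those",
}


def function_word_share(text):
    """Proportion of whitespace-separated tokens found in FUNCTION_WORDS."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in FUNCTION_WORDS)
    return hits / len(tokens)


sample = ("This is the cat that caught the mouse that ate the cheese "
          "that lay in the house that Jack built")
print(function_word_share(sample))
```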


4.4 Movement, Binding, and Agreement

4.4.1 Movement

Many models of grammar postulate `movement' of elements of a grammatical structure from one location in the structure to another. The term `movement' is a metaphor. No physical movement happens, but the metaphor seems useful in describing, for example, how to form question sentences in English from the corresponding statement sentences or echo-questions.

Figure 8.


Stylistic movement also occurs for purposes of topicalization, as in the following sentence where the noun phrases the dog and the cat are moved from their usual postverbal positions to the fronts of their respective clauses:

The DOG we shut in the kitchen, but the CAT we left outside

English makes extensive use of movement rules, but many languages use movement even more productively, effectively achieving by movement many of the effects which English achieves by variations in intonation.

Such phenomena are clearly ways of signalling pragmatic information, such as whether a sentence is intended to elicit information from a hearer, or whether the hearer is assumed to focus attention on one particular referent. With nonlinguistic representations, there is no question of a hearer. Such mental representations are private. Even if it could be conceived that a creature could privately `pose itself a question' or focus its attention on a particular aspect of some mental representation, it is clear that the structural device described by linguists as `movement' could play no part. Movement is not inherent in pre-linguistic representations. It is part of the apparatus for mapping pre-linguistic representations onto strings.

It follows that universal constraints on movement phenomena in languages, such as the Subjacency Principle, which have attracted much theoretical attention in syntax, are also part of the apparatus for mapping pre-linguistic representations onto strings.


4.4.2 Binding

Pronouns are `bound' to their antecedents. This means that they are interpreted as having the same referent. Reflexive pronouns provide an example:

John grooms himself

Here, the pronoun himself obligatorily refers to the same entity in the world as John. The rules for the binding of reflexive pronouns to their antecedents are known to be dependent on details of clause structure; roughly, a reflexive pronoun is bound to a preceding noun phrase only if it occurs in the same simple clause. The antecedence relation cannot violate clause boundaries, as evidenced by the ungrammaticality of *John hopes that himself will win.
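The rough clause-mate condition just stated can be sketched as a simple check. The representation is an assumption made for illustration: a sentence is given as a flat sequence of simple clauses, each a list of (word, tag) pairs, with the hypothetical tags NP and REFL. The sketch ignores agreement in person, number and gender, and all the finer structural conditions discussed in the syntactic literature.

```python
def reflexives_are_bound(clauses):
    """Rough check: each reflexive must follow a noun phrase in its own clause."""
    for clause in clauses:
        seen_noun_phrase = False
        for _word, tag in clause:
            if tag == "NP":
                seen_noun_phrase = True
            elif tag == "REFL" and not seen_noun_phrase:
                return False
    return True


# John grooms himself
good = [[("John", "NP"), ("grooms", "V"), ("himself", "REFL")]]

# *John hopes that himself will win
bad = [
    [("John", "NP"), ("hopes", "V")],
    [("that", "C"), ("himself", "REFL"), ("will", "AUX"), ("win", "V")],
]

print(reflexives_are_bound(good))
print(reflexives_are_bound(bad))
```

The check has to consult clause boundaries, which are a property of the string-based expression, not of the event being described.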

A nonlinguistic creature's mental representation of John grooming himself involves only one entity, John. The (English) expression of this event as a string requires repeated mention of John, using himself for the second mention, with a consequent requirement to signal the binding of the second mention to the first.

Non-reflexive pronouns are not bound by such tight rules as reflexives, and this gives rise to the ambiguity characteristic of public language. Usually there is ambiguity over whether such a pronoun refers back to an entity recently mentioned in the discourse or to some other entity known by the hearer to be in the contextual frame. For instance, in John saw Mary look at him the pronoun him might or might not be bound by John. That is, him could refer to John or to someone else. But a nonlinguistic creature's mental representation of either of the possible situations described by this sentence would presumably not be ambiguous, or it would not be a representation of the event.

Pronouns, and the quite elaborate grammatical apparatus determining how they can be bound, are not elements of pre-linguistic mental representations. They are part of the apparatus for mapping nonlinguistic representations onto strings.


4.4.3 Agreement (concord)

Agreement (or concord) phenomena are very common in languages. An English example is:

Figure 9.

In many languages, the agreement rules are very elaborate, involving number, gender and case. Agreement is a purely morphosyntactic phenomenon, and serves the purpose of marking those constituents which are bound together in close grammatical relationships. Such close grammatical relationships often reflect closeness in the conceptual representation, but clearly in the mental representation itself such closeness is inherent and does not stand in need of marking. Agreement is part of the apparatus for mapping pre-linguistic representations onto strings.


5 In conclusion

The fundamental universal structural characteristic known as `duality of patterning', whereby languages are organized at two levels of structure, namely phonology and morphosyntax, has no motivation in a purely representational system, but plausible arguments can be advanced for its communicative adaptiveness. Obviously, all of phonological structure belongs in the communicative aspect of linguistic structure. On the morphosyntactic side of the duality of patterning, the universal distinction between morphology and syntax (however that is drawn) plays no role in non-communicative representation.

Within syntax, some of the main devices that play no role in prelinguistic representation have been surveyed above. Many of the complex structural phenomena that have attracted study, such as case-marking, anaphor-antecedent relationships, switch-reference devices, control by verbal predicates of the interpretation of their complement clauses, transformations of various sorts (e.g. passivization, topicalization, question formation) and the constraints on such processes, play no role in non-communicative representation. Linear ordering of elements, with which much of syntax is concerned, likewise plays no non-communicative role. Also fundamental to syntactic structure are lexical classes somewhat autonomous of semantics, such as Noun, Verb, Adjective and Preposition8; to the extent that such classes are autonomous, they play no role in semantic representation. Other commonly found grammatico-lexical categories, such as grammatical gender (Noun classes), would seem to serve no representational purpose, although they may contribute to the redundancy of utterances, thereby serving a communicative purpose. Grammatical agreement (concord), which is widespread, also clearly plays no purely representational role.

Some aspects of linguistic structure may indeed plausibly be derived from nonlinguistic, representational, structure. These include some (but not all) aspects of hierarchical organization in syntax. But the broad conclusion from the above survey of non-representational aspects of linguistic structure is that attempts to derive linguistic structure, in an evolutionary account, from previously existing cognitive representational structure must fail, for a large slice of linguistic structure. The claim that there was a `cascade of consequences that would, in one rapid and continuous sequence, have transformed protolanguage into language substantially as we know it today' (Bickerton, 1998:353) is very vague. The idea that linguistic structure derives from pre-existing mental representations is no doubt true for some very basic aspects of linguistic structure. But we should not close the investigation prematurely on the sources of linguistic structure, in all its multifarious richness. It is not at all clear what exact `cascade of consequences' could have led to all the aspects of linguistic structure which I have highlighted in this chapter. Of the few linguists who ponder evolutionary questions, many take the position, not that grammatical structure reflects prior mental representation per se, but that it results in part from the need to interface representational structure with a phonetic output in order to make communication possible. This is the position taken, sketchily, by Newmeyer (1991:6-8), for example. Where explanation by derivation from pre-existing mental structure fails, it may be feasible to seek evolutionary explanations (broadly conceived) for much (though not all) of the typical structure of languages in the demands of communication in the human environment.


6 Key Further Readings


Overall introductions to language structure:

On `Mentalese':


7 References

Bickerton, Derek, 1990, Language and Species, Chicago: University of Chicago Press.

Bickerton, Derek, 1998, `Catastrophic evolution: the case for a single step from protolanguage to full human language'. In Hurford, James R., Michael Studdert-Kennedy and Chris Knight (eds) Approaches to the Evolution of Language: Social and Cognitive Bases, Cambridge: Cambridge University Press. 341-358.

Calvin, William H., and Derek Bickerton, 2000, Lingua ex machina: reconciling Darwin and Chomsky with the human brain, Cambridge, MA: MIT Press.

Chomsky, Noam, 1980, Rules and Representations, Oxford: Basil Blackwell.

Chomsky, Noam, 1981, Lectures on Government and Binding, Dordrecht, The Netherlands: Foris Publications.

Culicover, Peter, 1976, Syntax, New York: Academic Press.

Dowty, David, 1991, `Thematic proto-roles and argument selection', Language, 67,3:547-619.

Fodor, Jerry A., 1975, The Language of Thought, New York: Crowell.

Horst, Steven W., 1996, Symbols, Computation and Intentionality: A critique of the computational theory of mind, Berkeley: University of California Press.

Judge, Brenda, 1985, Thinking about things: A philosophical study of representation, Edinburgh: Scottish Academic Press.

Katamba, Francis, 1989, An Introduction to Phonology, London: Longman.

Newmeyer, Frederick, 1991, `Functional explanation in linguistics and the origins of language', Language and Communication, 11:3-28.

Pinker, Steven, and Paul Bloom, 1990, `Natural Language and Natural Selection', Behavioral and Brain Sciences, 13,4:707-727.

Sampson, Geoffrey, 1997, Educating Eve: the `Language Instinct' Debate, London: Cassell.

Schiffer, Stephen, 1989, Remnants of Meaning, Cambridge, MA: MIT Press.

Stich, Stephen, 1983, From Folk Psychology to Cognitive Science, Cambridge, MA: MIT Press.

Ujhelyi, Maria, 1998, `Long-call structure in apes as a possible precursor for language'. In Hurford, James R., Michael Studdert-Kennedy and Chris Knight (eds) Approaches to the Evolution of Language: Social and Cognitive Bases, Cambridge: Cambridge University Press. 177-189.

Yule, George, 1985, The Study of Language, Cambridge: Cambridge University Press.



  1. This is not to deny the possible contribution of some limited prelinguistic ability to string units together into longer calls, as documented, for instance, by Ujhelyi (1998).

  2. But this is not crucial, as for example the last living speaker of a dying language can still be said to possess a communication system.

  3. A somewhat technical, though important, point is that Dowty appears to claim that there are two clusters (his proto-Agent and proto-Patient) in a six-dimensional space. Accepting the six dimensions as approximately correct, it remains an empirical question whether the participants in perceived worldly events do tend to fall into two clusters, or whether they are in fact more evenly distributed all over the six-dimensional space. The appearance of just two clusters may be no more than a projection back into the analysis of the mental representation system of a syntactic distinction between subjects and objects; if this is so, `proto-subject' and `proto-object' would be better labels than Dowty's. In any case, the central point made here, about the squashing of information on six dimensions into a linear string, remains.

  4. E.g. Bickerton (1990, 1998).

  5. The discussion here relates to transitive sentences, in which both a subject and an object are present. The fact that in ergative languages the subject of an intransitive clause is assigned the same case as the object of a transitive clause does not affect this argument. What this fact shows is that languages may solve some aspects of the mapping problem differently in intransitive clauses, i.e. where only one participant is involved, who may have various of the Agent or Patient properties.

  6. This is stated in the Projection Principle.
    `Representations at each syntactic level (i.e. LF, and D- and S-structure) are projected from the lexicon, in that they observe the subcategorization properties of lexical items.' Chomsky (1981:29)

  7. Perhaps, as is sometimes suggested, grammatical subjects are fossilized topic constituents, placed at the front of the sentence for communicative salience.

  8. As is well known, some nouns denote actions, rather than objects (e.g. action, assassination). The same basic concept can sometimes be expressed by both a verb and an adjective (e.g. fear/afraid, like/fond). In many languages there are no adjectives, and concepts expressed in English by adjectives are expressed by verbs.