Random Boolean Nets and Features of Language


James R Hurford,
Language Evolution and Computation Research Unit,
Linguistics Department, University of Edinburgh


(A version of this paper appeared in the Special Issue of IEEE Transactions on Evolutionary Computation on Evolutionary Computation and Cognitive Science, pp. 111-116.
Note: This HTML version may differ slightly from the printed version; the printed version is the `authorized' version.)

Abstract

This paper describes an attempt to cast several essential, quite abstract, properties of natural languages within the framework of Kauffman's random Boolean nets. These properties are: complexity, interconnectedness, stability, diversity, and underdeterminedness. Specifically, in the research reported here, a language is modelled as an attractor of a Boolean net. (Groups of) nodes in the net might be thought of as linguistic principles or parameters as posited by Chomskyan theory of the 1980s. According to this theory, the task of the language learner is to set parameters to appropriate values on the basis of very limited experience of the language in use. The setting of one parameter can have a complex effect on the settings of other parameters. A random Boolean net is generated and run to find an attractor. A state from this attractor is degraded, to represent the degenerate input of language to the language learner, and this degraded state is then input to a net with the same connectivity and activation functions as the original net, to see whether it converges on the same attractor as the original. In practice, many nets fail to converge on the original attractor, and degenerate into attractors representing complete uncertainty. Other nets settle at intermediate levels of uncertainty. And some nets manage to overcome the incompleteness of input and converge on attractors identical to that from which the original inputs were (de)generated. Finally, an attempt was made to select a population of such successful nets, using a genetic algorithm, where fitness was correlated with an ability to acquire several different languages faithfully. It has so far proved impossible to breed such successful nets, lending some plausibility to the Chomskyan suggestion that the human language acquisition capacity is not the outcome of natural selection.
This paper puts together two sets of ideas that have not until now kept close company. One set of ideas is the Chomskyan metatheory of Universal Grammar (UG) and language acquisition, as developed in numerous publications over the past 40 years, from Chomsky (1955) through Chomsky (1981) to Chomsky (1986). The other set of ideas is the theory of complexity, and specifically the theory of random Boolean nets, as developed by Kauffman (1993, 1995).

The advantage of putting these two sets of ideas together is that it relates the Chomskyan picture to a model whose properties are somewhat well understood, and which, moreover, is simple enough and well specified enough to lend itself to computational implementation. No empirical claims are made, but the hope is that readers who have previously understood one, but not the other, of the two sets of ideas juxtaposed here will be able to see the formerly unfamiliar framework (whichever one it is) in a new and illuminating light.

In the first two sections, the two sets of ideas are sketched separately; the third section gives an interpretation of random Boolean nets in terms of knowledge, acquisition, and transmission of language.

What aspects of language to model?

Consider the following six striking (yet uncontroversial) features of natural languages: complexity, interconnectedness, fidelity of transmission, stability, diversity, and learnability from incomplete data.

We will briefly discuss each of these features in turn.

Complexity. Each human language is an extremely complex system. No complete grammar of any language has ever been written. One of the largest grammars of English, that of Quirk et al. (1972), runs to 1102 pages and is still incomplete in detail.

Linguists are still constantly finding subtle and complex patterns of behaviour/judgement, even in English, the most-studied language, that stubbornly resist encapsulation in any theoretical framework. This is the stuff of syntacticians' journal articles.

Interconnectedness. The idea that ``Une langue est un système où tout se tient'' [``A language is a system in which everything holds onto everything else''] is such a hoary truism that its origins are lost in the mists of linguistic historiography.[1]

The interconnectedness of facts in a language shows itself in many ways. From opposite ends of the theoretical linguistic spectrum, both Greenbergian conditional universals and Chomskyan parameters exemplify this interconnectedness.

For example, one well-known Greenbergian conditional universal states that languages with dominant verb-subject-object word order are always prepositional rather than postpositional. Note the Boolean (if-then) nature of the more complex examples of this kind.

Fidelity of transmission. The English of 100 years ago is still intelligible now. In a community not subject to social upheaval, the differences between the language of one generation and the next are minimal.

Stability. An individual's language behaviour and linguistic judgements vary only slightly over time (again, in a community not subject to social upheaval).

Diversity. There are about 6000 different languages in the world, mutually unintelligible. Putting aside vocabulary differences, probably no two languages have exactly the same grammatical system. Chomsky suggested that the number of grammatically distinct possible languages is finite. ``When the parameters of UG are fixed, a core grammar is determined, one among finitely many possibilities, lexicon apart.'' (Chomsky, 1981:137)

Learnability from incomplete data. A newborn child can learn any language perfectly. The well-known `Poverty of Stimulus' argument states that the knowledge of language (in the form of solid intuitions of wellformedness) possessed by adults is underdetermined by the examples to which they were exposed as children. The question provoked by this has been referred to as `Plato's Problem', and expressed as ``how can we know so much on the basis of so little experience?''

The data to which the language-acquiring child is exposed are susceptible to infinitely many generalizations, most of which our linguistic intuitions immediately dismiss as far-fetched. Indeed it is precisely the solidity of many of these intuitions that seems to prevent some students from seeing the point of the poverty of stimulus argument and its implication for the innateness of certain general facts of grammar. A typical example involves the formation of interrogative sentences in English, in which the first auxiliary verb of the main clause is `moved' around to the front of the subject of the sentence, as in:

The fact that it is raining should deter us.
Should the fact that it is raining deter us?

But not

*Is the fact that it raining should deter us

In the ungrammatical example, the first auxiliary verb in the string, is, has been moved incorrectly to the front of the sentence. We know that this is wrong; but how do we know? It cannot be just from example, because most of the examples that we hear are equally compatible with the hypothesis that interrogatives are formed by moving the first auxiliary verb in the string.

Summaries of the Poverty of Stimulus argument can be found in many introductory texts on syntactic theory (e.g. Cook and Newson, 1996:81-85); there are many more advanced discussions of it in the literature (e.g. Chomsky, 1980; Crain, 1991; Garfield, 1994; Wexler, 1991). For counterargument, see Sampson (1997:38-45).

Random Boolean Networks

A Boolean network can be described in terms of its nodes, connections, activation functions, and states. We will introduce these briefly in turn.

Nodes: The number of nodes in a net is expressed conventionally by the variable N. (N = 10, ... , 1000, ... 1000000, ... ) Nodes are set to bit values --- {0,1}. Each node is assigned a (random) Boolean activation function (see below).

Connections (unidirectional) between nodes: Each node takes input from some specified number of other nodes. The number of connections leading into a node is conventionally expressed by the variable K. In a net, all nodes may take the same number of inputs (K = 2, 3, 4, ...), or, a possibility less often explored, varying numbers of inputs.

Activation functions: Nodes are activated by Boolean functions of the values of the nodes inputting to them. For a node with K inputs, there are 2^(2^K) possible Boolean activation functions. For example with 1 input, possible functions are `FALSE' (00), `COPY' (01), `NEGATE' (10), or `TRUE' (11), as shown in the table below:
Input   `FALSE'   `COPY'   `NEGATE'   `TRUE'
  0        0         0         1         1
  1        0         1         0         1

States: The state of a network at a given time is the set of its node-settings. There are 2^N possible states of a net.
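
These definitions can be made concrete in a short sketch. Python is used here purely for illustration; representing each activation function as a truth table is an assumption of this sketch, not a claim about the original implementation.

import random

N, K = 20, 2   # net size and in-degree, as in the examples below

# Connections: each node takes input from K nodes, chosen at random.
inputs = [random.sample(range(N), K) for _ in range(N)]

# Activation functions: one of the 2^(2^K) Boolean functions of K inputs,
# stored as a truth table of 2^K output bits, indexed by the input
# combination read as a binary number.
functions = [tuple(random.randint(0, 1) for _ in range(2 ** K))
             for _ in range(N)]

# A state of the net is a vector of N bits; there are 2^N possible states.
state = [random.randint(0, 1) for _ in range(N)]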

Boolean networks are dynamic. They are set in motion by the following steps: the net is initialized in some state, for instance with each node set at random to 0 or 1; then, at each discrete time step, every node is updated synchronously, taking as its new value its activation function applied to the current values of its K input nodes; this updating is repeated indefinitely.

Boolean networks have attractors. The set of states around which a net cycles repeatedly is an attractor. A given net may have many different attractors, depending on its initial state. An attractor is also called a ``limit cycle''.

Boolean networks have basins of attraction. The set of states from which a net will always end up in a particular attractor is that attractor's basin of attraction. An attractor is a subset of its basin of attraction (typically a proper subset).

Here are some simple examples, where N = 20, and K = 2, with connections and activation functions chosen randomly:

STEP
1 [0 1 0 0 0 1 0 0 0 1 0 0 1 0 1 1 0 1 0 1]
2 [0 1 0 0 0 0 1 0 1 1 0 0 0 1 1 0 0 1 0 1]
3 [0 0 0 0 0 1 1 0 1 1 1 1 1 0 1 0 0 1 0 1]
4 [0 1 1 0 0 1 0 0 1 1 1 1 1 0 1 0 1 1 0 1]
5 [0 1 0 0 0 0 0 0 1 1 1 0 0 0 1 0 1 1 0 1]
6 [0 1 0 0 0 0 0 0 1 1 1 0 0 0 1 0 1 1 0 1]
Here, from the initial random state at step 1, the net has converged on a one-state attractor at step 5: the state at step 5 and all subsequent steps is:
[0 1 0 0 0 0 0 0 1 1 1 0 0 0 1 0 1 1 0 1]

STEP
1 [1 1 0 0 1 1 1 0 1 0 0 0 1 0 1 1 0 0 0 1]
2 [0 0 1 0 0 1 0 1 1 1 1 1 1 0 1 1 0 1 0 0]
3 [1 1 0 0 1 1 1 0 0 0 0 1 1 0 1 0 0 1 1 1]
4 [0 1 1 0 0 1 0 1 0 1 1 1 1 0 1 1 1 1 0 0]
5 [1 1 0 0 1 1 1 0 1 0 0 1 1 0 1 0 0 1 1 0]
6 [0 1 1 0 0 1 0 1 0 1 1 1 1 0 1 1 1 1 0 0]
Here, from the initial random state at step 1, the net has converged on a two-state attractor: the state at step 6 is the same as at step 4. The net will oscillate between the following two states forever.
[0 1 1 0 0 1 0 1 0 1 1 1 1 0 1 1 1 1 0 0]
[1 1 0 0 1 1 1 0 1 0 0 1 1 0 1 0 0 1 1 0]
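
Traces like the two above can be generated by running such a net synchronously and stopping when a state recurs; since the state space is finite and the dynamics deterministic, a repeat is guaranteed, and the segment between the first and second visits to a state is the attractor. The following sketch continues the representation assumed earlier.

def step(state, inputs, functions):
    """Synchronous update: every node applies its activation function to the
    current values of its input nodes."""
    new_state = []
    for ins, table in zip(inputs, functions):
        index = 0
        for node in ins:
            index = (index << 1) | state[node]   # input combination as a binary number
        new_state.append(table[index])
    return new_state

def run_to_attractor(state, inputs, functions, step_fn=None, max_steps=10000):
    """Iterate until a previously visited state recurs; the recurring segment
    (the limit cycle) is returned as a list of states."""
    step_fn = step_fn or step
    seen, trajectory = {}, []
    for t in range(max_steps):
        key = tuple(state)
        if key in seen:
            return trajectory[seen[key]:]
        seen[key] = t
        trajectory.append(state)
        state = step_fn(state, inputs, functions)
    return None   # safety net; a finite deterministic net must eventually cycle

attractor = run_to_attractor(state, inputs, functions)
print("attractor of length", len(attractor))
for s in attractor:
    print(s)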

General Properties of RBNs.

Interpretation of Nets as Language Knowledge and Language Acquisition

Having set out some basic, though striking, properties of natural languages and outlined the workings of random Boolean nets, I will now suggest a way of interpreting such nets in terms of the linguistic properties mentioned in the first section.

Modifying networks to model acquisition from incomplete primary linguistic data. The language learner is not exposed to examples of all the features of a language, but nevertheless acquires them, as mentioned above in connection with the Poverty of Stimulus. We model this as follows: a third node value, `?', represents absence of evidence; nodes about which the learner has received no data are initialized to `?', and when the net is run, a node whose new value cannot be determined, because it depends on the unknown value of a `?' input, is itself left as `?'.

Applying these ideas, we can give some examples of learning from incomplete data. In the examples below, again N = 20 and K = 2, with randomly chosen connectivity and activation functions. The initial random state contains a number of `?'s, indicating that the learner has experienced no data that would lead to settings of those nodes. We give first an example of unsuccessful learning, in which the net reaches an attractor that still has some nodes set to `?'.

(UNSUCCESSFUL LEARNING)

STEP
0 [0 1 1 0 0 0 0 1 0 1 ? ? 1 ? ? 1 ? ? 0 1]
1 [0 1 1 0 ? 0 ? ? ? ? ? ? 1 0 1 0 0 0 0 0]
2 [? 1 1 0 ? 0 ? ? ? 1 ? 1 1 ? 1 ? 0 ? 1 0]
3 [? 1 ? 0 ? 0 ? ? 0 1 ? 1 1 ? 1 ? 0 ? 1 1]
4 [0 ? ? 0 ? 0 ? ? 0 1 ? 1 1 ? 1 ? 0 0 ? 1]
5 [0 ? 1 0 ? 0 ? ? 0 1 ? 1 ? ? 1 ? ? 0 ? ?]
6 [0 1 1 0 ? 0 ? ? ? ? ? 1 1 ? 1 ? ? ? ? ?]
7 [? 1 1 0 ? 0 ? ? 0 ? ? 1 1 ? 1 ? 0 ? ? ?]
8 [0 1 ? 0 ? 0 ? ? 0 1 ? 1 1 ? 1 ? 0 0 ? ?]
9 [0 ? 1 0 ? 0 ? ? 0 1 ? 1 1 ? 1 ? 0 0 ? ?]
10 [0 1 1 0 ? 0 ? ? 0 1 ? 1 1 ? 1 ? ? 0 ? ?]
11 [0 1 1 0 ? 0 ? ? 0 ? ? 1 1 ? 1 ? 0 0 ? ?]
12 [0 1 1 0 ? 0 ? ? 0 1 ? 1 1 ? 1 ? 0 0 ? ?]
13 [0 1 1 0 ? 0 ? ? 0 1 ? 1 1 ? 1 ? 0 0 ? ?]
Here, the acquired one-state attractor (step 12 and subsequent steps), with remaining gaps in knowledge, is:
[0 1 1 0 ? 0 ? ? 0 1 ? 1 1 ? 1 ? 0 0 ? ?]
It is possible, however, for a net starting in a state with some gaps in its knowledge (`?'s), to arrive at an attractor state from which all gaps have been eliminated. Here is an example (again N = 20 and K = 2, with randomly chosen connectivity and activation functions):

(SUCCESSFUL LEARNING)

STEP
0 [0 0 1 1 ? 0 0 ? ? 0 0 0 0 ? 0 1 ? ? 1 ?]
1 [0 0 ? 1 1 0 ? 1 ? 0 0 0 0 1 0 ? 0 0 ? ?]
2 [0 0 1 1 1 0 0 ? 0 ? 0 0 0 1 0 1 0 0 1 ?]
3 [0 0 1 1 1 0 0 1 0 0 0 0 0 1 0 1 ? 0 1 1]
4 [0 0 1 1 1 0 0 1 0 0 0 0 0 1 0 1 0 0 1 1]
5 [0 0 1 1 1 0 0 1 0 0 0 0 0 1 0 1 0 0 1 1]
Here, the acquired one-state attractor is:
[0 0 1 1 1 0 0 1 0 0 0 0 0 1 0 1 0 0 1 1]
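
The behaviour illustrated in these two runs can be reproduced with a small extension of the earlier sketch: a third value `?' is added, and, on the assumption described above, a node is left at `?' whenever the possible completions of its unknown inputs disagree about the output.

UNKNOWN = '?'

def step_with_gaps(state, inputs, functions):
    """Synchronous update over the values {0, 1, ?}. A node's new value is
    determined only if every way of filling in its `?' inputs gives the same
    output; otherwise the node itself becomes `?' (an assumption about how
    gaps propagate, consistent with the traces above)."""
    new_state = []
    for ins, table in zip(inputs, functions):
        values = [state[node] for node in ins]
        outputs = set()

        def complete(i, index):
            # Recursively enumerate all completions of the unknown inputs.
            if i == len(values):
                outputs.add(table[index])
            elif values[i] == UNKNOWN:
                complete(i + 1, (index << 1) | 0)
                complete(i + 1, (index << 1) | 1)
            else:
                complete(i + 1, (index << 1) | values[i])

        complete(0, 0)
        new_state.append(outputs.pop() if len(outputs) == 1 else UNKNOWN)
    return new_state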

Modelling adult-to-child language transmission. With these ideas in place, it is possible to explore the further application of the RBN model to language, in particular to the transmission of languages across generations in a community. This is done in a quite idealized way here, as if the only input to a child learner is from a single adult. At the level of abstraction at which we are working here, this seems unlikely to be a harmful idealization. We go through the following steps: (1) generate a net with random connectivity and activation functions; (2) run it from a random initial state until it reaches an attractor, which represents the adult's language; (3) take a state from this attractor and degrade it, replacing the values of some nodes with `?', to give the child's primary linguistic data (PLD); (4) initialize a `child' net, with the same connectivity and activation functions as the adult's, to this degraded state, and run it to its attractor; (5) compare the child's attractor with the adult's, counting the acquisition as faithful if the two are identical.
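
These steps can be sketched in code, continuing the earlier fragments; the particular choices below (taking the first state of the adult attractor, and the form of the degradation function) are illustrative assumptions, not a description of the exact procedure used.

def degrade(state, keep):
    """Primary linguistic data: keep the values of the nodes in `keep',
    set every other node to `?'."""
    return [v if i in keep else UNKNOWN for i, v in enumerate(state)]

def transmit(inputs, functions, keep):
    """One adult-to-child episode; True if the child converges on the adult's attractor."""
    n = len(inputs)
    adult_start = [random.randint(0, 1) for _ in range(n)]
    adult = run_to_attractor(adult_start, inputs, functions)      # the adult's language
    if adult is None:
        return False
    pld = degrade(adult[0], keep)                                 # degenerate input to the child
    child = run_to_attractor(pld, inputs, functions, step_fn=step_with_gaps)
    return child is not None and set(map(tuple, child)) == set(map(tuple, adult))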

Variable parameters of the model. For the purpose of experimenting with this model, certain parameters can be set to alternative values, in particular the size of the net (N), the number of inputs per node (K), and the way in which the adult attractor state is degraded to form the learner's initial state.

A human-like net? It is possible to hand-tailor a net in such a way that the `adult' net gravitates to a wide variety of different attractors, and the `child' net reliably manages to gravitate to the same attractor as the adult, after initialization with incomplete data from the adult attractor. I give an example below.

A schematic diagram might make this clearer.
Fig. 1. Hand-made net with 15 self-connected nodes and 185 nodes taking two inputs each from nodes chosen randomly from the 15 self-connected nodes.
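
Such a net can be constructed as in the following sketch. Giving the 15 self-connected nodes the COPY function, so that they simply hold their initial values forever, is an assumption about how `self-connected' is realized here.

def hand_made_net(n_core=15, n_total=200, k=2):
    """Net of Fig. 1: n_core self-connected nodes plus (n_total - n_core)
    nodes taking k inputs each from randomly chosen core nodes."""
    inputs, functions = [], []
    for i in range(n_core):
        inputs.append([i])          # the node's only input is itself
        functions.append((0, 1))    # COPY, so the node holds its initial value forever
    for i in range(n_core, n_total):
        inputs.append(random.sample(range(n_core), k))
        functions.append(tuple(random.randint(0, 1) for _ in range(2 ** k)))
    return inputs, functions

# One transmission episode with this net, degrading all but the 15 core nodes:
inputs200, functions200 = hand_made_net()
print(transmit(inputs200, functions200, keep=set(range(15))))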

This net was run 1000 times, from random initial states. In the child's PLD (the initial state), the values of the 15 self-connected nodes were copied from a state in the adult attractor, and all 185 other nodes were set to `?'. The results were as follows:

984 distinct adult attractors
984 distinct PLDs
984 distinct acquired attractors
1000 faithful acquisitions
Results with K = 3, 4, 5, 6 were essentially the same. This models substantial diversity of learnable languages and fidelity of transmission between generations.

But the method of degrading the input data used above is implausibly rigid. Another experiment was carried out, using probabilistic PLD-production. Here, the nodes were deemed to be rank-ordered by frequency; thus node 1 was deemed the most frequent, and node 200 the least frequent in use. Then the probability of a node in the child's initial state being set to 1 or 0 (as opposed to being left as a `?') was an inverse function of its frequency ranking. The probability of a node being set to 1 or 0 is given in the following graph:


Fig.2. Probabilistically distributed primary linguistic data. The probability with which nodes in the learner's trigger input are set to 1 is very high for a few nodes, and declines for the rest.
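
The probabilistic degradation can be sketched as follows. The exact curve of Fig. 2 is not reproduced; the 1/rank form used here is purely an illustrative assumption.

def probabilistic_pld(state):
    """Node i is taken to have frequency rank i + 1; it keeps its value in the
    child's initial state with a probability that declines with rank, and is
    otherwise set to `?'."""
    return [v if random.random() < 1.0 / (i + 1) else UNKNOWN
            for i, v in enumerate(state)]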

With this probabilistic input, the following results were obtained:

200 distinct adult attractors
200 distinct PLDs
200 distinct acquired attractors
182 faithful acquisitions
This achieves slightly better diversity of learnable languages, and slightly worse fidelity of transmission between generations.

Can such ``good'' nets be bred? It has been shown here that a Boolean net can be constructed by hand which, given the interpretation proposed, approximates reasonably well to a human language acquirer, in respect of the range of learnable languages, the fidelity of learning, the stability of the acquired state, and so forth. Chomsky's position on the human language capacity is that it is biologically given, and yet unlikely to have been specifically selected for by natural selection. I have briefly tested these ideas by trying to `breed' a particular net specification (in terms of connectivity and activation functions) with an evolutionary algorithm.

In this algorithm, a net is specified in terms of a list of `genetic' loci, each allocated an allele from an available range. For instance, the alleles at one locus might code for the nodes from which node 17 receives its input; the alleles at another locus would code for the activation function of node 17. Each net in a population of nets is specified by a complete genome. The population size varied from 25 nets to 100, depending on the experiment. In the experiments, a heterogeneous population of nets with initially random connectivity and activation functions was evaluated according to a fitness function that rewarded the ability to acquire several different languages faithfully. Each generation, the more successful nets according to this fitness function were bred, and some mutation of the connectivity and activation functions took place. Selection was by tournament selection among groups of four.

So far, the results have been negative. It has not been possible to `breed' a random Boolean net that performs as well, in terms of diversity of learnable languages and fidelity of intergenerational transmission, as the hand-fixed net described above. This may result from Boolean nets inhabiting a rugged fitness landscape, in which adaptation is unlikely. And it may possibly, upon further investigation, tend to confirm the Chomskyan view that the human capacity for language acquisition is not the result of natural selection. No such conclusion can yet be firmly drawn, however, because this paper has not exhausted all possible ways of identifying a dynamical system with the language organ, or of successfully creating such a system through an evolutionary process.
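
The outline of such an evolutionary algorithm can be sketched, building on the earlier fragments; the genome encoding, mutation operator, and fitness evaluation shown here are illustrative assumptions rather than the exact ones used in the experiments reported.

def random_genome(n=200, k=2):
    """A genome specifies, for each node, its input nodes and its activation function."""
    inputs = [random.sample(range(n), k) for _ in range(n)]
    functions = [tuple(random.randint(0, 1) for _ in range(2 ** k)) for _ in range(n)]
    return inputs, functions

def mutate(genome, rate=0.01, n=200, k=2):
    """Occasionally rewire a node's inputs or replace its activation function."""
    inputs, functions = genome
    inputs = [random.sample(range(n), k) if random.random() < rate else ins
              for ins in inputs]
    functions = [tuple(random.randint(0, 1) for _ in range(2 ** k))
                 if random.random() < rate else f for f in functions]
    return inputs, functions

def fitness(genome, trials=5):
    """Reward the ability to acquire several different languages faithfully."""
    inputs, functions = genome
    keep = set(range(15))   # which nodes the learner receives data for (an assumption)
    return sum(transmit(inputs, functions, keep) for _ in range(trials))

def evolve(pop_size=50, generations=100):
    population = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            contestants = random.sample(population, 4)   # tournament selection among four
            winner = max(contestants, key=fitness)
            offspring.append(mutate(winner))
        population = offspring
    return max(population, key=fitness)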

Endnote

  1. See Koerner's 18-page chapter (Koerner, 1999:183-202), discussing various attributions, including to de Saussure and Meillet, but not managing to track down a convincingly original source of this bon mot.
  2. L.A.D. stands for `Language Acquisition Device', an innate `mental organ' posited by Chomsky to explain the feat of language acquisition.

References

Chomsky, Noam, 1955, The Logical Structure of Linguistic Theory, mimeographed, Harvard; published 1975, Plenum, New York.

Chomsky, Noam, 1980, Rules and Representations, Columbia University Press, New York.

Chomsky, Noam, 1981, Lectures on Government and Binding, Foris Publications, Dordrecht, Holland.

Chomsky, Noam, 1986, Barriers, MIT Press, Cambridge, Mass.

Cook, Vivian, and Mark Newson, 1996, Chomsky's Universal Grammar: An Introduction (second edition), Blackwell Publishers, Oxford.

Crain, Stephen, 1991, ``Language acquisition in the absence of experience'', Behavioral and Brain Sciences, 14.

Garfield, J.L., 1994, ``Innateness'', in S. Guttenplan (ed.) A Companion to the Philosophy of Mind, Blackwell Publishers, Oxford.

Kauffman, Stuart A., 1993, The Origins of Order: Self-Organization and Selection in Evolution, Oxford University Press, Oxford.

Kauffman, Stuart A., 1995, At Home in the Universe: The Search for Laws of Self-Organization and Complexity, Oxford University Press, Oxford.

Koerner, E.F.K., 1999, Linguistic Historiography: Projects and Prospects, Studies in the History of the Language Sciences, Volume 92, John Benjamins, Amsterdam.

Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech, and Jan Svartvik, 1972, A Grammar of Contemporary English, Longman, London.

Sampson, Geoffrey, 1997, Educating Eve: The `Language Instinct' Debate, Cassell, London.

Wexler, Kenneth, 1991, ``On the argument from the poverty of the stimulus'', in A. Kasher (ed.) The Chomskyan Turn, Blackwell Publishers, Oxford.