In Search of Inflection

Joanna Moy and Suresh Manandhar
Department of Computer Science, University of York, UK
joanna@cs.york.ac.uk

Previous work (Moy and Manandhar, 2003) describes an attempt to demonstrate the emergence of case in a population of minimally equipped learning agents, based on Kirby's "Iterated Learning Model" (Kirby, 2000). The emergence of grammars with a primitive form of case was demonstrated: separate noun categories arose to express the subject and object of a sentence. However, the category types are not strongly restricted to a single syntactic role, nor are they inflectional, i.e. the "subject" and "object" forms of a particular noun cannot be broken down into a common stem plus affix. These limitations were deemed to be due to the details of the model employed. The current work is an attempt to address these limitations.

The semantic representation used for utterances made by agents in the original simulation is a vector in which thematic role is implied by position. For example, [loves, john, mary] indicates that the predicate (in the first position) is "loves", the agent (in the second position) is "john", and the patient (in the third position) is "mary". However, the parts of speech produced by making generalizations between utterances are independent of position, and thus independent of thematic role. If a rule is created indicating that the string "j,o,h,n" has the meaning "john", it does not specify whether this string represents the agent or the patient in the utterance in which it was observed. Thus noun categories cannot effectively be restricted to expressing a single thematic role, which might prove a disadvantage in attempts to simulate the emergence of a proper case system. The semantic representation was therefore modified to give it a nested structure, in which each element specifies explicitly both its thematic role and its value, so that the vector [loves,john,mary] becomes [[pred,loves],[agt,john],[pat,mary]].
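As an illustration only, the change from the positional vector to the nested representation can be sketched in Python. The function names (to_nested, roles_of) and the encoding as Python lists are assumptions made for this sketch and are not taken from the original simulation; the role labels pred, agt and pat follow the example above.

```python
# Role labels in positional order, following the example [loves, john, mary].
# The list encoding and the function names below are hypothetical.
ROLES = ["pred", "agt", "pat"]


def to_nested(flat):
    """Pair each element of a flat [predicate, agent, patient] vector
    with an explicit thematic-role label."""
    if len(flat) != len(ROLES):
        raise ValueError("expected exactly one value per thematic role")
    return [[role, value] for role, value in zip(ROLES, flat)]


def roles_of(nested, value):
    """Return every thematic role under which a value occurs.

    With the flat vector this information is carried only by position;
    with the nested form it travels with the value itself."""
    return [role for role, v in nested if v == value]


nested = to_nested(["loves", "john", "mary"])
print(nested)
print(roles_of(nested, "john"))
```

Because each value carries its own role label, a rule learnt for "john" in this form records whether it was observed as agent or patient, which is the property the positional vector lacks.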
The system was modified to handle these nested structures, and to be able to make generalizations between parts of speech as well as between sentences. Thus, once a substring meaning [agt,john] and another meaning [agt,pete] have been induced, any similarity between the two can be attributed to the morpheme specifying that the noun is an agent.

However, the original model induces grammars from its input by making generalizations on the minimal differences between strings. If presented with the two strings "johnlovesmary" and "johnloveskate", it therefore attributes the minimal difference between them, the substrings "mary" and "kate", to the difference in meaning. This poses problems for languages which incorporate inflectional affixes indicating case: these inflections will be the same in every sentence, and thus will not be noted.

We will describe current work investigating whether the emergence of inflectional affixes can be encouraged by rewriting the inducer to look at minimal similarities between strings rather than minimal differences. This would allow inflections to be captured when a noun is learnt. The rewritten inducer is used in conjunction with the new nested semantic representation described above.

References:

Joanna Moy and Suresh Manandhar. Modelling the Emergence of Case. Language Evolution and Computation Workshop, 15th European Summer School in Logic, Language and Information, 2003.

Simon Kirby. Syntax without natural selection: How compositionality emerges from vocabulary in a population of learners. In Chris Knight, Michael Studdert-Kennedy, and James Hurford, editors, The Evolutionary Emergence of Language: Social Function and the Origins of Linguistic Form. Cambridge University Press, 2000.