In Search of Inflection

Joanna Moy and Suresh Manandhar

Department of Computer Science, University of York, UK

joanna@cs.york.ac.uk

Previous work (Moy and Manandhar, 2003) describes an attempt to
demonstrate the emergence of case in a population of minimally
equipped learning agents, based on Kirby's "Iterated Learning Model"
(Kirby, 2000). The emergence of grammars with a primitive form of case
was demonstrated: separate noun categories to express subject and
object of a sentence.  However, category types are not strongly
restricted to a single syntactic role, nor are they inflectional;
that is, the "subject" and "object" forms of a particular noun cannot
be broken down into a common stem plus affix.  These limitations were
deemed to be due to the details of the model employed.

The current work is an attempt to address these limitations.  The
semantic representation used for utterances made by agents in the
original simulation is a vector in which thematic role is implied by
position, for example [loves, john, mary] indicates that the predicate
(in the first position) is "loves", the agent (in the second position)
is "john", and the patient (in the third) is "mary". However, the
parts of speech produced by making generalizations between utterances
are independent of position, and thus independent of thematic role.
If a rule is created indicating that the string "j,o,h,n" has the
meaning "john", it does not specify whether this string represents
agent or patient in the utterance in which it was observed. Thus noun
categories cannot effectively be restricted to expressing a single
thematic role, which might prove a disadvantage in attempts to
simulate the emergence of a proper case system.
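
The limitation can be illustrated with a minimal sketch. It assumes a
lexical rule is simply a (string, meaning) pair; this pairing and the
function name matching_positions are illustrative choices, not the data
structures of the original simulation.

```python
# A lexical rule in this sketch pairs a string with a bare meaning value.
# No thematic role is recorded (hypothetical representation).
rule = ("john", "john")


def matching_positions(rule, meaning_vector):
    """Return every position in a flat meaning vector that the rule's value fills."""
    _, value = rule
    return [i for i, v in enumerate(meaning_vector) if v == value]


# Position 1 is the agent slot, position 2 the patient slot.
as_agent = matching_positions(rule, ["loves", "john", "mary"])    # [1]
as_patient = matching_positions(rule, ["loves", "mary", "john"])  # [2]
```

The same rule applies equally well in the agent and the patient
position, so nothing ties the category it belongs to to one role.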

The semantic representation was therefore modified to give it a
nested structure, in which each element specifies explicitly both its
thematic role and its value, so that the vector [loves,john,mary]
becomes [[pred,loves],[agt,john],[pat,mary]].  The system was modified
to handle these nested structures, and to be able to make
generalizations between parts of speech as well as between
sentences. Thus, once a substring meaning [agt,john] and another meaning
[agt,pete] have been induced, any similarity between the two can be
attributed to the morpheme specifying that the noun is an agent.
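
A minimal sketch of the nested representation and of the comparison
between two agent forms follows. The surface strings "johnka" and
"peteka" are invented for illustration, and the choice of a suffix
(rather than a prefix) as the shared material is an assumption; the
helper names to_nested and common_suffix are likewise hypothetical.

```python
ROLES = ("pred", "agt", "pat")  # role labels as used in the nested form


def to_nested(flat):
    """Turn a positional vector into explicit [role, value] pairs."""
    return [[role, value] for role, value in zip(ROLES, flat)]


def common_suffix(a, b):
    """Return the longest string that ends both a and b."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return a[len(a) - n:]


nested = to_nested(["loves", "john", "mary"])
# [['pred', 'loves'], ['agt', 'john'], ['pat', 'mary']]

# Hypothetical induced strings for [agt,john] and [agt,pete].
shared = common_suffix("johnka", "peteka")  # 'ka'
```

Because both strings are known to express the agent role, the shared
material is a candidate for the morpheme marking that role.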

However, the original model induces grammars from its input by making
generalizations on the minimal differences between strings.  For
example, if presented with the two strings "johnlovesmary" and
"johnloveskate", the minimal difference between them, the substrings
"mary" and "kate", is attributed to a difference in meaning.
This poses problems for languages which incorporate inflectional
affixes indicating case: these inflections will be the same in every
sentence, and thus will not be noted. We will describe current work
investigating whether the emergence of inflectional affixes can be
encouraged by rewriting the inducer to look at minimal
~similarities~ between strings (rather than differences), so that
inflections can be captured when a noun is learnt, and by using this
inducer in conjunction with the new nested semantic representation
described above.
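
The difference-based behaviour can be sketched as follows. This is a
simplified stand-in that strips the longest common prefix and suffix,
not the inducer used in the simulation; the function name
minimal_difference and the case-marked strings ending in "o" are
hypothetical.

```python
def minimal_difference(a, b):
    """Strip the longest common prefix and suffix; return what is left of each string."""
    limit = min(len(a), len(b))
    p = 0
    while p < limit and a[p] == b[p]:
        p += 1
    s = 0
    # The suffix may not overlap the prefix already consumed.
    while s < limit - p and a[-1 - s] == b[-1 - s]:
        s += 1
    return a[p:len(a) - s], b[p:len(b) - s]


# The example from the text: the differing substrings are the two nouns.
plain = minimal_difference("johnlovesmary", "johnloveskate")
# ('mary', 'kate')

# With an invented case affix "o" on the final noun, the affix is shared
# by both strings, so it is stripped with the common context.
marked = minimal_difference("johnlovesmaryo", "johnloveskateo")
# ('mary', 'kate')
```

In the second call the affix never appears in the induced noun
substrings, which is the problem a similarity-based inducer is
intended to avoid.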

References:

Joanna Moy and Suresh Manandhar. Modelling the Emergence of
Case. Language Evolution and Computation Workshop, 15th European
Summer School in Logic, Language and Information, 2003.

Simon Kirby.  Syntax without natural selection: How compositionality
emerges from vocabulary in a population of learners. In Chris Knight,
Michael Studdert-Kennedy, and James Hurford, editors, The Evolutionary
Emergence of Language: Social Function and the Origins of Linguistic
Form.  Cambridge University Press, 2000.