A Commentary on the BBS target article "The neural
basis of predicate-argument structure" by James R. Hurford
Word Counts: Abstract: 61; Total: 1413
Michael A. Arbib
Computer Science Department, Neuroscience Program, and USC Brain Project
Hurford argues
that propositions of the form PREDICATE(x) represent conceptual structures
which preexist language and which can be explicated in
terms of neural structure. I disagree, arguing that such predicates are descriptions
of limited aspects of brain function, not available as representations in
the brain to be exploited in the frog or monkey brain and turned into language
in the human.
A numbered paragraph is
based on the corresponding Section of the target article; unnumbered paragraphs
convey my comments.
1.2.
The basic ontological elements are whole events or situations and the
participants of these events. The event described by A man
bites a dog could be represented as
∃e, x, y, bite(e), man(x), dog(y), agent(x), patient(y)   (1)
I don't think this works.
We need to replace agent(x) by agent(x, e) to indicate in which
event x plays the stipulated role; similarly, patient(y) becomes patient(y, e). For Hurford,
the discussion of episodes is an aside to his concentration on 1-place
predicates, but I suggest that the crux for a prelinguistic
representation is the event and the "action-object frame" A(x,y) (agent x is doing A to object y) and its variations. Rizzolatti
and Arbib (1998) examined
whether or not a ‘prelinguistic grammar’ can be assigned to the
control and observation of actions. If this is so, the notion that evolution
could yield a language system ‘atop’ of the action system becomes much more
plausible. (p.191)
This talk of a ‘prelinguistic grammar’ was not meant to imply that gestures
may be a primitive form of grammar, for our approach was semantic rather than
syntactic:
… we
might say that the firing of ‘mirror’ F5 neurons is part of the code for a
declarative case structure, for example,
Declaration: grasp-A(Luigi, raisin)
which is a
special case of grasp-A(agent, object), where grasp-A is a specific kind of
grasp, applied to the raisin (the object) by Luigi (the agent). … this is an ‘action description’, not a linguistic
representation. (p.192. Italics added.)
Being able to grasp a
raisin is different from being able to say “I am grasping a raisin”, and the
neural mechanisms that underlie the doing and the saying are different.
However, the case structure lets us see a commonality in the underlying
representations, thus helping us understand how a mirror system for grasping
might provide an evolutionary core for the development of brain mechanisms that
support language.
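The point about indexing roles by event can be made concrete with a minimal sketch. The Python names used here (`Fact`, `roles_in`, `action_frame`) and the second event (the dog chasing the man) are illustrative assumptions, not notation or examples from Hurford or from Rizzolatti and Arbib. Once two events share participants, a unary agent(x) cannot say which event x is the agent of, whereas agent(x, e) can, and the action-object frame A(x, y) can then be read off per event.

```python
from typing import NamedTuple

class Fact(NamedTuple):
    """One atomic assertion, e.g. agent(x, e1) is Fact("agent", ("x", "e1"))."""
    predicate: str
    args: tuple

ROLES = ("agent", "patient")

# Hypothetical scene with two events sharing participants:
# e1: a man (x) bites a dog (y); e2: the dog (y) chases the man (x).
facts = [
    Fact("bite", ("e1",)), Fact("chase", ("e2",)),
    Fact("man", ("x",)), Fact("dog", ("y",)),
    Fact("agent", ("x", "e1")), Fact("patient", ("y", "e1")),
    Fact("agent", ("y", "e2")), Fact("patient", ("x", "e2")),
]

def roles_in(event, facts):
    """Return {role: participant} for one event, using event-indexed roles."""
    return {
        f.predicate: f.args[0]
        for f in facts
        if f.predicate in ROLES and f.args[1:] == (event,)
    }

def action_frame(event, facts):
    """Collapse an event into an action-object frame A(x, y) as a tuple."""
    action = next(
        f.predicate for f in facts
        if f.args == (event,) and f.predicate not in ROLES
    )
    roles = roles_in(event, facts)
    return (action, roles["agent"], roles["patient"])
```

Under these assumptions, `action_frame("e1", facts)` yields `("bite", "x", "y")` and `action_frame("e2", facts)` yields `("chase", "y", "x")`; with unary role predicates the two frames could not be told apart.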
2.
Representations of the form PREDICATE(x) are taken to stand for the mental
events involved when a human attends to an object in the world and classifies
it perceptually as satisfying the predicate in question.
More specifically, the
notion is that a person may attend to a limited number of objects, and x then
stands as an index for one of those objects. Thus a scene might be represented
by a conjunction
P1(X1) & P2(X2) & P3(X3) & P4(X4)   (2)
where each Xj indexes some region of the scene and Pj(Xj) indicates that the object at that location possesses
property Pj. This leads to another point which (I
think) weakens Hurford’s critique of Rizzolatti and Arbib (1998):
4. An
example of a scene-description might be
APE(x) & STICK(y) & MOUND(z) & HOLE(w) & IN(w,z) & PUT(x,y,w)   (3)
translating to An
ape puts a stick into a hole in a mound.
The inclusion of PUT(x,y,w) in (3) reinforces the point that Hurford’s
focus on unary predicates does not do justice to describing animals which
perceive to act, with acts dependent on relations between objects. The key
question remains: “How do we go from predicates which we may use to describe
internal behavior to neural representations which themselves abstract from the
activity levels and parameterizations of schemas and their underlying neural
networks, and instead provide abstractions which may in turn be refined to
yield the cognitive and semantic forms which drive the production and
perception of the phonological forms of language?”
In discussing the
possible neural basis of (2), Hurford (§4) cites
papers from 1984 onward. However, I would claim some priority in this area with
the slide-box metaphor (Didday & Arbib, 1971; Arbib, 1972): In the days before computer graphics, movie
cartoons were drawn using cels, which I there
called slides. Since the cartoon might run for
seconds without the background changing, one may draw this background just
once. In the middle ground, there might be a tree about which nothing changes
for a while except its position relative to the background. It could thus be
drawn on a separate slide and repositioned as needed. In the foreground, key
details might change for each frame. The slides could then be photographed
appropriately positioned in a slide-box for each frame, with only a few
parameter changes (including minimal redrawing) required between successive
frames. The slide-box metaphor suggested a similar strategy might be used in
the brain, with long-term memory (LTM) corresponding to a "slide
file" and working or short-term memory (STM) corresponding to the
"slide-box." The act of perception was compared to using sensory
information to update slides already in the slide-box and to retrieve other
slides as appropriate, experimenting to determine whether a newly retrieved
slide fits sensory input "better" than one currently in the slide-box
which, in the brain, corresponds to a mass of neural tissue linking sensory and
motor systems. A crucial point was that retrieval of a slide provided access to
a wealth of information about the object it represented, including appropriate
courses of action.
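As a rough illustration only, the slide-box metaphor can be sketched as a small data structure. Everything in this sketch is an assumption made for exposition: the `Slide` fields, the feature-matching score `fit`, and the `perceive` function are hypothetical and are not taken from Didday & Arbib (1971) or Arbib (1972). The sketch shows the two operations the metaphor describes: a cheap parameter update of a slide already in the slide-box, and retrieval of a better-fitting slide from the slide file, which brings its associated courses of action with it.

```python
from dataclasses import dataclass, field

@dataclass
class Slide:
    name: str
    template: dict                      # expected feature values (hypothetical encoding)
    actions: list = field(default_factory=list)
    position: tuple = (0, 0)

def fit(slide, features):
    """Count the features on which the slide's template agrees with the input."""
    return sum(1 for k, v in slide.template.items() if features.get(k) == v)

def perceive(slide_box, slide_file, region, features, position):
    """Update the slide held for `region`, swapping in an LTM slide if it fits better.

    slide_box  -- dict standing in for working memory (STM)
    slide_file -- non-empty list standing in for long-term memory (LTM)
    """
    current = slide_box.get(region)
    best = max(slide_file, key=lambda s: fit(s, features))
    if current is None or fit(best, features) > fit(current, features):
        # Retrieve a copy from the slide file; the LTM entry itself is unchanged.
        current = Slide(best.name, best.template, list(best.actions))
        slide_box[region] = current
    current.position = position         # parameter change only, no "redrawing"
    return current
```

A call such as `perceive(box, slide_file, "r1", features, (3, 4))` with an empty `box` retrieves the best-fitting slide; a second call with the same features and a new position keeps that slide and changes only its position.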
I cite this background
to stress that (3) is a pale approximation of the slide-box metaphor, which is
in turn a pale approximation to the multi-level modeling methodology that
unifies the functional schemas of schema theory (Arbib,
1981; Arbib et al. 1998) with the dynamics of
detailed neural networks. For example, one schema in the visuomotor system of a frog (Arbib,
1987) might correspond to a pattern of neural activity signaling the likelihood
of a small moving object in a region x1 of the visual field, another schema
might signal the likelihood that a large moving object is moving with velocity
v in region x2, while a third might indicate the likelihood that a barrier of
extent w is located around region x3. Thus, rather than being predicates that
return 0 or 1, they are functions or likelihood distributions over a
multi-dimensional parameter space. Moreover, the frog’s actual course of action
(the “choice” of motor schema to guide action, and the setting of control
parameters for that action) cannot be directly inferred from these schemas, but
rather depends on the interplay of the activity of their neural instantiations
as they play upon the brainstem, determining whether the frog will snap at its
apparent prey or jump to escape an apparent predator (basing its
direction of escape on the perceived trajectory of the predator), and whether
or not it will attempt to detour around a barrier in doing so.
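The contrast between a 0-or-1 predicate and a graded schema can be sketched as follows. The functional forms, thresholds, and names (`schema_small_moving_object`, `choose_action`) are arbitrary assumptions chosen for illustration; they are not the frog models of Arbib (1987), in which action selection arises from interacting neural dynamics rather than from explicit rules.

```python
import math

def predicate_small_moving_object(size, speed):
    """Boolean predicate: the answer is only True or False."""
    return size < 1.0 and speed > 0.0

def schema_small_moving_object(size, speed, location, center, spread=1.0):
    """Graded schema: a level in [0, 1] that varies over a parameter space."""
    size_term = math.exp(-size)              # smaller objects score higher
    motion_term = 1.0 - math.exp(-speed)     # zero when the object is still
    distance = math.dist(location, center)   # distance from the schema's region
    place_term = math.exp(-(distance / spread) ** 2)
    return size_term * motion_term * place_term

def choose_action(prey_level, predator_level, barrier_level):
    """Toy competition: the outcome depends on all schema levels jointly."""
    detour = barrier_level > 0.5             # arbitrary illustrative threshold
    if predator_level > prey_level:
        return "detour-escape" if detour else "escape"
    return "detour-snap" if detour else "snap"
```

No single schema level fixes the behavior here: `choose_action(0.8, 0.2, 0.1)` gives `"snap"`, while `choose_action(0.2, 0.8, 0.9)` gives `"detour-escape"`, so the same prey-like stimulus can lead to different actions depending on what else is active.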
In summary, (2) is a
fine answer to the question “What objects does the animal see, and where does
it see them?”, and Hurford provides an interesting
analysis of relevant neural data. Moreover, I think it useful to debate whether
representations in the ventral stream are "more prelinguistic"
than those in the dorsal stream. But I answer the question of my title
“Predicates: External Description or Neural Reality?” by saying that predicates
like those in (2) are, in general, our external descriptions, not the animal's neural
reality. It is a highly evolved skill of humans to be able to name an indicated
object, and I suggest that PREDICATE(x) is best seen as a description of human
naming behavior, rather than as a conceptual structure preexisting language
that is part of the causality of neural circuits.
References
Arbib, M.A. (1972) The Metaphorical Brain: An Introduction to Cybernetics as Artificial Intelligence and Brain Theory, Wiley-Interscience:
Arbib, M.A. (1981) Perceptual Structures and Distributed Motor Control, in Handbook of Physiology, Section 2: The Nervous System, Vol. II, Motor Control, Part 1 (V. B. Brooks, Ed.), American Physiological Society, pp.1449-1480.
Arbib, M.A. (1987) Levels of Modeling of Visually Guided Behavior (with peer commentary and author's response), Behavioral and Brain Sciences, 10:407-465.
Arbib, M.A., Érdi, P. and Szentágothai, J. (1998) Neural Organization: Structure, Function, and Dynamics,
Didday, R.L., and Arbib, M.A. (1971) The Organization of Action-Oriented Memory for a Perceiving System I. The Basic Model, Journal of Cybernetics, 1:3-18.
index model.
Cognition 32:65-97.
Rizzolatti, G., and Arbib, M.A. (1998) Language within our grasp. Trends in Neurosciences 21(5):188-194.
Acknowledgements
Preparation
of this Commentary was supported in part by a Fellowship from the Center for
Interdisciplinary Research of the