Like the Greek project, we locate the present project squarely within the developing "Laboratory Phonology" tradition (e.g. Beckman and Kingston 1990, Pierrehumbert et al. 2000), in which data from the phonetics laboratory are used to investigate not the physical foundations of speech production and perception, but the nature of the phonological structures that underlie speech. We were also very aware of the potential implications of our findings for improving the prosody of synthetic speech.
(1) to provide sound and extensive empirical data on F0 target alignment in common intonation patterns in English and Dutch.
This objective has clearly been met, and has been extended to consideration of certain cases in German as well, for reasons discussed in section 2b below. Among the topics we have addressed are:
(2) to shed light on the interaction of phonological and phonetic effects in F0 alignment.
Again, this objective has been thoroughly addressed, most notably for the interaction of phonological "length" and phonetic "duration" in determining peak alignment in high accents, both nuclear and prenuclear, and for the interaction of "prominence" and "finality" in determining the alignment of postnuclear F0 valleys. It is clear from our findings that both phonological and phonetic factors affect the details of alignment.
(3) to clarify several theoretical issues in the analysis of intonational phonology, the most important being (1) the phonology of phrase tones and (2) the nature of the segmental landmarks with which tones may be aligned.
This objective has also been amply met.
(4) to determine the most appropriate means of identifying F0 targets that are not clear maxima or minima but only "elbows", as a general methodological contribution to F0 research.
This objective was abandoned, for two reasons. First, it became clear that we could profitably devote the entire period of the grant to issues that could be studied with reference only to clear maxima and minima. Second, results that emerged from the Greek study after we applied for the present project showed that if we based our analyses of alignment on F0 "elbows" located by eye by expert labellers, our conclusions were the same as if we based them on two different types of automatic analysis. In the longer term, we have no doubt that labour-intensive studies of the sort we have carried out will need to be replaced by more automatic procedures based on much larger databases of speech, but in the meantime we saw little benefit in pursuing small methodological improvements in an area where fundamental questions of substance are being usefully explored. In place of this objective, we have dealt with an entirely separate methodological objective that we feel is important right now (see section 2b immediately below).
(5) to produce two small "Map Task" corpora of spontaneous dialogue, one in English and one in Dutch, in which prosodic variables are manipulated in the choice of landmark names.
A Dutch Map Task corpus consisting of 8 conversations involving 4 speakers (two male and two female), with full orthographic transcription, has been offered to the ESRC data archive. We decided against producing an English Map Task corpus once the extent of the alignment differences between different varieties of English became clear: our other objectives for English were more appropriately met by controlled experiments involving read speech. For more details on the Map Task and on the corpus see sections 3 and 6.
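To make concrete what locating an F0 "elbow" automatically might involve (objective 4 above), the following is a minimal sketch of one generic approach: fitting two straight lines to the contour and choosing the breakpoint that minimises the total squared error. It is not claimed to be either of the automatic analyses used in the Greek study; the function names and the sample contour are hypothetical and serve only as an illustration.

```python
def line_fit_error(xs, ys):
    """Sum of squared residuals of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def find_elbow(times, f0):
    """Return the time of the sample that best splits the contour into
    two straight-line stretches (the breakpoint belongs to both)."""
    best_time, best_error = None, float("inf")
    # Each stretch needs at least two samples to define a line.
    for b in range(1, len(times) - 1):
        error = (line_fit_error(times[:b + 1], f0[:b + 1])
                 + line_fit_error(times[b:], f0[b:]))
        if error < best_error:
            best_time, best_error = times[b], error
    return best_time


# Hypothetical contour: level at 100 Hz, then rising from 50 ms onwards.
times_ms = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
f0_hz = [100, 100, 100, 100, 100, 100, 110, 120, 130, 140, 150]
print(find_elbow(times_ms, f0_hz))  # 50
```

Real F0 tracks are noisy and contain gaps at voiceless segments, so a usable procedure would need smoothing and interpolation steps that this sketch omits.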
2b. Changes to objectives
The procedures and techniques involved in the reading and perception experiments were fairly standard and we had experience with them from previous work. These procedures include constructing controlled speech materials, segmenting utterances from spectrographic displays, and (for the perception experiments) digital resynthesis of intonation. For the most part the reading experiments had factorial designs which were analysed by Analysis of Variance (ANOVA).
The Map Task, which was developed in Edinburgh in the 1980s, has become something of an industry standard in the study of spontaneous dialogue, and we were able to draw on the extensive experience of colleagues in designing our study. The Map Task works as follows: the two participants in the conversation each have a map showing a variety of named landmarks. The maps may differ slightly in detail; neither speaker can see the other's map. One map (the "Instruction Giver's" map) has a route marked on it, and the task is for the Instruction Giver to explain to the Instruction Follower where the route passes, referring to the various landmarks along the way - accurately enough that the Instruction Follower can reproduce the route on his or her own map.
For our purposes, the point of using the Map Task was to obtain natural productions of certain contour types (e.g. various kinds of question intonation, "continuation rises", "flat hat" accentuation patterns) which are difficult to obtain in reading experiments without explicitly instructing the speakers how to speak (and sometimes not even then). Our most important manipulation of the maps was to select landmark names that manifested the phonological structures we were interested in, and that contained consonant types which would permit easy analysis of pitch patterns. Here we were inspired especially by the work of Martine Grice and her colleagues (e.g. Grice and Savino 1995).
Study 1: alignment and speech rate in English
The question addressed by these read-speech experiments was: what happens to alignment when speakers talk faster or slower? Do the alignment points stay the same (i.e. do pitch changes become faster or slower)? Or do the pitch changes take a fairly fixed amount of time, meaning that the alignment points change when speaking rate changes? Our Greek findings would naturally lead us to predict the first outcome, and this is what we found. This work was published in JASA in 1999.
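The two competing predictions can be made concrete with a small worked example. This is a sketch under simplifying assumptions, not the analysis reported in the published paper: it assumes the rise starts at the onset of the accented syllable, treats the end of that syllable as the relevant landmark for the peak, and uses invented durations (a 200 ms syllable and a 200 ms rise at normal rate). The function name and its parameters are hypothetical.

```python
def predicted_peak(syllable_ms, anchored,
                   reference_syllable_ms=200, reference_rise_ms=200):
    """Predict rise duration and peak position for one accented syllable.

    anchored=True  : alignment points stay the same, so the rise is
                     stretched or compressed along with the syllable.
    anchored=False : the rise takes a fixed amount of time, so the
                     peak drifts relative to the segmental landmark.
    Returns (rise duration in ms, peak time minus syllable end in ms).
    """
    if anchored:
        rise_ms = reference_rise_ms * syllable_ms / reference_syllable_ms
    else:
        rise_ms = reference_rise_ms
    # The rise is assumed to start at syllable onset (time 0).
    peak_offset_ms = rise_ms - syllable_ms
    return rise_ms, peak_offset_ms


# Faster speech: the syllable shrinks from 200 ms to 150 ms.
print(predicted_peak(150, anchored=True))   # (150.0, 0.0)
print(predicted_peak(150, anchored=False))  # (200, 50)
```

Under the first hypothesis the peak stays at the landmark and the rise becomes steeper; under the second the peak would fall 50 ms past the landmark. The outcome described above corresponds to the first pattern.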
Study 2: alignment and phonological vowel length in Dutch
This is probably the most important study of the project so far, in that it establishes clearly that abstract structural features (like phonological vowel length and/or syllable structure) are relevant to F0 target alignment, and that alignment regularities cannot be explained on the basis of "low-level" phonetic features alone. The relevance of structural factors is central to our whole approach. In two read-speech experiments, we showed that alignment is affected by vowel length: pitch peaks are generally aligned during the accented vowel when the vowel is long and during the following consonant when the vowel is short. We also showed that this effect is not just a matter of the actual duration of the vowel, but depends in an either-or way on whether the vowel is structurally long or short. The JASA paper included with this report is a full writeup of this study.
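As an illustration of the kind of measure involved, peak alignment can be expressed as the time of the F0 maximum relative to a segmental landmark such as the end of the accented vowel. The sketch below assumes a hand-segmented vowel boundary and a sampled F0 track; the function name and the sample values are hypothetical and are not taken from the project data.

```python
def peak_alignment(f0_times, f0_values, vowel_end):
    """Return the time of the F0 maximum minus the time of the vowel end.

    A negative value means the peak precedes the end of the vowel;
    a positive value means it falls after the vowel, e.g. in the
    following consonant.
    """
    peak_index = max(range(len(f0_values)), key=lambda i: f0_values[i])
    return f0_times[peak_index] - vowel_end


# Hypothetical track (ms, Hz) with the vowel ending at 30 ms.
times_ms = [0, 10, 20, 30, 40]
f0_hz = [100, 120, 140, 130, 110]
print(peak_alignment(times_ms, f0_hz, vowel_end=30))  # -10
```

On this measure, the finding described above would show up as values clustering below zero for structurally long vowels and above zero for structurally short ones.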
Study 3: alignment and phonological vowel length in English
This is the least satisfactory part of the project, in that we devoted considerable time to an important question but were unable to reach publishable conclusions. Here we outline what we did, what we found, why we have not yet been able to publish the findings, and what the prospects are.
The research question is as follows. There is good reason to believe that in English, as in Dutch, prenuclear high accent peaks are aligned earlier with long vowels than with short vowels. However, in many varieties of English, the phonological status of vowel length distinctions is rather unclear; only in Standard Southern English ("RP") is the phonological situation similar to Dutch, with matched sets of long and short vowels and clear phonological and phonetic correlates of the distinction. In General American English, some of the phonologically short vowels (notably the vowels of bad and bog) are phonetically quite long and not obviously part of long-short pairs. In Standard Scottish English three of the RP length distinctions are absent, but conversely, there is an unusual distribution of shorter and longer vowel allophones (the so-called Scottish Vowel Length Rule), which gives rise to certain quasi-phonemic length distinctions unparalleled in any other English variety (e.g. the quasi-contrast between side and sighed). Consequently, it was of some interest to see how the alignment phenomena discovered in Dutch would be manifested in English.
Our first experiment on this subject compared RP and Scottish alignment for the vowels of beat, bit, bait, bet, bat, and hot. Three results stood out: first, the long-short distinction found in Dutch is clearly also present in English (i.e. alignment relative to the end of the vowel is earlier for beat and bait than for the others); second, Scottish alignment is consistently slightly later than RP alignment; and third, surprisingly, alignment in the Scottish bat and hot vowels patterns with the short vowels - as in RP - despite the fact that they are phonetically relatively long and do not typically participate in a long-short contrast.
This third result made it advisable to investigate the other vowels of Scottish English (the vowels of boat, but, boot, bite) to see how they behaved; we were especially interested in the behaviour of the boot vowel, which (like the bat and hot vowels) does not participate in a length contrast present in RP, but which is phonetically short. However, the combined results of the two experiments made no sense phonologically, and the data were noisy enough that we felt that recordings of more speakers and/or more cases were needed. We also realised that the materials we had prepared may have been confounded by Scottish Vowel Length Rule effects, and it would have been difficult to design new materials avoiding this problem.
Meanwhile, we had recorded a group of newly-arrived American students, and encountered a different set of problems: the accent peaks on the test syllables were so late that they were difficult to regard as comparable to the British and Dutch data. We believe this finding is related to the caricature rising intonation of informal American English (so-called "uptalk"), but whatever the reason it made the data unusable. Since we did not want to base the research on American speakers who had been in Edinburgh for any significant length of time, we were left with little choice but to abandon the American side of the project. Given the difficulties with the Scottish English materials, we decided that the entire line of research on alignment and phonological vowel length in non-RP English was taking too much time away from other aspects of the project that were potentially more fruitful, and reluctantly put it all on hold. We hope to be able to return to the Scottish material sometime in 2002, but the American material will probably remain a dead end unless we are able to make a further major investment of time and do appropriate pilot studies.
Study 4: alignment of inter-accent valleys in English
This study builds on Study 1. If (as we found in Study 1) the F0 in rising pitch accents begins rising at the beginning of the accented syllable, then we should find alignment differences in pairs like grade A / grey day (the valley between the two accent peaks should be later in grade A than in grey day). In read speech, we did find such differences. We also demonstrated the perceptual relevance of these differences in an experiment where we manipulated the alignment of the F0 valley and asked listeners to judge which phrase they heard. The Journal of Phonetics paper included with this report is a writeup of these experiments and their phonological implications.
Study 5: alignment of post-nuclear valleys in Dutch
This was a read-speech study in which speakers read questions of the
form "Do you live in X?", where X was a town or village name. Using
geographical names allowed us to find sufficient instances of several
different phonological patterns. Most speakers, as expected, used a
fall-rise question intonation in most cases, and we were interested in
studying how the alignment of the F0 valley (between the fall and the
rise) is affected by the presence of a secondary stressed (unreduced)
postnuclear syllable. We found that if there are only weak (reduced)
syllables following the nuclear syllable, the final rise begins at the
beginning of the last syllable. However, if there is a strong
(secondary-stressed) syllable following the last stressed syllable, the
beginning of the rise aligns with the strong vowel: early in the strong
vowel if there are no further syllables in the phrase, and late in the
strong vowel if the secondary stressed syllable is followed by another
weak syllable. As noted earlier (section 2, objective 3), this pattern of findings is a striking confirmation of theoretical ideas about the nature of "phrase accents"
(Grice et al. 2000). This material is almost ready to write up for journal
publication (see study 7 below).
Study 6: alignment of nuclear accent peaks in Dutch
This reading experiment was modelled on the study by Prieto et al.
1995. Most importantly, we wanted to see whether phonological vowel
length plays the same role in nuclear accents that we found for
prenuclear accents. Additionally, like Prieto et al., we wanted to
investigate the effects of "right context": we manipulated whether the
nuclear word consisted of one syllable or two, and whether the
following word began with a stressed or unstressed syllable. Results
show clearly that vowel length produces a large effect, but the
right-context effects are small and inconsistent across speakers. So
far we have analysed the results using ANOVA; it seems clear that a
multiple regression analysis will be more revealing, but we have not
yet done this.
In a published write-up, we expect to report also the data on the nuclear peaks from study 5, which show interesting differences: in particular, they do seem to be affected in consistent ways by the right-context manipulations that were originally intended to affect the postnuclear F0 valleys. We believe the different sensitivity to right-context effects is related to the fact that in study 5 the right-context differences are within the nuclear word, whereas in study 6 they are not.
Study 7: alignment features in intonation patterns in spontaneous speech in Dutch
We were particularly interested in two types of contour that are
difficult to obtain in reading experiments. These are: (1) "flat hat"
accent patterns, in which two high accents on syntactically or
semantically closely linked phrases (e.g. Adjective+Noun) are connected
by a high level F0 stretch, yielding a contour that graphically
resembles a hat (this is a common pattern in Dutch and German, but much
less common in English); (2) natural question contours, of which Dutch
has at least three distinct varieties ("fall-rise", "low rise", and
"high rise", in traditional British terms). In addition to these two,
we also obtained a large number of tokens of "continuation rises" in
utterances where the Instruction Giver was presenting a series of
instructions.
The flat hat data have been analysed acoustically and labelled but we have not yet analysed effects of vowel length, right context, etc. The question fall-rise contours (QFRs), of which there were a considerable number, appear to be exactly comparable to the contour used by most of the speakers in study 5; as noted in section 2b, we are about to test the alignment findings from study 5 against the data from the spontaneous-speech QFRs. As for the continuation rises, they appear to show that the principles for aligning nuclear F0 valleys (i.e. continuation rise low targets) are the same as for aligning post-nuclear F0 valleys (i.e. the QFR lows); if this conclusion stands up to statistical analysis, it will represent further confirmation of the Grice et al. view of phrase accents.
Study 8: alignment in Northern and Southern German
The main purpose of this study - which as noted in section 2b arose
from a serendipitous discovery during a visit to the lab by a German
researcher - was to shed light on how alignment is specified
in different languages. We had already shown that languages can differ:
for example, in English and Dutch the alignment of the pitch peak is
earlier (for all vowels) than it would be in Greek. German appeared
to align prenuclear rises somewhat later than English or Dutch, which
raised the question of whether the differences are categorical (e.g.
Dutch aligns the peak with the end of the syllable and German aligns
the peak with the end of the following vowel) or continuous (e.g.
Dutch aligns the peak 10 ms before the end of the syllable and
German aligns it 30 ms after). Our results, taken together with the
Scottish-RP comparisons in study 3, strongly suggest that language-specific differences must be specified in
quantitative detail (rises in Southern German are a little later than in
Northern German, which in turn are a little later than in Dutch and
English, and the intra-German differences carry over to German-accented
English). In current terminology,
such differences of alignment are a matter of "language-specific phonetic
rules", not phonological structure. The write-up of this study is
already partially completed and should be submitted to Journal of Phonetics by about the end of 2001.
In March 2000 Dr. Schepman left to take up a permanent post in Psychology at the University of Abertay Dundee. As the PI was about to embark on a 5-month sabbatical absence from Edinburgh, the project was suspended for 5 months and a no-cost extension was permitted. When the PI returned to Edinburgh in September 2000, RA responsibilities were taken over by Dr. Robin Lickley, a long-term contract researcher in the department who was unfortunately caught between grants and whose background and range of abilities were an astonishingly good match to the project's needs (even including a working knowledge of Dutch!). In the meantime, Dr. Mennen and Dr. Lickley have both taken up posts at Queen Margaret University College in Edinburgh, so there should be no logistical obstacles to completing the remaining papers listed as "in preparation" in section 6 below.
The ESRC's administrative flexibility in dealing with the difficulties faced by PIs when RAs leave in mid-project is not only realistic but also extremely helpful. If we had not been able to postpone the start date or to have the no-cost extension, the project would have had a very much less successful overall outcome.
5b. Associated mini-projects
Because "research-led teaching" is a working reality in our department,
we were able to incorporate results from student projects
into the project.
1. Alignment and speech rate in English (study 1): The impetus for this study came from an MSc dissertation carried out in the department by Daniel Faulkner in the summer of 1997. This dissertation formed the basis of Experiment 1 in the resulting published paper. Because the results were so promising, another student (Hanneke van der Marel, later Hanneke Faulkner) carried out a more focused experiment as the basis for her undergraduate Honours dissertation, which was the basis for Experiment 2 in the published paper. (The dissertation reported on only one speaker, but we were able to obtain local funding to pay Ms. van der Marel to make acoustic measurements of the other 5 speakers during the summer after her graduation.) Both dissertations were supervised by the PI and the statistical analysis of the data in both was redone by Dr. Schepman as part of the main project.
2. Alignment of prenuclear rising accents in German and German-accented English
(study 8): Two course projects for the PI's course module on Prosody
in 1999-2000 served as pilot studies for this study. Daniela Heide, a
visiting German undergraduate, did a preliminary test of the idea
that Northern and Southern German accents differ in alignment,
while Michaela Atterer, who was doing the MSc in Cognitive Science,
piloted the hypothesis that German alignment
patterns would carry over to English. When these two projects
yielded promising results, Atterer (who had returned to Munich)
carried out a full-scale experiment in cooperation with the PI.
5c. Experimental work
Much of the work of the project consisted of devising suitably
controlled speech materials, recording speakers, and making acoustic
measurements. As noted in previous project reports to ESRC, this is
inherently slow and labour-intensive work, and difficult to automate
given our hypotheses and our current state of knowledge.
There is little point in providing details of which materials
were recorded and analysed when, but it is worth noting that
we made two trips to the Netherlands, one in March 1998 and
one in February 1999, and that Michaela Atterer made one trip
back to Edinburgh from Munich to do acoustic analysis of the
German materials in May 2001.
5d. Conferences and talks
Papers reporting on aspects of the project were presented at the following
conferences:
The 14th International Congress of Phonetic Sciences (ICPhS), San Francisco, August 1999. (Published proceedings; see section 6).
Architectures and Mechanisms in Language Processing (AMLAP-99), Edinburgh, September 1999. (A preliminary report of study 2 for an audience of psycholinguists.)
The PI has presented invited talks, classes and seminars on aspects of the project at the following institutions:
6a. Papers Published or Accepted
6b. Papers in Preparation
Schepman, Lickley, Ladd. Paper on the alignment of targets in nuclear accent peaks and postnuclear valleys (studies 5 and 6). To be submitted to Journal of Phonetics during 2001-02.
Lickley, Schepman, Ladd. Paper on the comparison of phonetic details in spontaneous and read speech (studies 5 and 7). To be submitted to Language and Speech by mid-2002.
Atterer and Ladd. Paper on alignment in German and its phonological implications (study 8). To be submitted to Journal of Phonetics by early 2002.
6c. The Dutch Map Task corpus
The corpus that we have used as the source of our data and that we have offered to the ESRC data archive consists of 8 conversations involving 4 speakers, 2 male and 2 female, for a total of just over 41 minutes of speech. The recordings were made in a single morning in February 1999 at the Phonetics Department at the University of Nijmegen. The speakers were undergraduates in their 20s, mostly students of English. The corpus was orthographically transcribed by Angela Vonk, a student at the University of Nijmegen. Subsequent minor corrections to the transcription have been made by Astrid Schepman and Robin Lickley during the course of working with the speech files.
The material offered to the ESRC data archive consists of the digital recordings of all eight dialogues, the orthographic transcriptions, and reduced copies of the maps used by the participants.
The research programme begun with the Greek project and continued in this project has also had a small but significant impact on the speech technology industry. Rule-based intonation synthesis research at Aculab in Milton Keynes (Monaghan et al. 2001) is exploring ways of modelling accents in terms of two alignment points instead of one, and doing away with independent slope and duration parameters. Commercial confidentiality makes it impossible for us to provide further detail on this research, or indeed to know very much about what other speech technology firms are doing.
More generally, several of our findings point to the conclusion that the standard Pierrehumbert analysis of English intonational phonology is ripe for reexamination. If it is true that some of the differences between languages are a matter of "language-specific phonetic rules" rather than phonological differences, then any such reexamination should consider the place of putative phonological categories in the analysis. We believe it would be highly appropriate for ESRC to fund such research.