ALIGNMENT OF FUNDAMENTAL FREQUENCY TARGETS IN ENGLISH AND DUTCH
ESRC Award Reference No. R000 23 7447
Attachment to END OF AWARD REPORT

1. Background

The research reported here builds directly on the PI's previous ESRC-funded project, "Phonetic and Phonological Properties of Tonal Targets in Modern Greek" (R000-23-5614). That project led to the discovery of substantial regularities in the way the overall intonation contour of Greek utterances is coordinated in time (or "aligned") with the phonetic segments of an utterance. Specifically, we found that certain local "targets" or "turning points" in the pitch contour (e.g. the beginning and the end of an accentual rise in pitch) are very precisely anchored to points in the segmental string (beginning of syllable, onset of following vowel, etc.). Our overall aim in the present project was to determine how far these findings can be generalised to different languages and to different speaking styles and rates. The main languages we investigated were Dutch and three different varieties of English.

As with the Greek project, we locate the present project squarely within the developing "Laboratory Phonology" tradition (e.g. Beckman and Kingston 1990, Pierrehumbert et al. 2000) in which data from the phonetics laboratory are used to investigate not the physical foundations of speech production and perception, but the nature of the phonological structures that underlie speech. We were also very aware of the potential implications of our findings for improving the prosody of synthetic speech.

2. Objectives

2a. Original objectives
Here we list the objectives of the research, as stated in the original application, followed by brief commentary on how each objective was met. References are made here and throughout the report to eight numbered studies; see section 4 for more detail.

(1) to provide sound and extensive empirical data on F0 target alignment in common intonation patterns in English and Dutch.

This objective has clearly been met, and has been extended to consideration of certain cases in German as well, for reasons discussed in section 2b below. Among the topics we have addressed are:

(2) to shed light on the interaction of phonological and phonetic effects in F0 alignment.

Again, this objective has been thoroughly addressed, most notably for the interaction of phonological "length" and phonetic "duration" in determining peak alignment in high accents, both nuclear and prenuclear, and for the interaction of "prominence" and "finality" in determining the alignment of postnuclear F0 valleys. It is clear from our findings that both phonological and phonetic factors affect the details of alignment.

(3) to clarify several theoretical issues in the analysis of intonational phonology, the most important being (1) the phonology of phrase tones and (2) the nature of the segmental landmarks with which tones may be aligned.

This objective has also been amply met.

Our results bear on at least one other clearly phonological question, namely the phonology of English high accents: our data on the alignment of inter-accent valleys support a substantial revision of the standard Pierrehumbert analysis of high accents (study 4).

(4) to determine the most appropriate means of identifying F0 targets that are not clear maxima or minima but only "elbows", as a general methodological contribution to F0 research.

This objective was abandoned, for two reasons. First, it became clear that we could profitably devote the entire period of the grant to issues that could be studied with reference only to clear maxima and minima. Second, results that emerged from the Greek study after we applied for the present project showed that if we based our analyses of alignment on F0 "elbows" located by eye by expert labellers, our conclusions were the same as if we based them on two different types of automatic analysis. In the longer term, we have no doubt that labour-intensive studies of the sort we have carried out will need to be replaced by more automatic procedures based on much larger databases of speech, but in the meantime we saw little benefit in pursuing small methodological improvements in an area where fundamental questions of substance are being usefully explored. In place of this objective, we have dealt with an entirely separate methodological objective that we feel is important right now (see section 2b immediately below).

(5) to produce two small "Map Task" corpora of spontaneous dialogue, one in English and one in Dutch, in which prosodic variables are manipulated in the choice of landmark names.

A Dutch Map Task corpus consisting of 8 conversations involving 4 speakers (two male and two female), with full orthographic transcription, has been offered to the ESRC data archive. We decided against producing an English map task corpus as the extent of the alignment differences between different varieties of English became clear: our other objectives for English were more appropriately met by controlled experiments involving read speech. For more details on the Map Task and on the corpus see sections 3 and 6.

2b. Changes to objectives

3. Methods

As planned, we used two main sources of speech data: (i) "reading experiments" in which carefully controlled speech materials were read aloud under studio recording conditions; (ii) a studio-recorded corpus of unscripted "task-oriented dialogues" (the "Map Task"; Anderson et al. 1991), which make it possible to exercise some control over what speakers say while at the same time preserving the spontaneity and naturalness of real dialogue. We also carried out one perception experiment, in which the intonation of naturally-spoken utterances was systematically modified by means of digital resynthesis and the resulting stimuli were presented to listeners for judgement, using standard experiment-running software (PsyScope).

The procedures and techniques involved in the reading and perception experiments were fairly standard and we had experience with them from previous work. These procedures include constructing controlled speech materials, segmenting utterances from spectrographic displays, and (for the perception experiment) digital resynthesis of intonation. For the most part the reading experiments had factorial designs which were analysed by Analysis of Variance (ANOVA).

The Map Task, which was developed in Edinburgh in the 1980s, has become something of an industry standard in the study of spontaneous dialogue, and we were able to draw on the extensive experience of colleagues in designing our study. The Map Task works as follows: the two participants in the conversation each have a map showing a variety of named landmarks. The maps may differ slightly in detail; neither speaker can see the other's map. One map (the "Instruction Giver's" map) has a route marked on it, and the task is for the Instruction Giver to explain to the Instruction Follower where the route passes, referring to the various landmarks along the way - accurately enough that the Instruction Follower can reproduce the route on his or her own map.

For our purposes, the point of using the Map Task was to obtain natural productions of certain contour types (e.g. various kinds of question intonation, "continuation rises", "flat hat" accentuation patterns) which are difficult to obtain in reading experiments without explicitly instructing the speakers how to speak (and sometimes not even then). Our most important manipulation of the maps was to select landmark names that manifested the phonological structures we were interested in, and that contained consonant types which would permit easy analysis of pitch patterns. Here we were inspired especially by the work of Martine Grice and her colleagues (e.g. Grice and Savino 1995).

4. Results

Eight specific subparts of the project can be identified. We give here a brief summary of the main findings of each.

Study 1: alignment and speech rate in English
The question addressed by these read-speech experiments was: what happens to alignment when speakers talk faster or slower? Do the alignment points stay the same (i.e. do pitch changes become faster or slower)? Or do the pitch changes take a fairly fixed amount of time, meaning that the alignment points change when speaking rate changes? Our Greek findings would naturally lead us to predict the first outcome, and this is what we found. This work was published in JASA in 1999.
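The two outcomes contrasted in Study 1 can be made concrete with a toy calculation. The sketch below is illustrative only: the function names, the segment times, the 150 ms rise time and the choice of the following vowel onset as the end landmark are all assumptions made for the example, not measurements or analysis code from the project.

```python
# Toy comparison of two hypotheses about a rising accent under rate change.
# All times (in ms) are invented for illustration; they are not project data.

def rise_anchored(syll_onset, end_landmark):
    """Segmental anchoring: the rise starts at the accented syllable onset
    and ends at a fixed segmental landmark (hypothetically, the onset of
    the following vowel), so its duration varies with speech rate."""
    return syll_onset, end_landmark

def rise_fixed_duration(syll_onset, rise_ms=150.0):
    """Fixed duration: the rise starts at the syllable onset and lasts a
    constant time, so its end drifts relative to the segments."""
    return syll_onset, syll_onset + rise_ms

def scale(landmarks, factor):
    """Stretch or compress all segmental landmarks uniformly."""
    return {name: t * factor for name, t in landmarks.items()}

# At the normal rate both hypotheses predict the same 150 ms rise.
normal = {"syll_onset": 100.0, "next_vowel_onset": 250.0}

for label, factor in [("fast", 0.75), ("normal", 1.0), ("slow", 1.5)]:
    lm = scale(normal, factor)
    a_start, a_end = rise_anchored(lm["syll_onset"], lm["next_vowel_onset"])
    f_start, f_end = rise_fixed_duration(lm["syll_onset"])
    # "offset" is the distance of the rise end from the landmark;
    # zero means the rise end is still anchored to the segments.
    print(label,
          "| anchored: duration", a_end - a_start,
          "offset", a_end - lm["next_vowel_onset"],
          "| fixed: duration", f_end - f_start,
          "offset", f_end - lm["next_vowel_onset"])
```

Under the anchoring hypothesis the offset stays at zero and the rise duration changes with rate; under the fixed-duration hypothesis the duration stays constant and the offset changes. The first pattern is the one reported for Study 1.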

Study 2: alignment and phonological vowel length in Dutch
This is probably the most important study of the project so far, in that it establishes clearly that abstract structural features (like phonological vowel length and/or syllable structure) are relevant to F0 target alignment, and that alignment regularities cannot be explained on the basis of "low-level" phonetic features alone. The relevance of structural factors is central to our whole approach. In two read-speech experiments, we showed that alignment is affected by vowel length: pitch peaks are generally aligned during the accented vowel when the vowel is long and during the following consonant when the vowel is short. We also showed that this effect is not just a matter of the actual duration of the vowel, but depends in an either-or way on whether the vowel is structurally long or short. The JASA paper included with this report is a full writeup of this study.

Study 3: alignment and phonological vowel length in English
This is the least satisfactory part of the project, in that we devoted considerable time to an important question but were unable to reach publishable conclusions. Here we outline what we did, what we found, why we have not yet been able to publish the findings, and what the prospects are.

The research question is as follows. There is good reason to believe that in English, as in Dutch, prenuclear high accent peaks are aligned earlier with long vowels than with short vowels. However, in many varieties of English, the phonological status of vowel length distinctions is rather unclear; only in Standard Southern English ("RP") is the phonological situation similar to Dutch, with matched sets of long and short vowels and clear phonological and phonetic correlates of the distinction. In General American English, some of the phonologically short vowels (notably the vowels of bad and bog) are phonetically quite long and not obviously part of long-short pairs. In Standard Scottish English three of the RP length distinctions are absent, but conversely, there is an unusual distribution of shorter and longer vowel allophones (the so-called Scottish Vowel Length Rule), which gives rise to certain quasi-phonemic length distinctions unparalleled in any other English variety (e.g. the quasi-contrast between side and sighed). Consequently, it was of some interest to see how the alignment phenomena discovered in Dutch would be manifested in English.

Our first experiment on this subject compared RP and Scottish alignment for the vowels of beat, bit, bait, bet, bat, and hot. Three results stood out: first, the long-short distinction found in Dutch is clearly also present in English (i.e. alignment relative to the end of the vowel is earlier for beat and bait than for the others); second, Scottish alignment is consistently slightly later than RP alignment; and third, surprisingly, alignment in the Scottish bat and hot vowels patterns with the short vowels - as in RP - despite the fact that they are phonetically relatively long and do not typically participate in a long-short contrast.

This third result made it advisable to investigate the other vowels of Scottish English (the vowels of boat, but, boot, bite) to see how they behaved; we were especially interested in the behaviour of the boot vowel, which (like the bat and hot vowels) does not participate in a length contrast present in RP, but which is phonetically short. However, the combined results of the two experiments made no sense phonologically, and the data were noisy enough that we felt that recordings of more speakers and/or more cases were needed. We also realised that the materials we had prepared may have been confounded by Scottish Vowel Length Rule effects, and it would have been difficult to design new materials avoiding this problem.

Meanwhile, we had recorded a group of newly-arrived American students, and encountered a different set of problems: the accent peaks on the test syllables were so late that they were difficult to regard as comparable to the British and Dutch data. We believe this finding is related to the much-caricatured rising intonation of informal American English (so-called "uptalk"), but whatever the reason it made the data unusable. Since we did not want to base the research on American speakers who had been in Edinburgh for any significant length of time, we were left with little choice but to abandon the American side of the project. Given the difficulties with the Scottish English materials, we decided that the entire line of research on alignment and phonological vowel length in non-RP English was taking too much time away from other aspects of the project that were potentially more fruitful, and reluctantly put it all on hold. We hope to be able to return to the Scottish material sometime in 2002, but the American material will probably remain a dead end unless we are able to make a further major investment of time and do appropriate pilot studies.

Study 4: alignment of inter-accent valleys in English
This study builds on Study 1. If (as we found in Study 1) the F0 in rising pitch accents begins rising at the beginning of the accented syllable, then we should find alignment differences in pairs like grade A / grey day (the valley between the two accent peaks should be later in grade A than in grey day). In read speech, we did find such differences. We also demonstrated the perceptual relevance of these differences in an experiment where we manipulated the alignment of the F0 valley and asked listeners to judge which phrase they heard. The Journal of Phonetics paper included with this report is a writeup of these experiments and their phonological implications.
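The prediction tested in Study 4 follows from simple arithmetic on segment times. The sketch below assumes, purely for illustration, that the two phrases share one segment string with invented durations and differ only in where the second accented syllable begins; the function name, the segment labels and the durations are hypothetical and are not taken from the experimental materials.

```python
# Predicted F0 valley position for "grey day" versus "grade A".
# Segment durations (ms) are invented for illustration only.

def valley_time(segments, accented_onset_index):
    """Return the start time of the segment that begins the second
    accented syllable, which is where the valley is predicted to fall
    if the rise starts at the accented syllable onset."""
    return sum(duration for _, duration in segments[:accented_onset_index])

# One shared segment string: g - r - ey - d - ey
segments = [("g", 60), ("r", 50), ("ey", 160), ("d", 70), ("ey", 180)]

grey_day = valley_time(segments, 3)  # second syllable begins at "d"
grade_a = valley_time(segments, 4)   # second syllable begins at the final vowel

print("grey day valley at", grey_day, "ms")
print("grade A valley at", grade_a, "ms")
print("difference", grade_a - grey_day, "ms")
```

With these made-up durations the predicted valley is later in grade A by exactly the duration of the medial consonant, which is the direction of the difference described above.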

Study 5: alignment of post-nuclear valleys in Dutch
This was a read-speech study in which speakers read questions of the form "Do you live in X?", where X was a town or village name. Using geographical names allowed us to find sufficient instances of several different phonological patterns. Most speakers, as expected, used a fall-rise question intonation in most cases, and we were interested in studying how the alignment of the F0 valley (between the fall and the rise) is affected by the presence of a secondary stressed (unreduced) postnuclear syllable. We found that if there are only weak (reduced) syllables following the nuclear syllable, the final rise begins at the beginning of the last syllable. However, if there is a strong (secondary-stressed) syllable following the last stressed syllable, the beginning of the rise aligns with the strong vowel: early in the strong vowel if there are no further syllables in the phrase, and late in the strong vowel if the secondary stressed syllable is followed by another weak syllable. As noted earlier (section 2, objective 3), this pattern of findings is a striking confirmation of theoretical ideas about the nature of "phrase accents" (Grice et al. 2000). This material is almost ready to write up for journal publication (see study 7 below).

Study 6: alignment of nuclear accent peaks in Dutch
This reading experiment was modelled on the study by Prieto et al. 1995. Most importantly, we wanted to see whether phonological vowel length plays the same role in nuclear accents that we found for prenuclear accents. Additionally, like Prieto et al., we wanted to investigate the effects of "right context": we manipulated whether the nuclear word consisted of one syllable or two, and whether the following word began with a stressed or unstressed syllable. Results show clearly that vowel length produces a large effect, but the right-context effects are small and inconsistent across speakers. So far we have analysed the results using ANOVA; it seems clear that a multiple regression analysis will be more revealing, but we have not yet done this.

In a published write-up, we expect to report also the data on the nuclear peaks from study 5, which show interesting differences: in particular, they do seem to be affected in consistent ways by the right-context manipulations that were originally intended to affect the postnuclear F0 valleys. We believe the different sensitivity to right-context effects is related to the fact that in study 5 the right-context differences are within the nuclear word, whereas in study 6 they are not.

Study 7: alignment features in intonation patterns in spontaneous speech in Dutch
We were particularly interested in two types of contour that are difficult to obtain in reading experiments. These are: (1) "flat hat" accent patterns, in which two high accents on syntactically or semantically closely linked phrases (e.g. Adjective+Noun) are connected by a high level F0 stretch, yielding a contour that graphically resembles a hat (this is a common pattern in Dutch and German, but much less common in English); (2) natural question contours, of which Dutch has at least three distinct varieties ("fall-rise", "low rise", and "high rise", in traditional British terms). In addition to these two, we also obtained a large number of tokens of "continuation rises" in utterances where the Instruction Giver was presenting a series of instructions.

The flat hat data have been analysed acoustically and labelled but we have not yet analysed effects of vowel length, right context, etc. The question fall-rise contours (QFRs), of which there were a considerable number, appear to be exactly comparable to the contour used by most of the speakers in study 5; as noted in section 2b, we are about to test the alignment findings from study 5 against the data from the spontaneous-speech QFRs. As for the continuation rises, they appear to show that the principles for aligning nuclear F0 valleys (i.e. continuation rise low targets) are the same as for aligning post-nuclear F0 valleys (i.e. the QFR lows); if this conclusion stands up to statistical analysis, it will represent further confirmation of the Grice et al. view of phrase accents.

Study 8: alignment in Northern and Southern German
The main purpose of this study - which as noted in section 2b arose from a serendipitous discovery during a visit to the lab by a German researcher - was to shed light on how alignment is specified in different languages. We had already shown that languages can differ: for example, in English and Dutch the alignment of the pitch peak is earlier (for all vowels) than it would be in Greek. German appeared to align prenuclear rises somewhat later than English or Dutch, which raised the question of whether the differences are categorical (e.g. Dutch aligns the peak with the end of the syllable and German aligns the peak with the end of the following vowel) or continuous (e.g. Dutch aligns the peak 10 ms before the end of the syllable and German aligns it 30 ms after). Our results, taken together with the Scottish-RP comparisons in study 3, strongly suggest that language-specific differences must be specified in quantitative detail (rises in Southern German are a little later than in Northern German, which in turn are a little later than in Dutch and English, and the intra-German differences carry over to German-accented English). In current terminology, such differences of alignment are a matter of "language-specific phonetic rules", not phonological structure. The write-up of this study is already partially completed and should be submitted to Journal of Phonetics by about the end of 2001.

5. Activities

5a. General
When the proposal was submitted, the intention was that Ineke Mennen, the co-applicant, would be the paid research associate on the project. However, before the grant was awarded, she took another job, and the project began with a search for a suitable RA. We were fortunate to be able to hire Astrid Schepman, who had just completed a PhD in Psycholinguistics at the University of Sussex. The project formally began on 1 February 1998.

In March 2000 Dr. Schepman left to take up a permanent post in Psychology at the University of Abertay Dundee. As the PI was about to embark on a 5-month sabbatical absence from Edinburgh, the project was suspended for 5 months and a no-cost extension was permitted. When the PI returned to Edinburgh in September 2000, RA responsibilities were taken over by Dr. Robin Lickley, a long-term contract researcher in the department who was unfortunately caught between grants and whose background and range of abilities were an astonishingly good match to the project's needs (even including a working knowledge of Dutch!). In the meantime, Dr. Mennen and Dr. Lickley have both taken up posts at Queen Margaret University College in Edinburgh, so there should be no logistical obstacles to completing the remaining papers listed as "in preparation" in section 6 below.

The ESRC's administrative flexibility in dealing with the difficulties faced by PIs when RAs leave in mid-project is not only realistic but also extremely helpful. If we had not been able to postpone the start date or to have the no-cost extension, the project would have had a very much less successful overall outcome.

5b. Associated mini-projects
Because "research-led teaching" is a working reality in our department, we were able to incorporate results from student projects into the project.

1. Alignment and speech rate in English (study 1): The impetus for this study came from an MSc dissertation carried out in the department by Daniel Faulkner in the summer of 1997. This dissertation formed the basis of Experiment 1 in the resulting published paper. Because the results were so promising, another student (Hanneke van der Marel, later Hanneke Faulkner) carried out a more focused experiment as the basis for her undergraduate Honours dissertation, which was the basis for Experiment 2 in the published paper. (The dissertation reported on only one speaker, but we were able to obtain local funding to pay Ms. van der Marel to make acoustic measurements of the other 5 speakers during the summer after her graduation.) Both dissertations were supervised by the PI and the statistical analysis of the data in both was redone by Dr. Schepman as part of the main project.

2. Alignment of prenuclear rising accents in German and German-accented English (study 8): Two course projects for the PI's course module on Prosody in 1999-2000 served as pilot studies for this study. Daniela Heide, a visiting German undergraduate, did a preliminary test of the idea that Northern and Southern German accents differ in alignment, while Michaela Atterer, who was doing the MSc in Cognitive Science, piloted the hypothesis that German alignment patterns would carry over to English. When these two projects yielded promising results, Atterer (who had returned to Munich) carried out a full-scale experiment in cooperation with the PI.

5c. Experimental work
Much of the work of the project consisted of devising suitably controlled speech materials, recording speakers, and making acoustic measurements. As noted in previous project reports to ESRC, this is inherently slow and labour-intensive work, and difficult to automate given our hypotheses and our current state of knowledge. There is little point in providing details of which materials were recorded and analysed when, but it is worth noting that we made two trips to the Netherlands, one in March 1998 and one in February 1999, and that Michaela Atterer made one trip back to Edinburgh from Munich to do acoustic analysis of the German materials in May 2001.

5d. Conferences and talks
Papers reporting on aspects of the project were presented at the following conferences:

The 14th International Congress of Phonetic Sciences (ICPhS), San Francisco, August 1999. (Published proceedings; see section 6).

Architectures and Mechanisms in Language Processing (AMLAP-99), Edinburgh, September 1999. (A preliminary report of study 2 for an audience of psycholinguists.)

The PI has presented invited talks, classes and seminars on aspects of the project at the following institutions:

Dr. Schepman presented an invited talk on study 5 at the University of Dundee in November 2000.

6. Outputs

The most important output of the project is a series of papers in refereed journals, describing the findings of our individual experiments. Some of these have already appeared; one has been provisionally accepted pending minor revisions and will shortly be submitted in final form; still others are actively in preparation. The other important output of the project is the Dutch map task corpus.

6a. Papers Published or Accepted

D. R. Ladd, D. Faulkner, H. Faulkner, A. Schepman (1999). Constant "segmental anchoring" of F0 movements under changes in speech rate. Journal of the Acoustical Society of America, 106: 1543-1554.
[This paper describes the findings of study 1.]
D. R. Ladd, Ineke Mennen, and Astrid Schepman (2000). Phonological conditioning of peak alignment of rising pitch accents in Dutch. Journal of the Acoustical Society of America 107: 2685-2696.
[This paper describes the findings of study 2.]
D. R. Ladd and A. Schepman (1999). Segmental Anchoring of Tones as a Word-Boundary Correlate in English. In Proceedings of the 14th International Congress of Phonetic Sciences, San Francisco, August 1999, vol. 3, pp. 1869-1872.
[This paper gives a preliminary report of one of the experiments of study 4].
D. R. Ladd and A. Schepman (forthcoming). "Sagging transitions" between high pitch accents in English: experimental evidence. Accepted pending minor revisions by Journal of Phonetics.
[This paper is the full report on study 4.]

6b. Papers in Preparation
Schepman, Lickley, Ladd. Paper on the alignment of targets in nuclear accent peaks and postnuclear valleys (studies 5 and 6). To be submitted to Journal of Phonetics during 2001-02.
Lickley, Schepman, Ladd. Paper on the comparison of phonetic details in spontaneous and read speech (studies 5 and 7). To be submitted to Language and Speech by mid-2002.
Atterer and Ladd. Paper on alignment in German and its phonological implications (study 8). To be submitted to Journal of Phonetics by early 2002.

6c. The Dutch Map Task corpus
The corpus that we have used as the source of our data and that we have offered to the ESRC data archive consists of 8 conversations involving 4 speakers, 2 male and 2 female, for a total of just over 41 minutes of speech. The recordings were made in a single morning in February 1999 at the Phonetics Department at the University of Nijmegen. The speakers were undergraduates in their 20s, mostly students of English. The corpus was orthographically transcribed by Angela Vonk, a student at the University of Nijmegen. Subsequent minor corrections to the transcription have been made by Astrid Schepman and Robin Lickley during the course of working with the speech files.

The material offered to the ESRC data archive consists of the digital recordings of all eight dialogues, the orthographic transcriptions, and reduced copies of the maps used by the participants.

7. Impacts

Extensive use has been made of the Dutch map task corpus by Dr. Johanneke Caspers of the University of Leiden. At least three papers published in conference proceedings (Caspers 2000a,b, 2001) are based directly on our corpus. This illustrates the value of producing corpora of this sort, especially in languages other than English. We anticipate that further work based on this corpus will come from at least two other researchers: Anne Wichmann of the University of Central Lancashire, for research on the discourse functions of intonation, and Robin Lickley (latterly RA on the present project) for his ongoing research on disfluencies in spontaneous speech.

The research programme begun with the Greek project and continued in this project has also had a small but significant impact on the speech technology industry. Rule-based intonation synthesis research at Aculab in Milton Keynes (Monaghan et al. 2001) is exploring ways of modelling accents in terms of two alignment points instead of one, and doing away with independent slope and duration parameters. Commercial confidentiality makes it impossible for us to provide further detail on this research, or indeed to know very much about what other speech technology firms are doing.

8. Future Research Priorities

The question of vowel length in Scottish English is of both theoretical interest for phonology and practical interest for Scottish speech technology. It would be unfortunate if the work we have done in study 3 were not brought to a publishable conclusion. However, this is too small a project to attract ESRC funding.

More generally, several of our findings point to the conclusion that the standard Pierrehumbert analysis of English intonational phonology is ripe for reexamination. If it is true that some of the differences between languages are a matter of "language-specific phonetic rules" rather than phonological differences, then any such reexamination should consider the place of putative phonological categories in the analysis. We believe it would be highly appropriate for ESRC to fund such research.

References

Anderson, Anne et al. (1991). The HCRC Map Task Corpus. Language and Speech, 34: 351-366.
Beckman, Mary & Kingston, John (1990). Introduction to Kingston and Beckman (eds.), Papers in Laboratory Phonology I, Cambridge: Cambridge University Press, pp. 1-16.
Caspers, Johanneke (2000a). Melodic Characteristics of Backchannels in Dutch Map Task Dialogues. Proceedings of ICSLP, Beijing (no page numbers).
Caspers, Johanneke (2000b). Pitch Accents, Boundary Tones and Turn-taking in Dutch Map Task Dialogues. Proceedings of ICSLP, Beijing (no page numbers).
Caspers, Johanneke (2001). Testing the perceptual relevance of syntactic completion and melodic configuration for turn-taking in Dutch. Proceedings of Eurospeech, Scandinavia, vol. 2, pp. 1395-1398.
Grice, Martine, Ladd, D. R. & Arvaniti, Amalia (2000). On the place of phrase accents in intonational phonology. Phonology 17:143-186.
Grice, Martine & Savino, Michelina (1995). Low tone versus "sag" in Bari Italian intonation; a perceptual experiment. Proceedings of ICPhS 13, Stockholm, vol. 4: 658-661.
Monaghan, Alex et al. (2001). Multilingual TTS for Computer Telephony: The Aculab Approach. Proceedings of Eurospeech, Scandinavia, vol. 1, pp. 513-516.
Pierrehumbert, Janet, Beckman, Mary, & Ladd, D. R. (2000). Conceptual foundations of phonology as a laboratory science. In N. Burton-Roberts, P. Carr, G. J. Docherty (eds.), Phonological Knowledge: Conceptual and Empirical Issues, Oxford University Press, pp. 273-303.
Prieto, Pilar, van Santen, Jan & Hirschberg, Julia (1995). Tonal alignment patterns in Spanish. Journal of Phonetics 23, 429-451.
Silverman, Kim & Pierrehumbert, Janet (1990). The timing of prenuclear high accents in English. In J. Kingston and M. Beckman (eds.), Papers in Laboratory Phonology I, Cambridge: Cambridge University Press, pp. 72-106.