
Experiment

For each syllable in the database that carries a Tilt label (a, b, c, sil), a set of 40 features was extracted for testing. The features include the number of syllables, stressed syllables, and accented syllables preceding and following the syllable within the phrase; the distance, in syllables, from the previous event and to the next event; the number of non-major phrase breaks since the last major break; onset and rhyme length ([10], [7]); the percentage of the syllable that is unvoiced; and the position of the syllable within the word (e.g. initial, medial, final). The features also include, with a two-syllable window on either side, accentedness, lexical stress, onset and coda classification (cf. [10], [5]), Tilt event type (with a two-event window as well as the two-syllable window), and phrase break values.
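To make the positional features concrete, the following is a minimal Python sketch of how a few of them (syllable, stress, and accent counts on either side, and distance to the neighbouring events) might be computed for one syllable in a phrase. The `Syllable` class, the function name, the dictionary keys, and the use of `None` when no neighbouring event exists are all assumptions made for illustration; they are not taken from the system described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Syllable:
    # Hypothetical record; field names are illustrative only.
    stressed: bool
    accented: bool
    event: Optional[str] = None  # Tilt label ("a", "b", "c", "sil") or None


def positional_features(phrase, i):
    """Return a few count and distance features for syllable i of a phrase."""
    before, after = phrase[:i], phrase[i + 1:]

    def distance_to_event(syllables):
        # Distance in syllables to the nearest event; None if there is none
        # (how the original system encodes this case is not stated).
        for d, s in enumerate(syllables, start=1):
            if s.event is not None:
                return d
        return None

    return {
        "syl_before": len(before),
        "syl_after": len(after),
        "stressed_before": sum(s.stressed for s in before),
        "stressed_after": sum(s.stressed for s in after),
        "accented_before": sum(s.accented for s in before),
        "accented_after": sum(s.accented for s in after),
        "dist_prev_event": distance_to_event(list(reversed(before))),
        "dist_next_event": distance_to_event(after),
    }


# Toy phrase of five syllables, two of which carry events.
phrase = [
    Syllable(False, False),
    Syllable(True, True, "a"),
    Syllable(False, False),
    Syllable(True, False),
    Syllable(True, True, "b"),
]
feats = positional_features(phrase, 2)
```

The remaining features (phrase-break counts, onset and rhyme length, voicing percentage, and the windowed features) would be added to the same dictionary in a similar way, given the corresponding annotations.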

All of these features are available to the system at F0 generation time during synthesis. Accent, a simple binary feature, is assumed to have been predicted prior to F0 generation.

Once the features are extracted, a CART training algorithm [2] is used to create a decision tree for each parameter of each event type (twelve trees in total). These trees are used to predict the parameters of the Tilt events on a held-out test set. The predicted parameters are then used to generate intonation contours.
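The one-tree-per-parameter-per-event-type arrangement can be sketched as follows. This is a simplified regression tree (greedy splits that minimise squared error, no pruning) written in plain Python; it is not the CART implementation of [2], and it handles numeric features only. The parameter names `p1`, `p2`, `p3` are placeholders: three parameters per event type is inferred from the twelve-tree total over the four labels, and the data layout is hypothetical.

```python
from statistics import mean


def sse(ys):
    """Sum of squared errors around the mean."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def grow(rows, ys, min_leaf=2):
    """Grow a regression tree; a leaf is a number, a node is a 4-tuple."""
    best = None
    for f in range(len(rows[0])):
        # Candidate thresholds: every observed value except the largest.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([ys[i] for i in left]) + sse([ys[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, t, left, right)
    if best is None or best[0] >= sse(ys):
        return mean(ys)  # no useful split: predict the mean
    _, f, t, left, right = best
    return (
        f,
        t,
        grow([rows[i] for i in left], [ys[i] for i in left], min_leaf),
        grow([rows[i] for i in right], [ys[i] for i in right], min_leaf),
    )


def predict(tree, row):
    """Walk from the root to a leaf and return its value."""
    while isinstance(tree, tuple):
        f, t, low, high = tree
        tree = low if row[f] <= t else high
    return tree


def train_all(data, event_types, params):
    """data[event_type] is a list of (feature_list, {param: value}) pairs."""
    trees = {}
    for ev in event_types:
        rows = [feats for feats, _ in data[ev]]
        for p in params:
            ys = [targets[p] for _, targets in data[ev]]
            trees[(ev, p)] = grow(rows, ys)
    return trees


# Toy data: one numeric feature, targets that step between two values.
EVENT_TYPES = ["a", "b", "c", "sil"]  # Tilt labels from the text
PARAMS = ["p1", "p2", "p3"]           # placeholder parameter names
toy = {
    ev: [([x], {p: (1.0 if x <= 1 else 5.0) for p in PARAMS}) for x in range(4)]
    for ev in EVENT_TYPES
}
trees = train_all(toy, EVENT_TYPES, PARAMS)
```

At prediction time, each test-set event would be looked up by its event type, and the three corresponding trees queried with that syllable's feature vector, for example `predict(trees[("a", "p1")], [0])`.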





Kurt Dusterhoff
Tue Jul 1 17:33:41 BST 1997