12 January 2010

Simone Ashby (Instituto de Linguística Teórica e Computacional)

Reuse of Lexicographic Data for a Multipurpose Pronunciation Database and Phonetic Transcription Generator for Regional Variants of Portuguese

Reuse of lexicographic data has spawned a number of applications that draw on the grammatical and semantic information in these resources to enhance information retrieval, handle sense ambiguity, and develop question-answering systems, among other goals. LUPo* is one of the first *speech*-dedicated applications to take full advantage of a collection of lexical resources as the basis for a text-to-speech system. It consists of a pronunciation lexicon and a rule system for generating accent-specific phonetic transcriptions for Portuguese. LUPo is intended to attract a wider audience of pan-Lusophone speakers to the online lexical database where it resides (the *Portal da Língua Portuguesa* website, hereafter the 'Portal'), while giving the research community a large resource of Portuguese accent data against which Portuguese speech applications can be evaluated and phonological and diachronic theories tested. The project aims to model a wide range of Portuguese accents spanning Africa, Asia, Europe, and South America.

In this talk, I describe my research group's efforts to adapt Susan Fitt's Unisyn Lexicon for English to Portuguese, and to use the Portal's relational structure and rich lexicographic content to create a more integrated and better-informed system. LUPo (the Portuguese Unisyn Lexicon) will capitalize on direct access to mappings of European and Brazilian Portuguese spelling variants, part-of-speech information, etymological relationships, and a morphological parser. The end product will be a set of open-source tools for generating accent-specific output for individual lexical entries and, ultimately, multi-word texts.

I will focus in particular on the role of part-of-speech information and morpho-phonological boundaries in specifying the correct phones for a set of orthographic contexts that are especially problematic for European Portuguese grapheme-to-phone conversion.
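
As a rough illustration of why part of speech matters here, the sketch below looks up a transcription by spelling and part of speech together. It is not LUPo's data format or rule system: the table `POS_SENSITIVE`, the function `transcribe`, the tag names, and the broad IPA strings are all assumptions chosen for the example. The two words shown are European Portuguese homographs whose stressed "o" is close-mid in the noun and open-mid in the first-person singular verb form.

```python
from typing import Optional

# Hypothetical illustration only; not LUPo's actual data or rule format.
# Stressed <o> in these European Portuguese homographs is close-mid [o]
# in the noun and open-mid [ɔ] in the first-person singular verb form,
# so spelling alone cannot select the right phone.
POS_SENSITIVE = {
    ("gosto", "noun"): "ˈɡoʃtu",
    ("gosto", "verb"): "ˈɡɔʃtu",
    ("jogo", "noun"): "ˈʒoɡu",
    ("jogo", "verb"): "ˈʒɔɡu",
}


def transcribe(word: str, pos: str) -> Optional[str]:
    """Return the transcription listed for (word, pos), or None if absent."""
    return POS_SENSITIVE.get((word.lower(), pos))


if __name__ == "__main__":
    for pos in ("noun", "verb"):
        print("gosto", pos, transcribe("gosto", pos))
```

A full system would presumably derive such distinctions from rules and lexicon fields rather than from a hand-written table; the lookup only shows that the part-of-speech tag has to be part of the key.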

* LUPo is a three-year project supported by the Fundação para a Ciência e a Tecnologia in Portugal.
