Machine Learning and Phonological Classification

Moritz Neugebauer

Previous approaches to machine learning in the field of phonology have focused on phonotactic rules or constraints, depending on the theoretical framework. Rather than exploring possible sound combinations within the syllable domain, however, this paper focuses on the segment: we investigate which combinations of phonological features within a segment are well formed. Our work is based on the assumption that speech sounds are composed of a finite set of phonological features and that feature combinations are partially predictable. Classes of speech sounds are presented as the result of generalizations over complex segmental feature structures. We assume that logical interdependencies between subsegmental articulatory features, such as set relations, can be extracted from representations which are multilinear in nature. This paper therefore builds on a constraint-based model of speech recognition (cf. Carson-Berndsen 1998) which supplies the extracted featural information in a multilinear format. By constructing inheritance hierarchies over the set of given features, we provide a backbone for the recognition system: underspecified feature slots can be resolved simply by reference to the previously built knowledge representation formalism. Even in cases where no single solution to an underspecified representation can be delivered, our approach still allows us to compute the set of possible sounds, and thereby reduces the set of possible well-formed feature structures significantly.
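
The following minimal sketch illustrates the idea of resolving an underspecified feature structure against an inventory of fully specified segments; the feature names, values, and segments used here are illustrative assumptions and are not taken from Carson-Berndsen (1998) or from the feature set discussed in the paper.

```python
# Sketch (illustrative, not the paper's actual feature set or algorithm):
# an underspecified feature structure is matched against a toy inventory of
# fully specified segments, yielding the set of compatible speech sounds.

# Each segment is a fully specified mapping from features to values.
INVENTORY = {
    "p": {"voice": "-", "place": "labial",  "manner": "stop"},
    "b": {"voice": "+", "place": "labial",  "manner": "stop"},
    "t": {"voice": "-", "place": "coronal", "manner": "stop"},
    "d": {"voice": "+", "place": "coronal", "manner": "stop"},
    "s": {"voice": "-", "place": "coronal", "manner": "fricative"},
    "z": {"voice": "+", "place": "coronal", "manner": "fricative"},
}

def compatible_segments(partial):
    """Return all segments whose feature structure agrees with `partial`.

    A partial (underspecified) structure leaves some feature slots empty;
    every segment that matches all specified slots remains a candidate.
    """
    return {
        seg for seg, feats in INVENTORY.items()
        if all(feats.get(f) == v for f, v in partial.items())
    }

# Underspecified input: place was recognised, voicing and manner were not.
print(compatible_segments({"place": "coronal"}))                 # {'t', 'd', 's', 'z'}
# Specifying one further feature shrinks the candidate set.
print(compatible_segments({"place": "coronal", "voice": "-"}))   # {'t', 's'}
```

In this toy setting, the inventory plays the role of the knowledge representation: even when no single segment can be identified, the set of candidates compatible with the observed features can still be computed and narrowed as more featural information becomes available.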