Tae-Yeoub Jang

A two-level morphological analysis of Korean

In this paper, I describe a method of automatic morphological analysis of Korean words using two-level rules of phonology and morphology (Koskenniemi 1983a).

Morphological analysis can improve the quality and efficiency of Korean speech recognition and synthesis systems, as well as syntactic parsers. For recognition, it removes the need to list large numbers of words sharing the same root in the lexicon, which would otherwise make the vocabulary unreasonably large. For synthesis preprocessing, a fairly large amount of data previously treated as exceptions can now be handled by the usual letter-to-sound rules.

The "PC-Kimmo" package (Unix Version 2.10), a two-level morphology tool, was used to compile the rules and to test performance. The three main components (lexicon, rules, and word grammar) are built independently and then loaded to work together interactively. The lexicon contains selected words needed to check performance, along with other words from a few randomly extracted newspaper articles. The rules are first written by hand and then compiled into a format that runs on a finite state machine. In the word grammar module, morpheme sequences are constrained by the features specified for each item in the lexicon.
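
The interplay between a feature-bearing lexicon and a word grammar can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the romanised entries, the category labels, and the adjacency table are hypothetical and are not taken from the paper's lexicon, and the table is only a simple stand-in for a word grammar, not PC-Kimmo's own formalism. The sketch also uses plain concatenation and applies no two-level rules.

```python
# Toy lexicon: morpheme -> category feature.
# Hypothetical romanised entries chosen for illustration only.
LEXICON = {
    "mek": "V",      # verb stem 'eat'
    "chayk": "N",    # noun 'book'
    "ess": "TENSE",  # past-tense suffix
    "ta": "END",     # declarative ending
    "ul": "CASE",    # accusative particle
}

# Stand-in word grammar: which category may follow which.
ALLOWED = {
    "START": {"V", "N"},
    "V": {"TENSE", "END"},
    "TENSE": {"END"},
    "N": {"CASE"},
}
# Categories that may end a word (a bare verb stem may not).
FINAL = {"N", "CASE", "END"}


def segmentations(word):
    """Yield every way to split `word` into morphemes found in LEXICON."""
    if not word:
        yield []
        return
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in LEXICON:
            for rest in segmentations(word[i:]):
                yield [head] + rest


def is_well_formed(morphemes):
    """Check a morpheme sequence against the adjacency table."""
    prev = "START"
    for morpheme in morphemes:
        category = LEXICON[morpheme]
        if category not in ALLOWED.get(prev, set()):
            return False
        prev = category
    return prev in FINAL


def analyse(word):
    """Return all segmentations of `word` licensed by the word grammar."""
    return [seg for seg in segmentations(word) if is_well_formed(seg)]


print(analyse("mekessta"))  # [['mek', 'ess', 'ta']]
print(analyse("chaykess"))  # [] : a tense suffix may not follow a noun
```

Keeping the lexicon and the sequence constraints in separate tables mirrors the independent construction of the components described above: either table can be extended without touching the other.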

All the words in the corpus were correctly analysed, and the experiment is now being run on more data, with the rules modified accordingly. It also turned out that a rule ordering conflict in Korean is resolved because two-level rules apply simultaneously.
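
Why simultaneous application removes ordering problems can be seen in a small Python sketch. It uses a toy, non-Korean example of the kind found in introductions to two-level morphology, not the Korean conflict discussed in the paper. The alphabet, the two rules, and all function names are assumptions made for illustration. Each rule is a constraint on the whole lexical:surface alignment, and a surface form is accepted only if every rule holds at once, so no rule ever sees another rule's intermediate output.

```python
from itertools import product

# Feasible lexical:surface pairs (hypothetical toy alphabet, not Korean).
FEASIBLE = {
    "k": ["k"], "a": ["a"], "t": ["t"], "m": ["m"],
    "N": ["n", "m"],  # underspecified nasal
    "p": ["p", "m"],
}


def rule_nasal(pairs):
    """N:m <=> _ p:  (N surfaces as m exactly when the next lexical symbol is p)."""
    for i, (lex, surf) in enumerate(pairs):
        if lex != "N":
            continue
        before_p = i + 1 < len(pairs) and pairs[i + 1][0] == "p"
        if (surf == "m") != before_p:
            return False
    return True


def rule_p(pairs):
    """p:m <=> :m _  (p surfaces as m exactly when the preceding surface symbol is m)."""
    for i, (lex, surf) in enumerate(pairs):
        if lex != "p":
            continue
        after_m = i > 0 and pairs[i - 1][1] == "m"
        if (surf == "m") != after_m:
            return False
    return True


RULES = [rule_nasal, rule_p]


def generate(lexical):
    """Return every surface form whose alignment satisfies all rules at once."""
    results = []
    for surface in product(*(FEASIBLE[c] for c in lexical)):
        pairs = list(zip(lexical, surface))
        if all(rule(pairs) for rule in RULES):
            results.append("".join(surface))
    return results


print(generate("kaNpat"))  # ['kammat']
print(generate("kapat"))   # ['kapat']
```

In a sequential rewrite system the second rule would have to be ordered after the first, since it depends on the m that the first rule produces. Here the order of `RULES` is irrelevant, because both constraints refer directly to the lexical and surface sides of the same alignment.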

To download this paper, please return to Proceedings of the 1998 Postgraduate Conference