Natural Language Engineering



A corpus-based approach for Korean nominal compound analysis based on linguistic and statistical information


JUNTAE YOON a1, KEY-SUN CHOI a1 and MANSUK SONG a2
a1 Center for Artificial Intelligence Research, Department of Computer Science, Korea Advanced Institute of Science and Technology, Taejon, Korea; e-mail: jtyoon@world.kaist.ac.kr, kschoi@world.kaist.ac.kr
a2 Department of Computer Science, Engineering College, Yonsei University, Seoul, Korea; e-mail: mssong@december.yonsei.ac.kr

Abstract

The syntactic structure of a nominal compound must be analyzed before the compound can be interpreted semantically. Syntactic analysis of nominal compounds is also very useful for NLP applications such as information extraction, since a nominal compound often has a linguistic structure similar to that of a simple sentence and combines several nouns to express the concrete, composite meaning of an object. In this paper, we present a novel model for the structural analysis of nominal compounds that couples linguistic and statistical knowledge on the basis of lexical information. Specifically, the syntactic relations defined between nouns (complement-predicate and modifier-head relations) are acquired from large corpora and then used to analyze the structures of nominal compounds and to identify the underlying relations between nouns. Experiments show that the model gives good results and can be used effectively in application systems that do not require deep semantic information.

(Received March 18 1999)
(Revised February 26 2001)