Combining classifiers for word sense disambiguation

Published online by Cambridge University Press:  22 January 2003

RADU FLORIAN, SILVIU CUCERZAN, CHARLES SCHAFER and DAVID YAROWSKY
Affiliation:
Department of Computer Science and Center for Language and Speech Processing, Johns Hopkins University, MD 21218, USA
e-mail: rflorian@cs.jhu.edu, silviu@cs.jhu.edu, cschafer@cs.jhu.edu, yarowsky@cs.jhu.edu

Abstract

Classifier combination is an effective and broadly useful method of improving system performance. This article investigates in depth a large number of both well-established and novel classifier combination approaches for the word sense disambiguation task, studied over a diverse classifier pool that includes feature-enhanced Naïve Bayes, Cosine, Decision List, Transformation-based Learning and MMVC classifiers. Each classifier has access to the same rich feature space, composed of distance-weighted bag-of-lemmas, local n-gram context and specific syntactic relations, such as Verb-Object and Noun-Modifier. The study examines several key issues in system combination for word sense disambiguation, ranging from algorithmic structure to parameter estimation. Experiments on the standard SENSEVAL2 lexical-sample data sets in four languages (English, Spanish, Swedish and Basque) demonstrate that the combination system obtains a significantly lower error rate than the other systems participating in the SENSEVAL2 exercise, yielding state-of-the-art performance on these data sets.

Type
Research Article
Copyright
2002 Cambridge University Press
