Evaluating sense disambiguation across diverse parameter spaces

DAVID YAROWSKY; RADU FLORIAN

doi:10.1017/S135132490200298X

Abstract

This paper presents a comprehensive empirical exploration and evaluation of a diverse range of data characteristics that influence word sense disambiguation performance. It focuses on a set of six core supervised algorithms, including three variants of Bayesian classifiers, a cosine model, non-hierarchical decision lists, and an extension of the transformation-based learning model. Performance is investigated in detail with respect to the following parameters: (a) target language (English, Spanish, Swedish and Basque); (b) part of speech; (c) sense granularity; (d) inclusion and exclusion of major feature classes; (e) variable context width (further broken down by the part of speech of the keyword); (f) number of training examples; (g) baseline probability of the most likely sense; (h) sense distributional entropy; (i) number of senses per keyword; (j) divergence between training and test data; (k) degree of (artificially introduced) noise in the training data; (l) the effectiveness of an algorithm's confidence rankings; and (m) a full keyword breakdown of the performance of each algorithm. The paper concludes with a brief analysis of the similarities, differences, strengths and weaknesses of the algorithms, and a hierarchical clustering of these algorithms based on the agreement of their sense classification behavior. Collectively, the paper constitutes the most comprehensive survey of evaluation measures and tests yet applied to sense disambiguation algorithms, and it does so over a diverse range of supervised algorithms, languages and parameter spaces in a single unified experimental framework.
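Several of the per-keyword parameters listed above, namely (g) the baseline probability of the most likely sense, (h) sense distributional entropy, (i) number of senses and (j) train/test divergence, along with the pairwise agreement that underlies the final clustering, can be computed directly from sense-labelled data. The sketch below shows one way to do this. It is an illustration under stated assumptions, not the authors' code. Entropy is measured in bits. For (j) it swaps in Jensen-Shannon divergence between the training and test sense distributions, and for the clustering input it uses the raw fraction of instances on which two classifiers agree. The paper's exact definitions may differ. The function names and the `bank/...` sense labels are hypothetical.

```python
from collections import Counter
import math


def sense_distribution(labels):
    """Relative frequency of each sense label for one keyword."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {sense: c / total for sense, c in counts.items()}


def num_senses(labels):
    """Parameter (i): number of distinct senses observed for the keyword."""
    return len(set(labels))


def mfs_baseline(labels):
    """Parameter (g): probability of the most likely sense, i.e. the
    accuracy of always guessing the most frequent sense."""
    return max(sense_distribution(labels).values())


def sense_entropy(labels):
    """Parameter (h): Shannon entropy (bits) of the sense distribution."""
    return -sum(p * math.log2(p) for p in sense_distribution(labels).values())


def js_divergence(train_labels, test_labels):
    """Parameter (j), ASSUMED measure: Jensen-Shannon divergence (bits)
    between the training and test sense distributions. It is symmetric
    and bounded in [0, 1], so no smoothing of unseen senses is needed."""
    p = sense_distribution(train_labels)
    q = sense_distribution(test_labels)
    senses = set(p) | set(q)
    m = {s: 0.5 * (p.get(s, 0.0) + q.get(s, 0.0)) for s in senses}

    def kl_to_m(dist):
        # Every sense in dist has positive probability, so m[s] > 0 too.
        return sum(dist[s] * math.log2(dist[s] / m[s]) for s in dist)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


def agreement(pred_a, pred_b):
    """ASSUMED clustering input: fraction of test instances on which two
    classifiers assign the same sense."""
    if len(pred_a) != len(pred_b):
        raise ValueError("prediction lists must have equal length")
    return sum(a == b for a, b in zip(pred_a, pred_b)) / len(pred_a)


if __name__ == "__main__":
    # Hypothetical sense-labelled instances for the keyword "bank".
    train = ["bank/river"] * 2 + ["bank/money"] * 6
    test = ["bank/money"] * 4 + ["bank/river"] * 4
    print("senses:", num_senses(train))
    print("MFS baseline:", mfs_baseline(train))
    print("train entropy: %.3f bits" % sense_entropy(train))
    print("train/test JS divergence: %.3f bits" % js_divergence(train, test))
    print("agreement:", agreement(["a", "b", "c", "a"], ["a", "b", "a", "a"]))
```

Running the script prints, among other values, an MFS baseline of 0.75 and an agreement of 0.75 for the toy data. A matrix of `agreement` scores over all algorithm pairs, converted to distances as `1 - agreement`, is one plausible input to a standard agglomerative clustering routine.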

