A corpus-based bootstrapping algorithm for semi-automated semantic lexicon construction

ELLEN RILOFF; JESSICA SHEPHERD

doi:10.1017/S1351324999002235

Abstract

Many applications need a lexicon that represents semantic information, but acquiring lexical information is time-consuming. We present a corpus-based bootstrapping algorithm that assists users in creating domain-specific semantic lexicons quickly. Our algorithm uses a representative text corpus for the domain and a small set of ‘seed words’ that belong to a semantic class of interest. The algorithm hypothesizes new words that are also likely to belong to the semantic class because they occur in the same contexts as the seed words. The best hypotheses are added to the seed word list dynamically, and the process iterates in a bootstrapping fashion. When the bootstrapping process halts, a ranked list of hypothesized category words is presented to a user for review. We used this algorithm to generate semantic lexicons for eleven semantic classes associated with the MUC-4 terrorism domain.
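
The loop described in the abstract (seed words, context-based hypotheses, promotion of the best hypotheses, a final ranked list for review) can be illustrated with a minimal sketch. The abstract does not specify the scoring function, the context definition, or the stopping rule, so everything concrete here is an assumption made for illustration: the function names `score_candidates` and `bootstrap_lexicon`, the one-word context window, the score (the fraction of a word's occurrences that fall next to a current category word), the fixed iteration count, and the toy corpus, which is assumed to be already reduced to lists of content nouns.

```python
from collections import Counter


def score_candidates(sentences, category, seeds, window=1):
    """Score each non-seed word by the share of its occurrences that
    fall within `window` tokens of a current category word (assumed metric)."""
    near_category = Counter()  # occurrences next to a category word
    total = Counter()          # all occurrences in the corpus
    for tokens in sentences:
        for i, word in enumerate(tokens):
            total[word] += 1
            neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            if any(n in category for n in neighbours):
                near_category[word] += 1
    return {w: near_category[w] / total[w] for w in near_category if w not in seeds}


def bootstrap_lexicon(sentences, seeds, iterations=2, add_per_iter=1, window=1):
    """Grow the category from the seeds, then return a ranked hypothesis list."""
    seeds = set(seeds)
    category = set(seeds)
    for _ in range(iterations):  # fixed iteration count is an assumption
        scores = score_candidates(sentences, category, seeds, window)
        candidates = [w for w in scores if w not in category]
        # Promote the best-scoring hypotheses; ties are broken alphabetically.
        best = sorted(candidates, key=lambda w: (-scores[w], w))[:add_per_iter]
        if not best:
            break
        category.update(best)
    # Final pass: rank every hypothesized word for human review.
    final = score_candidates(sentences, category, seeds, window)
    return sorted(final.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy corpus: each sentence is a list of nouns.
corpus = [
    ["bomb", "dynamite", "building"],
    ["dynamite", "bomb", "street"],
    ["grenade", "dynamite", "car"],
    ["building", "street", "car"],
    ["city", "car", "mayor"],
    ["grenade", "bomb"],
]

ranked = bootstrap_lexicon(corpus, seeds={"bomb"})
for word, score in ranked:
    print(f"{word}\t{score:.2f}")
```

The ranked list is deliberately returned rather than accepted automatically: a reviewer is still expected to discard false hypotheses, which is the semi-automated step the abstract refers to.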

Footnotes

This research is supported in part by the National Science Foundation under grants IRI-9509820 and IRI-9704240.
