Adapting SVM for data sparseness and imbalance: a case study in information extraction

YAOYONG LI; KALINA BONTCHEVA; HAMISH CUNNINGHAM

doi:10.1017/S1351324908004968

Published online by Cambridge University Press: 01 April 2009

Affiliation (all authors): Department of Computer Science, The University of Sheffield, Regent Court, 211 Portobello, Sheffield S1 4DP, UK
e-mail: yaoyong@dcs.shef.ac.uk, kalina@dcs.shef.ac.uk, hamish@dcs.shef.ac.uk

Abstract

Support Vector Machines (SVM) have been used successfully in many Natural Language Processing (NLP) tasks. The novel contribution of this paper is in investigating two techniques for making SVM more suitable for language learning tasks. Firstly, we propose an SVM with uneven margins (SVMUM) model to deal with the problem of imbalanced training data. Secondly, SVM active learning is employed in order to alleviate the difficulty in obtaining labelled training data. The algorithms are presented and evaluated on several Information Extraction (IE) tasks, where they achieved better performance than the standard SVM and the SVM with passive learning, respectively. Moreover, by combining SVMUM with the active learning algorithm, we achieve the best reported results on the seminars and jobs corpora, which are benchmark data sets used for evaluation and comparison of machine learning algorithms for IE. We also evaluate the token-based classification framework for IE with three different entity tagging schemes. In comparison to previous methods dealing with the same problems, our methods are both effective and efficient, which are valuable features for real-world applications. Due to the similarity in the formulation of the learning problem for IE and for other NLP tasks, the two techniques are likely to be beneficial in a wide range of applications.
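
As a rough illustration of the active learning idea mentioned in the abstract, the sketch below shows one common selection strategy for SVM active learning: query the unlabelled examples that lie closest to the current separating hyperplane. This is an assumption made for illustration only; the abstract does not state which selection strategy the paper uses, and the names `decision_value` and `select_queries` are hypothetical helpers, not part of any library or of the authors' system.

```python
from typing import List, Sequence


def decision_value(w: Sequence[float], x: Sequence[float], b: float) -> float:
    """Linear SVM output <w, x> + b for a single feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def select_queries(
    w: Sequence[float], b: float, pool: Sequence[Sequence[float]], k: int
) -> List[int]:
    """Return indices of the k unlabelled examples nearest the hyperplane.

    Assumed strategy (margin-based uncertainty sampling): examples with the
    smallest |<w, x> + b| are the ones the current model is least sure about,
    so they are the ones passed to a human annotator next.
    """
    ranked = sorted(
        range(len(pool)), key=lambda i: abs(decision_value(w, pool[i], b))
    )
    return ranked[:k]


if __name__ == "__main__":
    # Toy model and unlabelled pool, invented purely for this example.
    w, b = [1.0, -1.0], 0.0
    pool = [[2.0, 0.0], [0.5, 0.4], [0.0, 3.0], [1.0, 1.2]]
    print(select_queries(w, b, pool, k=2))
```

In a full loop, the selected examples would be labelled, added to the training set, and the SVM retrained before the next round of selection.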

Type: Papers
Information: Natural Language Engineering, Volume 15, Issue 2, April 2009, pp. 241–271

DOI: https://doi.org/10.1017/S1351324908004968
Copyright: Copyright © Cambridge University Press 2008
