Active learning and logarithmic opinion pools for HPSG parse selection

JASON BALDRIDGE; MILES OSBORNE

doi:10.1017/S1351324906004396

Published online by Cambridge University Press: 01 April 2008

JASON BALDRIDGE: Department of Linguistics, University of Texas at Austin, Austin, TX 78712, USA; e-mail: jbaldrid@mail.utexas.edu
MILES OSBORNE: School of Informatics, University of Edinburgh, Edinburgh EH8 9LW, UK; e-mail: miles@inf.ed.ac.uk

Abstract

For complex tasks such as parse selection, the creation of labelled training sets can be extremely costly. Resource-efficient schemes for creating informative labelled material must therefore be considered. We investigate the relationship between two broad strategies for reducing the amount of manual labelling necessary to train accurate parse selection models: ensemble models and active learning. We show that popular active learning methods for reducing annotation costs can be outperformed by instead using a model class which uses the available labelled data more efficiently. For this, we use a simple type of ensemble model called the Logarithmic Opinion Pool (LOP). We furthermore show that LOPs themselves can benefit from active learning. As predicted by a theoretical explanation of the predictive power of LOPs, a detailed analysis of active learning using LOPs shows that component model diversity is a strong predictor of successful LOP performance. Other contributions include a novel active learning method, a justification of our simulation studies using timing information, and cross-domain verification of our main ideas using text classification.
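
As a rough illustration of the ensemble model named in the abstract, a logarithmic opinion pool is commonly defined as a weighted geometric mean of its component distributions, renormalized to sum to one. The sketch below is a minimal version under stated assumptions, not the authors' implementation: the function name `log_opinion_pool`, the uniform default weights, and the toy probabilities are all hypothetical, and each component model is assumed to assign a strictly positive probability to every candidate parse.

```python
import math


def log_opinion_pool(distributions, weights=None):
    """Combine component distributions over the same candidates.

    distributions: list of lists; distributions[i][k] is the probability
    that component model i assigns to candidate k (assumed > 0).
    weights: one non-negative weight per model; uniform if omitted
    (an assumption made here for illustration).
    """
    n_models = len(distributions)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    n_candidates = len(distributions[0])

    # Weighted sum of log-probabilities = log of the weighted geometric mean.
    log_scores = [
        sum(w * math.log(dist[k]) for w, dist in zip(weights, distributions))
        for k in range(n_candidates)
    ]

    # Subtract the maximum before exponentiating for numerical stability,
    # then renormalize so the pooled scores form a distribution.
    top = max(log_scores)
    unnormalized = [math.exp(s - top) for s in log_scores]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Toy example: two hypothetical component models scoring two candidate parses.
pooled = log_opinion_pool([[0.5, 0.5], [0.8, 0.2]])
```

Because the pool multiplies component probabilities, a candidate scores highly only when the components broadly agree on it, which is one intuition for why diversity among the components matters to the pooled model.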

Type: Papers
Information: Natural Language Engineering, Volume 14, Issue 2, April 2008, pp. 191–222

DOI: https://doi.org/10.1017/S1351324906004396
Copyright: Copyright © Cambridge University Press 2006
