Active learning and logarithmic opinion pools for HPSG parse selection

Published online by Cambridge University Press:  01 April 2008

JASON BALDRIDGE
Affiliation:
Department of Linguistics, University of Texas at Austin, Austin, TX 78712, USA e-mail: jbaldrid@mail.utexas.edu
MILES OSBORNE
Affiliation:
School of Informatics, University of Edinburgh, Edinburgh EH8 9LW, UK e-mail: miles@inf.ed.ac.uk

Abstract

For complex tasks such as parse selection, the creation of labelled training sets can be extremely costly. Resource-efficient schemes for creating informative labelled material must therefore be considered. We investigate the relationship between two broad strategies for reducing the amount of manual labelling necessary to train accurate parse selection models: ensemble models and active learning. We show that popular active learning methods for reducing annotation costs can be outperformed by instead using a model class which uses the available labelled data more efficiently. For this, we use a simple type of ensemble model called the Logarithmic Opinion Pool (LOP). We furthermore show that LOPs themselves can benefit from active learning. As predicted by a theoretical explanation of the predictive power of LOPs, a detailed analysis of active learning using LOPs shows that component model diversity is a strong predictor of successful LOP performance. Other contributions include a novel active learning method, a justification of our simulation studies using timing information, and cross-domain verification of our main ideas using text classification.
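As an illustration of the ensemble model named in the abstract, the following is a minimal sketch of a logarithmic opinion pool in its standard form: the component models' distributions over the same candidates are combined by a weighted geometric mean and then renormalised. The function name `log_opinion_pool`, the uniform default weights, and the toy numbers are assumptions made for this example and are not taken from the paper; it also assumes every component assigns strictly positive probability to every candidate.

```python
import math


def log_opinion_pool(distributions, weights=None):
    """Pool discrete distributions by a weighted geometric mean.

    distributions: list of lists, one per component model, each giving
        strictly positive probabilities over the same candidates
        (for example, the candidate parses of one sentence).
    weights: one non-negative weight per component; uniform if omitted
        (an assumption of this sketch).
    """
    n_models = len(distributions)
    if weights is None:
        weights = [1.0 / n_models] * n_models

    n_candidates = len(distributions[0])
    # Weighted sum of log-probabilities for each candidate.
    log_scores = [
        sum(w * math.log(dist[k]) for w, dist in zip(weights, distributions))
        for k in range(n_candidates)
    ]

    # Subtract the maximum before exponentiating for numerical stability.
    top = max(log_scores)
    unnormalised = [math.exp(s - top) for s in log_scores]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Two hypothetical component models scoring three candidate parses.
model_a = [0.7, 0.2, 0.1]
model_b = [0.5, 0.4, 0.1]
pooled = log_opinion_pool([model_a, model_b])
best_parse = pooled.index(max(pooled))
```

Because the pool multiplies probabilities rather than averaging them, a candidate scores well only when the components broadly agree on it, which is one way to see why the diversity of the component models matters for the pooled result.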

Type
Papers
Copyright
Copyright © Cambridge University Press 2006
