Improving shift-reduce constituency parsing with large-scale unlabeled data

MUHUA ZHU; JINGBO ZHU; HUIZHEN WANG

doi:10.1017/S1351324913000119

Published online by Cambridge University Press: 19 June 2013

Affiliation (all authors): Natural Language Processing Lab, Northeastern University, Shenyang 110819, China
E-mails: zhumuhua@gmail.com, zhujingbo@mail.neu.edu.cn, wanghuizhen@mail.neu.edu.cn
Corresponding author: HUIZHEN WANG

Abstract

Shift-reduce parsing has been studied extensively for diverse grammars because of its simplicity and efficiency. In constituency parsing, however, shift-reduce parsers still lag behind state-of-the-art parsers in accuracy. In this paper we propose a semi-supervised approach to advancing shift-reduce constituency parsing. First, we apply the uptraining approach (Petrov, S. et al. 2010. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP), Cambridge, MA, USA, pp. 705–713) to improve part-of-speech taggers, so that they supply more accurate part-of-speech tags to the downstream shift-reduce parser. Second, we enhance shift-reduce parsing models with novel features defined on lexical dependency information. Both stages rely on large-scale unlabeled data. Experimental results show that the approach achieves overall improvements of 1.5 percent and 2.1 percent on English and Chinese data, respectively. The final parsing accuracies reach 90.9 percent and 82.2 percent, respectively, which are comparable to those of state-of-the-art parsers.
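Uptraining, as introduced by Petrov et al. (2010), trains a fast model on the output that a slower but more accurate model produces on unlabeled text. The Python sketch below illustrates only that general idea for part-of-speech tagging and is not the authors' implementation. The names `UnigramTagger`, `uptrain`, `toy_teacher` and `TEACHER_LEXICON` are hypothetical; the most-frequent-tag student stands in for a real tagger, and the dictionary-based teacher stands in for whatever accurate annotator (for example, tags read off a stronger parser's output) would be used in practice.

```python
from collections import Counter, defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Sentence = List[str]
Tagged = List[Tuple[str, str]]


class UnigramTagger:
    """Toy 'student' tagger: assigns each word the tag it carried most often in training."""

    def __init__(self, default_tag: str = "NN") -> None:
        self.default_tag = default_tag  # fallback tag for words never seen in training
        self.table: Dict[str, str] = {}

    def train(self, corpus: Sequence[Tagged]) -> None:
        # Count (word, tag) occurrences, then keep the most frequent tag per word.
        counts: Dict[str, Counter] = defaultdict(Counter)
        for sentence in corpus:
            for word, tag in sentence:
                counts[word][tag] += 1
        self.table = {word: c.most_common(1)[0][0] for word, c in counts.items()}

    def tag(self, sentence: Sentence) -> Tagged:
        return [(w, self.table.get(w, self.default_tag)) for w in sentence]


def uptrain(gold: Sequence[Tagged], unlabeled: Sequence[Sentence],
            teacher: Callable[[Sentence], Tagged]) -> UnigramTagger:
    """Train the student on gold data plus the teacher's automatic annotations."""
    auto_labeled = [teacher(sentence) for sentence in unlabeled]
    student = UnigramTagger()
    student.train(list(gold) + auto_labeled)
    return student


# Hypothetical stand-in for an accurate but slow annotator.
TEACHER_LEXICON = {"the": "DT", "dog": "NN", "cat": "NN", "barks": "VBZ", "sleeps": "VBZ"}


def toy_teacher(sentence: Sentence) -> Tagged:
    return [(w, TEACHER_LEXICON.get(w, "NN")) for w in sentence]


gold = [[("the", "DT"), ("dog", "NN"), ("barks", "VBZ")]]
unlabeled = [["the", "cat", "sleeps"]]

baseline = UnigramTagger()
baseline.train(gold)  # gold data only
student = uptrain(gold, unlabeled, toy_teacher)  # gold + teacher-labeled data

print(baseline.tag(["the", "cat", "sleeps"]))  # [('the', 'DT'), ('cat', 'NN'), ('sleeps', 'NN')]
print(student.tag(["the", "cat", "sleeps"]))   # [('the', 'DT'), ('cat', 'NN'), ('sleeps', 'VBZ')]
```

The baseline trained only on the gold sentence falls back to the default tag for `sleeps`, whereas the uptrained student has seen that word in the teacher-labeled data. In a realistic setting the unlabeled corpus is very large and the teacher's output is noisy, so any gain depends on how accurate the teacher is relative to the student.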

Type: Articles
Information: Natural Language Engineering, Volume 21, Issue 1, January 2015, pp. 113–138

Copyright: Copyright © Cambridge University Press 2013

References

Bikel, D. 2004. On the Parameter Space of Generative Lexicalized Statistical Parsing Models. PhD thesis, University of Pennsylvania.

Carreras, X., Collins, M. and Koo, T. 2008. TAG, dynamic programming and the perceptron for efficient, feature-rich parsing. In Proceedings of the 12th Conference on Computational Natural Language Learning (CoNLL), Manchester, UK, pp. 9–16.

Charniak, E. 2000. A maximum-entropy-inspired parser. In Proceedings of the First Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL), Washington, USA, pp. 132–9.

Charniak, E. and Johnson, M. 2005. Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL), University of Michigan, Ann Arbor, MI, USA, pp. 173–80.

Chen, W., Kazama, J., Uchimoto, K. and Torisawa, K. 2012. Exploiting subtrees in auto-parsed data to improve dependency parsing. Computational Intelligence Journal 28 (3): 426–51 (John Wiley).

Chen, W., Kazama, J., Zhang, M., Tsuruoka, Y., Zhang, Y., Wang, Y., Torisawa, K. and Li, H. 2012. Bitext dependency parsing with auto-generated bilingual treebanks. IEEE Transactions on Audio, Speech and Language Processing 20 (5): 1461–72.

Clark, S., Curran, J. and Osborne, M. 2003. Bootstrapping POS taggers using unlabeled data. In Proceedings of the 7th Conference on Computational Natural Language Learning (CoNLL), Edmonton, Canada.

Collins, M. 1996. A new statistical parser based on bigram lexical dependencies. In Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics (ACL), California, USA.

Collins, M. 1997. Three generative, lexicalised models for statistical parsing. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics (ACL), Madrid, Spain.

Collins, M. 1999. Head-Driven Statistical Models for Natural Language Parsing. PhD thesis, University of Pennsylvania.

Collins, M. 2002. Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms. In Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP), Philadelphia, PA, USA, pp. 1–8.

Collins, M. and Roark, B. 2004. Incremental parsing with the perceptron algorithm. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL), Barcelona, Spain.

Eisner, J. and Satta, G. 1999. Efficient parsing for bilexical context-free grammars and head automaton grammars. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics (ACL), Maryland, USA.

Graff, D. 1995. North American News Text Corpus. Linguistic Data Consortium, Philadelphia, PA. LDC Catalog No. LDC95T21.

Hatori, J., Matsuzaki, T. and Tsujii, J. 2011. Incremental joint POS tagging and dependency parsing in Chinese. In Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP), Chiang Mai, Thailand, pp. 8–13.

Huang, L. 2008. Forest reranking: discriminative parsing with non-local features. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL), Ohio, USA, pp. 586–94.

Huang, L. Y. 2009. Improve Chinese parsing with Max-Ent reranking parser. Master Project Report, Brown University, Providence, RI.

Huang, Z., Eidelman, V. and Harper, M. 2009a. Improving a simple bigram HMM part-of-speech tagger by latent annotation and self-training. In Proceedings of Human Language Technology: Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), Colorado, USA, pp. 213–6.

Huang, Z. and Harper, M. 2009. Self-training PCFG grammars with latent annotations across languages. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP), Singapore, pp. 832–41.

Huang, Z., Harper, M. and Petrov, S. 2010. Self-training with products of latent variable grammars. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP), Cambridge, MA, USA, pp. 12–22.

Huang, L., Jiang, W. and Liu, Q. 2009b. Bilingually constrained (monolingual) shift-reduce parsing. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP), Singapore, pp. 1222–31.

Huang, L. and Sagae, K. 2010. Dynamic programming for linear-time incremental parsing. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL), Uppsala, Sweden, pp. 1077–86.

Katz-Brown, J., Petrov, S., McDonald, R., Och, F., Talbot, D., Ichikawa, H., Seno, M. and Kazawa, H. 2011. Training a parser for machine translation reordering. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP), Edinburgh, UK, pp. 183–92.

Koo, T. and Collins, M. 2010. Efficient third-order dependency parsers. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL), Uppsala, Sweden, pp. 1–11.

Manning, C. D. 2011. Part-of-speech tagging from 97% to 100%: is it time for some linguistics? In Proceedings of Computational Linguistics and Intelligent Text Processing – 12th International Conference (CICLing), Tokyo, Japan, pp. 171–89.

Marcus, M. P., Santorini, B. and Marcinkiewicz, M. A. 1993. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics 19 (2): 313–30 (MIT Press).

McClosky, D., Charniak, E. and Johnson, M. 2006. Effective self-training for parsing. In Proceedings of Human Language Technology Conference – North American Chapter of the Association for Computational Linguistics (HLT-NAACL), New York, USA, pp. 152–9.

Merialdo, B. 1994. Tagging English text with a probabilistic model. Computational Linguistics 20 (2): 155–71 (MIT Press).

Nivre, J. 2004. Incrementality in deterministic dependency parsing. In Proceedings of the ACL Workshop on Incremental Parsing: Bringing Engineering and Cognition Together. Workshop at ACL, Barcelona, Spain.

van Noord, G. 2007. Using self-trained bilexical preferences to improve disambiguation accuracy. In Proceedings of the 10th International Conference on Parsing Technologies (IWPT), Prague, Czech Republic, pp. 1–10.

Petrov, S. 2010. Products of random latent variable grammars. In Proceedings of Human Language Technologies Conference – North American Chapter of the Association for Computational Linguistics (HLT-NAACL), California, USA, pp. 19–27.

Petrov, S., Chang, P., Ringgaard, M. and Alshawi, H. 2010. Uptraining for accurate deterministic question parsing. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP), Cambridge, MA, USA, pp. 705–13.

Petrov, S. and Klein, D. 2007. Improved inference for unlexicalized parsing. In Proceedings of Human Language Technology Conference – North American Chapter of the Association for Computational Linguistics (HLT-NAACL), New York, USA, pp. 404–11.

Ratnaparkhi, A. 1996. A maximum entropy part of speech tagger. In Proceedings of the 1996 Conference on Empirical Methods in Natural Language Processing (EMNLP), University of Pennsylvania.

Ratnaparkhi, A. 1997. A linear observed time statistical parser based on maximum entropy models. In Proceedings of the 1997 Conference on Empirical Methods in Natural Language Processing (EMNLP), Rhode Island, USA.

Sagae, K. and Lavie, A. 2005. A classifier-based parser with linear run-time complexity. In Proceedings of the 9th International Workshop on Parsing Technologies (IWPT), Vancouver, BC, Canada, pp. 125–32.

Sagae, K. and Lavie, A. 2006. A best-first probabilistic shift-reduce parser. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING-ACL), Sydney, Australia, pp. 691–8.

Søgaard, A. 2010. Simple semi-supervised training of part-of-speech taggers. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL), Uppsala, Sweden, pp. 205–8.

Suzuki, J. and Isozaki, H. 2008. Semi-supervised labeling and segmentation using giga-word scale unlabeled data. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL), Ohio, USA, pp. 665–73.

Toutanova, K., Klein, D., Manning, C. and Singer, Y. 2003. Feature-rich part-of-speech tagging with a cyclic dependency network. In Proceedings of Human Language Technology Conference – North American Chapter of the Association for Computational Linguistics (HLT-NAACL), Edmonton, Canada, pp. 252–9.

Tsuruoka, Y., Miyao, Y. and Kazama, J. 2011. Learning with lookahead: can history-based models rival globally optimized models? In Proceedings of the 15th Conference on Computational Natural Language Learning (CoNLL), Oregon, USA, pp. 238–46.

Tsuruoka, Y., Tsujii, J. and Ananiadou, S. 2009. Fast full parsing by linear-chain conditional random fields. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics (EACL), Athens, Greece, pp. 790–8.

Wang, M., Sagae, K. and Mitamura, T. 2006. A fast, accurate deterministic parser for Chinese. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING-ACL), Sydney, Australia, pp. 425–32.

Wang, W., Huang, Z. and Harper, M. 2007. Semi-supervised learning for part-of-speech tagging of Mandarin transcribed speech. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Hawaii, USA.

Xue, N., Xia, F., Chiou, F. and Palmer, M. 2006. The Penn Chinese Treebank: phrase structure annotation of a large corpus. Natural Language Engineering 11 (2): 207–38 (Cambridge University Press).

Yamada, H. and Matsumoto, Y. 2003. Statistical dependency analysis with support vector machines. In Proceedings of the 8th International Workshop on Parsing Technologies (IWPT), Nancy, France, pp. 195–206.

Zhang, H., Zhang, M., Tan, C. and Li, H. 2009. K-best combination of syntactic parsers. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP), Singapore, pp. 1552–60.

Zhang, Y. and Clark, S. 2009. Transition-based parsing of the Chinese treebank using a global discriminative model. In Proceedings of the 11th International Conference on Parsing Technologies (IWPT), Paris, France, pp. 162–71.

Zhang, Y. and Clark, S. 2011. Shift-reduce CCG parsing. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL), Oregon, USA, pp. 683–92.

Zhao, H., Song, Y., Kit, C. and Zhou, G. 2009. Cross language dependency parsing using a bilingual lexicon. In Proceedings of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP, Singapore, pp. 55–63.
