Bootstrapping parsers via syntactic projection across parallel texts

REBECCA HWA; PHILIP RESNIK; AMY WEINBERG; CLARA CABEZAS; OKAN KOLAK

doi:10.1017/S1351324905003840

Abstract

Broad-coverage, high-quality parsers are available for only a handful of languages. A prerequisite for developing broad-coverage parsers for more languages is the annotation of text with the desired linguistic representations (also known as “treebanking”). However, syntactic annotation is a labor-intensive and time-consuming process, and it is difficult to find linguistically annotated text in sufficient quantities. In this article, we explore using parallel text to help solve the problem of creating syntactic annotation in more languages. The central idea is to annotate the English side of a parallel corpus, project the analysis to the second language, and then train a stochastic analyzer on the resulting noisy annotations. We discuss our background assumptions, describe an initial study on the “projectability” of syntactic relations, and then present two experiments in which stochastic parsers are developed with minimal human intervention via projection from English.
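The projection step described above can be illustrated with a minimal sketch. It assumes that English dependency analyses are given as (head, dependent) index pairs and that the word alignment is a dictionary from English positions to target positions; both representations, and the function name `project_dependencies`, are hypothetical choices for illustration rather than the paper's data formats or code. The sketch applies a simplified direct-correspondence rule: a relation between two English words is copied to the target words they are aligned with. It omits the corrections a realistic system would need for unaligned and many-to-many words.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[int, int]  # (head index, dependent index)


def project_dependencies(en_edges: Set[Edge], alignment: Dict[int, int]) -> Set[Edge]:
    """Copy each English head->dependent link onto the aligned target words.

    Simplified direct-correspondence rule (an illustrative assumption):
    if English head h governs dependent d, and both h and d are aligned,
    then alignment[h] governs alignment[d] in the target sentence.
    Links with an unaligned endpoint are dropped, as are links whose
    endpoints collapse onto the same target word.
    """
    projected: Set[Edge] = set()
    for head, dep in en_edges:
        if head not in alignment or dep not in alignment:
            continue  # one endpoint has no counterpart in the target sentence
        t_head, t_dep = alignment[head], alignment[dep]
        if t_head == t_dep:
            continue  # a many-to-one alignment would otherwise create a self-loop
        projected.add((t_head, t_dep))
    return projected


if __name__ == "__main__":
    # English "John loves Mary": loves (1) heads John (0) and Mary (2).
    en_edges = {(1, 0), (1, 2)}
    # Spanish "Juan ama a María": the preposition "a" (2) has no English source word.
    alignment = {0: 0, 1: 1, 2: 3}
    print(sorted(project_dependencies(en_edges, alignment)))
```

Running the example prints `[(1, 0), (1, 3)]`: "ama" governs "Juan" and "María", while the preposition "a", which has no English counterpart, is left unattached. Gaps of this kind are one reason the projected trees are noisy training data for the stochastic parser.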

Footnotes

The authors gratefully acknowledge helpful discussions with Adam Lopez and Gina Levow, the constructive comments of the anonymous reviewers, as well as publicly available software used in this work. This research was supported in part by National Science Foundation grant EIA0130422, Department of Defense contract RD-02-5700, and ONR MURI Contract FCPO.810548265.
