
Bootstrapping parsers via syntactic projection across parallel texts

Published online by Cambridge University Press:  21 September 2005

REBECCA HWA
Affiliation:
Department of Computer Science, University of Pittsburgh, PA 15260, USA e-mail: hwa@cs.pitt.edu
PHILIP RESNIK
Affiliation:
Institute for Advanced Computer Studies and Department of Linguistics, University of Maryland, College Park, MD 20742, USA e-mail: resnik@umiacs.umd.edu
AMY WEINBERG
Affiliation:
Institute for Advanced Computer Studies and Department of Linguistics, University of Maryland, College Park, MD 20742, USA e-mail: weinberg@umiacs.umd.edu
CLARA CABEZAS
Affiliation:
Institute for Advanced Computer Studies and Department of Linguistics, University of Maryland, College Park, MD 20742, USA e-mail: clarac@umiacs.umd.edu
OKAN KOLAK
Affiliation:
Department of Computer Science, University of Maryland, College Park, MD 20742, USA e-mail: okan@umiacs.umd.edu

Abstract

Broad-coverage, high-quality parsers are available for only a handful of languages. A prerequisite for developing broad-coverage parsers for more languages is the annotation of text with the desired linguistic representations (also known as “treebanking”). However, syntactic annotation is a labor-intensive and time-consuming process, and it is difficult to find linguistically annotated text in sufficient quantities. In this article, we explore using parallel text to help solve the problem of creating syntactic annotation in more languages. The central idea is to annotate the English side of a parallel corpus, project the analysis to the second language, and then train a stochastic analyzer on the resulting noisy annotations. We discuss our background assumptions, describe an initial study on the “projectability” of syntactic relations, and then present two experiments in which stochastic parsers are developed with minimal human intervention via projection from English.
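
The projection step described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, which is not given on this page. It assumes, for simplicity, that the syntactic analysis is a set of head-dependent pairs and that the word alignment is strictly one-to-one. The function name `project_dependencies`, the index-pair representation, and the toy sentence are all hypothetical choices made for illustration.

```python
def project_dependencies(english_deps, alignment):
    """Copy head-dependent pairs from English to the second language.

    english_deps: set of (head_index, dependent_index) pairs over English words.
    alignment: dict mapping an English word index to a target word index
               (assumed one-to-one; unaligned English words are simply absent).

    A pair is projected only when both its head and its dependent are aligned;
    other pairs are dropped, which is one source of noise in the output.
    """
    projected = set()
    for head, dep in english_deps:
        if head in alignment and dep in alignment:
            projected.add((alignment[head], alignment[dep]))
    return projected


# Hypothetical toy example: English "I saw her", with "saw" (index 1) as head
# of "I" (index 0) and "her" (index 2). The target sentence is assumed to use
# the order "I her saw", so "saw" aligns to target index 2.
english_deps = {(1, 0), (1, 2)}
alignment = {0: 0, 1: 2, 2: 1}

print(sorted(project_dependencies(english_deps, alignment)))
# prints [(2, 0), (2, 1)]
```

Real alignments are rarely one-to-one, and dropped or mistaken pairs make the projected annotation noisy, which is why the abstract speaks of training a stochastic analyzer on noisy annotations.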

Type
Papers
Copyright
2005 Cambridge University Press


Footnotes

The authors gratefully acknowledge helpful discussions with Adam Lopez and Gina Levow, the constructive comments of the anonymous reviewers, as well as publicly available software used in this work. This research was supported in part by National Science Foundation grant EIA0130422, Department of Defense contract RD-02-5700, and ONR MURI Contract FCPO.810548265.