Evaluating two methods for Treebank grammar compaction

ALEXANDER KROTOV; MARK HEPPLE; ROBERT GAIZAUSKAS; YORICK WILKS

doi:10.1017/S1351324900002308

Evaluating two methods for Treebank grammar compaction

Published online by Cambridge University Press: 01 December 1999

ALEXANDER KROTOV ,

MARK HEPPLE ,

ROBERT GAIZAUSKAS and

YORICK WILKS

Show author details

ALEXANDER KROTOV: Affiliation:
Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield S1 4DP, UK; alexk@dcs.shef.ac.uk, hepple@dcs.shef.ac.uk, robertg@dcs.shef.ac.uk, yorick@dcs.shef.ac.uk
MARK HEPPLE: Affiliation:
Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield S1 4DP, UK; alexk@dcs.shef.ac.uk, hepple@dcs.shef.ac.uk, robertg@dcs.shef.ac.uk, yorick@dcs.shef.ac.uk
ROBERT GAIZAUSKAS: Affiliation:
Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield S1 4DP, UK; alexk@dcs.shef.ac.uk, hepple@dcs.shef.ac.uk, robertg@dcs.shef.ac.uk, yorick@dcs.shef.ac.uk
YORICK WILKS: Affiliation:
Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield S1 4DP, UK; alexk@dcs.shef.ac.uk, hepple@dcs.shef.ac.uk, robertg@dcs.shef.ac.uk, yorick@dcs.shef.ac.uk

Article contents

Abstract

Get access

Rights & Permissions

Abstract

Treebanks, such as the Penn Treebank, provide a basis for the automatic creation of broad coverage grammars. In the simplest case, rules can simply be ‘read off’ the parse-annotations of the corpus, producing either a simple or probabilistic context-free grammar. Such grammars, however, can be very large, presenting problems for the subsequent computational costs of parsing under the grammar. In this paper, we explore ways by which a treebank grammar can be reduced in size or ‘compacted’, which involve the use of two kinds of technique: (i) thresholding of rules by their number of occurrences; and (ii) a method of rule-parsing, which has both probabilistic and non-probabilistic variants. Our results show that by a combined use of these two techniques, a probabilistic context-free grammar can be reduced in size by 62% without any loss in parsing performance, and by 71% to give a gain in recall, but some loss in precision.

Type: Research Article
Information: Natural Language Engineering , Volume 5 , Issue 4 , December 1999 , pp. 377 - 394

DOI: https://doi.org/10.1017/S1351324900002308 [Opens in a new window]

Access options

Get access to the full version of this content by using one of the access options below. (Log in options will check for institutional or personal access. Content may require purchase if you do not have access.)

Article contents

Evaluating two methods for Treebank grammar compaction

Abstract

Access options

Save article to Kindle

Save article to Dropbox

Save article to Google Drive

Reply to: Submit a response

Your details

You have entered the maximum number of contributors

Conflicting interests