A comparison of parsing technologies for the biomedical domain

CLAIRE GROVER; ALEX LASCARIDES; MIRELLA LAPATA

doi:10.1017/S1351324904003547

Published online by Cambridge University Press: 28 February 2005

CLAIRE GROVER and ALEX LASCARIDES: School of Informatics, The University of Edinburgh, 2 Buccleuch Place, Edinburgh EH8 9LW, UK. E-mail: C.Grover@ed.ac.uk, A.Lascarides@ed.ac.uk
MIRELLA LAPATA: Department of Computer Science, University of Sheffield, 11 Portobello Street, Sheffield S1 4DP, UK. E-mail: mlap@dcs.shef.ac.uk

Abstract

This paper reports on a number of experiments which are designed to investigate the extent to which current NLP resources are able to syntactically and semantically analyse biomedical text. We address two tasks: (a) parsing a real corpus with a hand-built wide-coverage grammar, producing both syntactic analyses and logical forms and (b) automatically computing the interpretation of compound nouns where the head is a nominalisation (e.g. hospital arrival means an arrival at hospital, while patient arrival means an arrival of a patient). For the former task we demonstrate that flexible and yet constrained pre-processing techniques are crucial to success: these enable us to use part-of-speech tags to overcome inadequate lexical coverage, and to package up complex technical expressions prior to parsing so that they are blocked from creating misleading amounts of syntactic complexity. We argue that the XML-processing paradigm is ideally suited for automatically preparing the corpus for parsing. For the latter task, we compute interpretations of the compounds by exploiting surface cues and meaning paraphrases, which in turn are extracted from the parsed corpus. This provides an empirical setting in which we can compare the utility of a comparatively deep parser vs. a shallow one, exploring the trade-off between resolving attachment ambiguities on the one hand and generating errors in the parses on the other. We demonstrate that a model of the meaning of compound nominalisations is achievable with the aid of current broad-coverage parsers.

Type: Papers
Information: Natural Language Engineering, Volume 11, Issue 1, March 2005, pp. 27-65

DOI: https://doi.org/10.1017/S1351324904003547
