Comparing example-based and statistical machine translation

ANDY WAY; NANO GOUGH

doi:10.1017/S1351324905003888

Published online by Cambridge University Press: 21 September 2005

Affiliation (both authors): School of Computing, Dublin City University, Dublin 9, Ireland
e-mail: away@computing.dcu.ie, ngough@computing.dcu.ie

Abstract

In previous work (Gough and Way 2004), we showed that our Example-Based Machine Translation (EBMT) system improved with respect to both coverage and quality when seeded with increasing amounts of training data, so that it significantly outperformed the on-line MT system Logomedia according to a wide variety of automatic evaluation metrics. While it is perhaps unsurprising that system performance is correlated with the amount of training data, we address in this paper the question of whether a large-scale, robust EBMT system such as ours can outperform a Statistical Machine Translation (SMT) system. We obtained a large English-French translation memory from Sun Microsystems from which we randomly extracted a near 4K test set. The remaining data was split into three training sets, of roughly 50K, 100K and 200K sentence-pairs in order to measure the effect of increasing the size of the training data on the performance of the two systems. Our main observation is that contrary to perceived wisdom in the field, there appears to be little substance to the claim that SMT systems are guaranteed to outperform EBMT systems when confronted with ‘enough’ training data. Our tests on a 4.8 million word bitext indicate that while SMT appears to outperform our system for French-English on a number of metrics, for English-French, on all but one automatic evaluation metric, the performance of our EBMT system is superior to the baseline SMT model.

Type: Papers
Information: Natural Language Engineering, Volume 11, Issue 3, September 2005, pp. 295-309

DOI: https://doi.org/10.1017/S1351324905003888
