In the last decade the dominant models of machine translation (MT) have been data-driven or corpus-based. This is in sharp contrast to the dominant framework of the 1980s and earlier decades, which was 'rule-based' (RBMT). In general, a distinction is made between, on the one hand, statistical machine translation (SMT), based primarily on word frequencies and word combinations, and, on the other hand, example-based machine translation (EBMT), based on the extraction and combination of phrases (or other short segments of text). In both cases the corpora comprise bilingual texts (originals and their translations).

The origin of EBMT can be dated precisely to a conference paper presented in 1981 by Makoto Nagao (1984). Research, however, did not begin until the late 1980s, at the same time as the first appearance of translation memory (TM) as a translator's tool and the first research on SMT. The latter in particular gave rise to much dispute in the early 1990s. EBMT was associated with SMT, as both were seen as variants of corpus-based approaches to MT, and during the 1990s both became familiar at MT conferences. In recent years SMT has become the dominant (almost 'mainstream') approach in MT, as witnessed by the proceedings of almost any conference in the field of computational linguistics, and EBMT systems are now less evident than SMT ones (but more prevalent than RBMT).

The overall conception of SMT is now familiar: in essence, virtually all described models derive from the design first formulated in 1988 by the IBM group (Brown et al. 1988). Sentences of the bilingual corpus are first aligned; then individual words or word sequences (called 'phrases' or 'clumps' in the SMT literature) of the source language (SL) and target language (TL) texts are aligned, i.e. brought into correspondence. From these alignments are derived a 'translation model' of SL-TL frequencies and a 'language model' of TL word sequences.
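The interaction of these two models can be summarized in the familiar noisy-channel formulation; the symbols below are generic notation introduced here for illustration (s for an SL sentence, t for a candidate TL sentence), not notation taken from the IBM papers:

```latex
\hat{t} \;=\; \arg\max_{t} P(t \mid s)
        \;=\; \arg\max_{t}\; \underbrace{P(s \mid t)}_{\text{translation model}}
        \;\underbrace{P(t)}_{\text{language model}}
```

That is, the translation model scores how well candidate TL words or phrases correspond to the SL input, while the language model scores how plausible the resulting TL word sequence is.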
Translation involves the selection of the most probable TL output for each input word or phrase and the determination of the most probable sequence(s) of words in the TL.

By contrast, the EBMT model is less clearly defined than the SMT model. Basically (if somewhat superficially), an MT system is an EBMT system if it uses segments (word sequences, or strings, rather than individual words) of SL texts extracted from a text corpus (its example database) to build TL texts with the same meaning. The basic units for EBMT are sequences of words (phrases, or 'fragments'), and the basic techniques are the matching of input strings against SL strings in the database, the extraction of the corresponding TL strings, and the 'recombination' of those strings into acceptable TL sentences. However, there is a multiplicity of techniques, many derived from other approaches, including methods used in RBMT systems, methods found in SMT, techniques used in translation memories (TM), etc., and there seems to be no clear consensus on what the basic 'model' (or design framework) of EBMT is and what it is not.
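The three basic techniques just named (matching, extraction, recombination) can be sketched in a few lines of code. The toy example database, the English-French fragment pairs, and the greedy longest-match strategy are all illustrative assumptions for this sketch, not a description of any particular EBMT system:

```python
# Minimal sketch of the EBMT pipeline: match input fragments against SL
# strings in an example database, extract the aligned TL strings, and
# recombine them into TL output. Toy data; illustrative only.

EXAMPLES = {
    # SL (English) fragment -> TL (French) fragment (assumed alignments)
    ("the", "red", "car"): ("la", "voiture", "rouge"),
    ("is", "fast"): ("est", "rapide"),
    ("the", "red"): ("le", "rouge"),
}

def translate(sl_words):
    """Greedy longest-match over the example database."""
    tl_words, i = [], 0
    while i < len(sl_words):
        # Matching: try the longest SL fragment starting at position i.
        for j in range(len(sl_words), i, -1):
            fragment = tuple(sl_words[i:j])
            if fragment in EXAMPLES:
                # Extraction: take the corresponding TL fragment.
                tl_words.extend(EXAMPLES[fragment])
                i = j
                break
        else:
            # Fallback: pass unknown words through unchanged.
            tl_words.append(sl_words[i])
            i += 1
    # Recombination: here simple concatenation; real systems must also
    # repair boundary phenomena (agreement, word order, etc.).
    return " ".join(tl_words)

print(translate(["the", "red", "car", "is", "fast"]))
# -> la voiture rouge est rapide
```

Even this toy version shows why recombination is the hard step: concatenating extracted fragments gives no guarantee of grammatical agreement or correct word order at fragment boundaries, which is where the techniques borrowed from RBMT and SMT come in.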
Co...