Multi-level similar segment matching algorithm for translation memories and Example-based Machine Translation

Planas, Emmanuel; Furuse, Osamu

doi:10.3115/992730.992736

Cited by 10 publications

(6 citation statements)

References 9 publications


“…The semantic tagset used by USAS is a language-independent multi-tier structure with 21 major discourse fields, subdivided into 232 sub-categories (such as I1.1- = Money: lack; A5.1- = Evaluation: bad), which can be used to detect the semantic context. Identification of semantically similar situations can be also improved by the use of segment-matching algorithms as employed in Example-Based MT (EBMT) and translation memories (Planas and Furuse, 2000; Carl and Way, 2003).…”

Section: Discussion (mentioning)

confidence: 99%

Using comparable corpora to solve problems difficult for human translators

Sharoff

Babych

Hartley

2006

Proceedings of the COLING/ACL on Main Conference Poster Sessions -


In this paper we present a tool that uses comparable corpora to find appropriate translation equivalents for expressions that are considered by translators as difficult. For a phrase in the source language the tool identifies a range of possible expressions used in similar contexts in target language corpora and presents them to the translator as a list of suggestions. In the paper we discuss the method and present results of human evaluation of the performance of the tool, which highlight its usefulness when dictionary solutions are lacking.


“…Further, in word-based measures, the exploitation of the information provided by the order of the words can be fundamental: word-order-sensitive approaches are demonstrated generally to outperform bag-of-words methods (Baldwin and Tanaka 2000). For instance, several papers (Collins and Cunningham 1996; Cranias et al 1994; Planas and Furuse 2000) report advanced hybrid word- and structure-based matching techniques on the TM. In particular, Collins and Cunningham (1996) and Cranias et al (1994) perform exact matching whereas Planas and Furuse (2000) adopt the edit-distance metric limiting themselves to deletions and equalities.…”

Section: Similarity Measures and Scoring Functions (mentioning)

confidence: 99%
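
The citation statement above describes an edit-distance metric restricted to deletions and equalities. The following is a minimal sketch of what such a restriction implies for two word sequences, not the implementation of Planas and Furuse (2000): the function name `deletion_only_distance`, the direction of deletion (words are removed from the stored candidate to reach the query) and the plain word-level comparison are all assumptions made for illustration. With no insertions or substitutions allowed, a match exists only if the query is a subsequence of the candidate, and the distance is then the number of deleted words.

```python
from typing import Optional, Sequence


def deletion_only_distance(
    candidate: Sequence[str], query: Sequence[str]
) -> Optional[int]:
    """Count the words that must be deleted from `candidate` to obtain `query`.

    Only deletions and equalities are allowed (an illustrative assumption),
    so the result is None when `query` is not a subsequence of `candidate`.
    """
    remaining = iter(candidate)
    for word in query:
        # Skip (delete) candidate words until one equals the current query word.
        if not any(word == c for c in remaining):
            return None  # ran out of candidate words: no deletion-only match
    return len(candidate) - len(query)


if __name__ == "__main__":
    stored = "the big red cat sat".split()
    print(deletion_only_distance(stored, "the cat sat".split()))
    print(deletion_only_distance(stored, "the dog sat".split()))
```

A single left-to-right pass is enough here because, without substitutions or insertions, the first possible match for each query word never rules out a later one.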

“…For instance the approach in Brown (1996) and Leplus et al (2004) stores examples as strings of words together with some alignment and information on equivalence classes (numbers, weekdays, etc.). Other approaches (Cranias et al 1994; Collins and Cunningham 1996; Planas and Furuse 2000) are more language- and knowledge-dependent and perform some text operations (e.g. POS tagging and stemming) on the sentences in order to store more structural information about them.…”

Section: Logical Representation Of Examples (mentioning)

confidence: 99%

EXTRA: a system for example-based translation assistance

Mandreoli

Tiberio

2007

Machine Translation


In this paper we present EXTRA (EXample-based TRanslation Assistant), a translation memory (TM) system. EXTRA is able to propose effective translation suggestions by relying on syntactic analysis of the text and on a rigorous, language-independent measure; the search is performed efficiently in large amounts of bilingual texts thanks to its advanced retrieval techniques. EXTRA does not use external knowledge requiring the intervention of users and is completely customizable and portable as it has been implemented on top of a standard DataBase Management System. The paper provides a thorough evaluation of both the effectiveness and the efficiency of our system. In particular, in order to quantify the benefits offered by EXTRA assisted translation over manual translation, we introduce a simulator implementing specifically devised statistical, process-oriented, discrete-event models. As far as we know, this is the first time statistical simulation experiments have been used to face the nontrivial problem of evaluating TM systems, particularly for comparing the time that could be saved by performing assisted translation versus "manual" translation and for optimally tuning the system behaviour with respect to differently skilled users. In our experiments, we considered three scenarios, manual translation with one or two translators and assisted translation with one translator. The time needed for one translator to do an assisted translation is significantly closer to that of a team of two translators than to that of the single translator. The mean sentence translation time is by far the lowest for this scenario, corresponding to the highest per translator productivity. We also estimate the total translation time when the number of query sentences, the maximum number of suggestions to be read, and the probability of look up are varied: the best trade-off is given by reading (and presenting) four or five suggestions at the most.


“…This involves an extension of the English semantic tagger, development of the Russian tagger with the target lexical coverage of 90% of source texts, designing the procedure for retrieval of semantically similar situations and completing the user interface. Identification of semantically similar situations can be improved by the use of segment-matching algorithms as employed in Example-Based MT and translation memories (Planas and Furuse, 2000; Carl and Way, 2003).…”

Section: Discussion (mentioning)

confidence: 99%

“…It is widely acknowledged that human translators can benefit from a wide range of applications in computational linguistics, including Machine Translation (Carl and Way, 2003), Translation Memory (Planas and Furuse, 2000), etc. There have been recent research on tools detecting translation equivalents for technical vocabulary in a restricted domain, e.g.…”

Section: Introduction (mentioning)

confidence: 99%

Assist

Sharoff

Babych

Rayson

et al. 2006

Proceedings of the Eleventh Conference of the European Chapter of the Association for Computational Linguistics: Posters &


The problem we address in this paper is that of providing contextual examples of translation equivalents for words from the general lexicon using comparable corpora and semantic annotation that is uniform for the source and target languages. For a sentence, phrase or a query expression in the source language the tool detects the semantic type of the situation in question and gives examples of similar contexts from the target language corpus.
