Abstract: One of the most promising approaches to machine translation consists in formulating the problem as a pattern recognition task. In this setting, some applications require online adaptation in order to adjust the system to changing scenarios. In the present work, we perform an exhaustive comparison of four online learning algorithms, combined with two adaptation strategies, for the task of online adaptation in statistical machine translation. Two of these algorithms are already well-known…
“…Incremental MT learning has been investigated several times, usually starting from no data (Barrachina et al., 2009; Ortiz-Martínez et al., 2010), via simulated post-editing (Martínez-Gómez et al., 2012; Denkowski et al., 2014a), or via re-ranking (Wäschle et al., 2013). No previous experiments combined large-scale baselines, full re-tuning of the model weights, and HTER optimization.…”
Analyses of computer-aided translation typically focus on either frontend interfaces and human effort, or backend translation and machine learnability of corrections. However, this distinction is artificial in practice since the frontend and backend must work in concert. We present the first holistic, quantitative evaluation of these issues by contrasting two assistive modes: post-editing and interactive machine translation (MT). We describe a new translator interface, extensive modifications to a phrase-based MT system, and a novel objective function for re-tuning to human corrections. Evaluation with professional bilingual translators shows that post-editing is faster than interactive MT at the cost of translation quality for French-English and English-German. However, re-tuning the MT system to interactive output leads to larger, statistically significant reductions in HTER versus re-tuning to post-edit output. Analysis shows that tuning directly to HTER results in fine-grained corrections to subsequent machine output.
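The abstract above re-tunes the MT system directly to HTER, the human-targeted translation edit rate between machine output and its human post-edit. As a rough illustration only (true TER also permits block shifts, which this hypothetical sketch omits), HTER can be approximated as word-level edit distance normalized by the post-edit length:

```python
# Simplified HTER sketch: counts only word-level insertions, deletions,
# and substitutions against the human post-edit. Real TER/HTER also
# allows phrase shifts, so this is an upper-bound approximation.
def hter(hypothesis: str, post_edit: str) -> float:
    hyp, ref = hypothesis.split(), post_edit.split()
    # classic dynamic-programming edit distance over words
    d = [[0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
    for i in range(len(hyp) + 1):
        d[i][0] = i
    for j in range(len(ref) + 1):
        d[0][j] = j
    for i in range(1, len(hyp) + 1):
        for j in range(1, len(ref) + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(hyp)][len(ref)] / max(len(ref), 1)
```

A perfect post-edit (no changes needed) scores 0.0; the more the translator edits, the higher the score, which is why lower HTER after re-tuning indicates more useful machine output.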
“…To overcome this problem, several incremental alignment models have been proposed in the literature (Levenberg, Callison-Burch, & Osborne, 2010). With the exception of the stream-based translation approach, which adds or updates the original TM scores according to the new material (Ortiz-Martínez, García-Varea, & Casacuberta, 2010; Martínez-Gómez, Sanchis-Trilles, & Casacuberta, 2012; Mathur et al., 2013), the adaptation step is usually carried out by creating specific translation tables from the edited translations (using the standard phrase-extraction and phrase-scoring algorithms) and then combining them with the original translation tables. It is important to note that most of the work on incremental adaptation has been tested in scenarios where references are used instead of UEs.…”
Section: Related Work
confidence: 99%
“…However, only simulated data, using references instead of actual human UEs, were considered in this work. Ortiz-Martínez et al. (2010) and Martínez-Gómez et al. (2012) applied an incremental version of the Expectation-Maximization (EM) algorithm (Neal & Hinton, 1998) that minimizes an error function with small sequences of mini-batched data. This paradigm is commonly known as stream-based translation, as small portions of data are processed over time.…”
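The incremental EM idea quoted above can be illustrated with a toy lexical translation table in which expected counts are folded in one mini-batch at a time rather than re-estimated over the full corpus. The class, its uniform fallback, and the smoothing constant are illustrative assumptions, not the cited systems' actual implementation:

```python
from collections import defaultdict

# Hypothetical stream-based update of a lexical translation table, in
# the spirit of incremental EM (Neal & Hinton, 1998): sufficient
# statistics accumulate across mini-batches instead of being recomputed.
class IncrementalLexTable:
    def __init__(self):
        self.counts = defaultdict(float)   # (src, tgt) expected counts
        self.totals = defaultdict(float)   # per-source marginal counts

    def prob(self, src, tgt):
        # tiny uniform fallback for source words never seen before
        return self.counts[(src, tgt)] / self.totals[src] if self.totals[src] else 1e-6

    def update(self, mini_batch):
        """One EM-style pass over a mini-batch of (src_words, tgt_words) pairs."""
        for src_words, tgt_words in mini_batch:
            for t in tgt_words:
                # E-step: distribute responsibility for t over source words
                norm = sum(max(self.prob(s, t), 1e-6) for s in src_words)
                for s in src_words:
                    r = max(self.prob(s, t), 1e-6) / norm
                    # M-step folded in: fractional counts are added immediately
                    self.counts[(s, t)] += r
                    self.totals[s] += r
```

Because each update touches only the current mini-batch, the model can keep absorbing a stream of post-edited sentences without retraining from scratch, which is the practical appeal of the stream-based paradigm.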
In this article we present a three-step methodology for dynamically improving a statistical machine translation (SMT) system by incorporating human feedback in the form of free edits on the system translations. We target feedback provided by casual users, which is typically error-prone. Thus, we first propose a filtering step to automatically identify the better user-edited translations and discard the useless ones. A second step produces a pivot-based alignment between source and user-edited sentences, focusing on the errors made by the system. Finally, a third step produces a new translation model and combines it linearly with the one from the original system. We perform a thorough evaluation on a real-world dataset collected from the Reverso.net translation service and show that every step in our methodology contributes significantly to improving a general-purpose SMT system. Interestingly, the quality improvement is due not only to the increase in lexical coverage, but to better lexical selection, reordering, and morphology. Finally, we show the robustness of the methodology by applying it to a different scenario, in which the new examples come from an automatically Web-crawled parallel corpus. Using exactly the same architecture and models again yields a significant improvement in the translation quality of a general-purpose baseline SMT system.
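The final combination step of the methodology above can be reduced to a minimal sketch: linearly interpolating a feedback-derived phrase table with the original one. The function name and the weight `lam` are assumptions for illustration; in practice such a weight would be tuned on held-out data rather than fixed:

```python
# Hedged sketch of linear model combination: phrase-pair probabilities
# from the feedback-derived table are mixed with the original table.
# `lam` (the feedback weight) is an illustrative assumption.
def interpolate_tables(original, feedback, lam=0.1):
    combined = {}
    # union of phrase pairs, so feedback can add new coverage
    for phrase_pair in set(original) | set(feedback):
        p_orig = original.get(phrase_pair, 0.0)
        p_fb = feedback.get(phrase_pair, 0.0)
        combined[phrase_pair] = (1 - lam) * p_orig + lam * p_fb
    return combined
```

Note that taking the union of phrase pairs is what lets user feedback both re-rank existing translation options and introduce entirely new ones, matching the abstract's observation that gains come from better lexical selection as well as coverage.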
“…On the MT system side, research on adaptive approaches tailored to interactive SMT and CAT scenarios explored the online learning protocol (Littlestone, 1988) to improve various aspects of the decoding process (Cesa-Bianchi et al., 2008; Ortiz-Martínez et al., 2010; Martínez-Gómez et al., 2011; Martínez-Gómez et al., 2012; Mathur et al., 2013; Bertoldi et al., 2013).…”
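One common instantiation of the online learning protocol in SMT is a perceptron-style update of the log-linear feature weights after each user-validated sentence. The sketch below is illustrative only, not the update rule of any specific cited system, and the feature names are hypothetical:

```python
# Perceptron-style online update: move the log-linear weights toward
# the features of the user-approved translation and away from the
# system's current best hypothesis. `eta` is an assumed learning rate.
def perceptron_update(weights, feats_best_hyp, feats_user_edit, eta=0.1):
    for f in set(feats_best_hyp) | set(feats_user_edit):
        weights[f] = weights.get(f, 0.0) + eta * (
            feats_user_edit.get(f, 0.0) - feats_best_hyp.get(f, 0.0))
    return weights
```

Applied once per sentence, this is exactly the predict-observe-update loop of the online learning protocol: the decoder predicts, the user corrects, and the weights shift before the next sentence is decoded.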
The automatic estimation of machine translation (MT) output quality is a hard task in which the selection of the appropriate algorithm and the most predictive features over reasonably sized training sets plays a crucial role. When moving from controlled lab evaluations to real-life scenarios, the task becomes even harder. For current MT quality estimation (QE) systems, additional complexity comes from the difficulty of modeling user and domain changes. Indeed, the instability of the systems with respect to data coming from different distributions calls for adaptive solutions that react to new operating conditions. To tackle this issue we propose an online framework for adaptive QE that targets reactivity and robustness to user and domain changes. Contrastive experiments in different testing conditions involving user and domain changes demonstrate the effectiveness of our approach.
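A minimal sketch of such an adaptive QE learner follows, assuming each translation yields a numeric feature vector and, once the post-edit arrives, a true quality label (e.g. HTER). The squared-loss SGD update is an illustrative choice, not necessarily the paper's algorithm:

```python
# Online regression sketch for adaptive quality estimation: the model
# is corrected one instance at a time as human labels arrive, so it
# can track user and domain drift. `eta` is an assumed learning rate.
class OnlineQE:
    def __init__(self, n_features, eta=0.1):
        self.w = [0.0] * n_features
        self.eta = eta

    def predict(self, x):
        # linear quality score from the feature vector
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def learn(self, x, y):
        # squared-loss SGD step, applied only once the label is known
        err = self.predict(x) - y
        self.w = [wi - self.eta * err * xi for wi, xi in zip(self.w, x)]
        return err
```

The key property is that no batch retraining is ever needed: when the user or domain shifts, the prediction error grows and each subsequent update pulls the weights toward the new operating conditions.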