Proceedings of the Third Conference on Machine Translation: Research Papers 2018
DOI: 10.18653/v1/w18-6307
A Large-Scale Test Set for the Evaluation of Context-Aware Pronoun Translation in Neural Machine Translation

Abstract: The translation of pronouns presents a special challenge to machine translation to this day, since it often requires context outside the current sentence. Recent work on models that have access to information across sentence boundaries has seen only moderate improvements in terms of automatic evaluation metrics such as BLEU. However, metrics that quantify the overall translation quality are ill-equipped to measure gains from additional context. We argue that a different kind of evaluation is needed to assess ho…


Citations: cited by 129 publications (147 citation statements). References: 21 publications.
“…The third rule that we conform to is to 1) create two contrastive source sentences for each lexical or syntactic ambiguity point, where each source sentence corresponds to one reasonable interpretation of the ambiguity point, and 2) provide two contrastive translations for each created source sentence. This is similar to other linguistic evaluation by contrastive examples in the MT literature (Avramidis et al., 2019; Bawden et al., 2018; Müller et al., 2018; Sennrich, 2017). The two contrastive translations have similar wording: one is correct, and the other is incorrect in that it translates the ambiguous part into the translation corresponding to the contrastive source sentence.…”
Section: Test Suite Design (supporting)
Confidence: 83%
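The contrastive test-suite design quoted above can be illustrated with a small scoring routine. The sketch below is not taken from the cited papers; it assumes a hypothetical score(source, translation) function standing in for an NMT model's log-probability of a target sentence given a source sentence, and it computes the fraction of examples in which the correct translation outscores its contrastive counterpart.

```python
# Minimal sketch of contrastive evaluation as described above: each ambiguous
# source sentence is paired with a correct translation and a contrastive
# (incorrect) translation, and the model is credited when it assigns a higher
# score to the correct one. The `score` callable is a hypothetical stand-in
# for an NMT model's log-probability of a target sentence given a source.
from typing import Callable, Iterable, Tuple

def contrastive_accuracy(
    examples: Iterable[Tuple[str, str, str]],
    score: Callable[[str, str], float],
) -> float:
    """Fraction of (source, correct, contrastive) triples in which the
    correct translation receives the higher model score."""
    total = 0
    wins = 0
    for source, correct, contrastive in examples:
        total += 1
        if score(source, correct) > score(source, contrastive):
            wins += 1
    return wins / total if total else 0.0
```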
“…As a result of this process, a translation pair can consist of multiple sentences, as shown in Example (c) of Figure 1. We do not split them into single sentences, considering the recent trend toward context-sensitive machine translation (Bawden et al., 2018; Müller et al., 2018; Zhang et al., 2018; Miculicich et al., 2018). One can use split sentences for training a model, but an important caveat is that there is no guarantee that all the internal sentences are perfectly aligned.…”
Section: Extracting Parallel Text Segments (mentioning)
Confidence: 99%
“…Examples of widely-used datasets are those included in WMT (Bojar et al., 2018) and LDC 1 , while new evaluation datasets are being actively created (Michel and Neubig, 2018; Bawden et al., 2018; Müller et al., 2018). These existing datasets have mainly focused on translating plain text.…”
Section: Introduction (mentioning)
Confidence: 99%
“…Lexical ambiguity as a challenge for machine translation has received a lot of attention in recent years. Rios Gonzales et al. (2017) and Rios Gonzales et al. (2018) focus on ambiguous German nouns, while Guillou et al. (2018) and Müller et al. (2018) investigate ambiguous English pronouns. Broader linguistic evaluations presented in Burchardt et al. (2017) and Klubička et al. (2018) also include ambiguity, but conjunctions are not mentioned in any context.…”
Section: Related Work (mentioning)
Confidence: 99%