Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1294
Evaluating Pronominal Anaphora in Machine Translation: An Evaluation Measure and a Test Suite

Abstract: The ongoing neural revolution in machine translation has made it easier to model larger contexts beyond the sentence-level, which can potentially help resolve some discourse-level ambiguities such as pronominal anaphora, thus enabling better translations. Unfortunately, even when the resulting improvements are seen as substantial by humans, they remain virtually unnoticed by traditional automatic evaluation measures like BLEU, as only a few words end up being affected. Thus, specialized evaluation measures are…

Cited by 15 publications (17 citation statements)
References 31 publications (26 reference statements)
“…Although neural machine translation (NMT) has achieved great progress in recent years (Cho et al., 2014; Bahdanau et al., 2015; Luong et al., 2015; Vaswani et al., 2017), when fed an entire document, standard NMT systems translate sentences in isolation without considering the cross-sentence dependencies. Consequently, document-level neural machine translation (DocNMT) methods are proposed to utilize source-side or target-side inter-sentence contextual information to improve translation quality over sentences in a document (Jean et al., 2017; Wang et al., 2017; Tiedemann and Scherrer, 2017; Tu et al., 2018; Kuang et al., 2018; Junczys-Dowmunt, 2019; Ma et al., 2020). More recently, researchers of DocNMT mainly focus on exploring various attention-based networks to leverage the cross-sentence context efficiently, and evaluate the special discourse phenomena (Bawden et al., 2018; Müller et al., 2018; Voita et al., 2019b; Jwalapuram et al., 2019). However, there is still an issue that has received less attention: which context sentences should be used when translating a source sentence?…”
Section: Introduction
mentioning confidence: 99%
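The concatenation-based family of DocNMT approaches mentioned in this excerpt (e.g., Tiedemann and Scherrer, 2017) can be sketched roughly as below. The `<SEP>` separator token and the context window size are illustrative assumptions, not the exact setup of any cited system:

```python
# Sketch: building "context + current sentence" inputs for a DocNMT system
# that consumes a few previous sentences alongside the sentence to translate.
# The <SEP> token and the default 2-sentence window are illustrative choices.

def build_contextual_inputs(doc_sentences, window=2, sep=" <SEP> "):
    """For each sentence, prepend up to `window` preceding sentences as context."""
    inputs = []
    for i, sent in enumerate(doc_sentences):
        context = doc_sentences[max(0, i - window):i]
        inputs.append(sep.join(context + [sent]))
    return inputs

doc = ["Mary saw a cat.", "It was black.", "She fed it."]
print(build_contextual_inputs(doc, window=1))
# The second input pairs "It was black." with its antecedent-bearing
# previous sentence, which is what lets the model resolve "It".
```

The open question raised in the excerpt is precisely which sentences this `context` slice should contain, rather than always taking the immediately preceding ones.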
“…To eliminate word alignment errors, we compare this overlap over the set of dictionary-matched target pronouns, in contrast to the set of target words aligned to a given source pronoun as done by AutoPRF and APT. In addition to these two measures, which rely on computing pronoun overlap between the target and reference translation, we employ an ELMo-based (Peters et al., 2018) evaluation framework that distinguishes between a good and a bad translation via pairwise ranking (Jwalapuram et al., 2019). We use the CRC setting of this metric which considers the same reference context (one previous and one next sentence) for both reference and system translations.…”
Section: Discussion
mentioning confidence: 99%
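The dictionary-matched pronoun overlap described in this excerpt can be sketched as follows. The pronoun list, whitespace tokenization, and clipped-count matching are simplified assumptions for illustration, not the exact procedure of the cited metrics:

```python
# Sketch: pronoun overlap between a system translation and the reference,
# computed over dictionary-matched target pronouns rather than over
# word-aligned target words. The pronoun set is an illustrative subset.

from collections import Counter

TARGET_PRONOUNS = {"he", "she", "it", "they", "him", "her", "them"}

def pronoun_overlap(system_tokens, reference_tokens):
    """Clipped count of target-side pronouns shared with the reference."""
    sys_counts = Counter(t.lower() for t in system_tokens if t.lower() in TARGET_PRONOUNS)
    ref_counts = Counter(t.lower() for t in reference_tokens if t.lower() in TARGET_PRONOUNS)
    # Clip each pronoun's count by its reference count, precision-style.
    return sum(min(c, ref_counts[p]) for p, c in sys_counts.items())

print(pronoun_overlap("She gave it to him".split(),
                      "She handed it to her".split()))  # → 2 ("she" and "it" match)
```

Matching against a fixed pronoun dictionary sidesteps alignment errors, at the cost of ignoring pronouns realized by non-dictionary forms.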
“…Common Reference Context (CRC) (Jwalapuram et al., 2019). In addition to the previous…” (footnote 9: https://github.com/idiap/APT)
mentioning confidence: 99%
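The CRC setting referenced in these excerpts scores the reference and the system output under the same reference-side context (one previous and one next sentence) and ranks them pairwise. A minimal sketch follows; the `score` function here is a hypothetical toy stand-in for the learned ELMo-based model, not the actual metric:

```python
# Sketch of pairwise-ranking evaluation in the spirit of the CRC setting:
# both candidates are scored with the SAME reference-side context.
# `score` below is a hypothetical toy scorer (pronoun agreement with the
# context); the real metric uses a trained neural model.

PRONOUNS = {"he", "she", "it", "they"}

def _pronouns(text):
    return {w.lower().strip(".,") for w in text.split()} & PRONOUNS

def score(context_prev, sentence, context_next):
    """Toy proxy: count pronouns the sentence shares with its context."""
    return len(_pronouns(context_prev + " " + context_next) & _pronouns(sentence))

def crc_rank(prev, nxt, reference, system_output):
    """True if the reference is ranked at least as high as the system output."""
    return score(prev, reference, nxt) >= score(prev, system_output, nxt)

print(crc_rank("Mary bought a car.", "It was red.",
               "She loves it.", "She loves him."))  # → True
```

Holding the context fixed across both candidates is what makes the comparison a controlled pairwise ranking rather than two unrelated scores.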
“…Our work is related to adversarial datasets for testing robustness used in Natural Language Processing tasks such as studying gender bias (Zhao et al, 2018;Rudinger et al, 2018;Stanovsky et al, 2019), natural language inference (Glockner et al, 2018) and classification (Wang et al, 2019). Jwalapuram et al (2019) propose a model for pronoun translation evaluation trained on pairs of sentences consisting of the reference and a system output with differing pronouns. However, as Guillou and Hardmeier (2018) point out, this fails to take into account that often there is not a 1:1 correspondence between pronouns in different languages.…”
Section: Coreference Resolution In Machine Translation
mentioning confidence: 99%
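The training data described in this last excerpt, pairs of reference and system sentences that differ in their pronouns, can be sketched as below. The pronoun set and the equal-length, pronoun-only-difference filter are illustrative assumptions; they also make concrete the 1:1-correspondence limitation Guillou and Hardmeier (2018) point out:

```python
# Sketch: constructing (reference, system) training pairs that differ only
# in pronoun choices, for training a pronoun-translation ranking model.
# The pronoun set and exact filtering criteria are illustrative assumptions.

PRONOUNS = {"he", "she", "it", "they"}

def differs_in_pronoun(ref_tokens, sys_tokens):
    """True if same length and every differing position involves a pronoun."""
    if len(ref_tokens) != len(sys_tokens):
        return False
    diffs = [(r, s) for r, s in zip(ref_tokens, sys_tokens) if r.lower() != s.lower()]
    return bool(diffs) and all(
        r.lower() in PRONOUNS or s.lower() in PRONOUNS for r, s in diffs
    )

def make_training_pairs(references, system_outputs):
    return [(r, s) for r, s in zip(references, system_outputs)
            if differs_in_pronoun(r.split(), s.split())]
```

Note the simplification: this token-by-token filter assumes one target pronoun per source pronoun, which is exactly the correspondence that often fails across languages.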