Proceedings of the 28th International Conference on Computational Linguistics 2020
DOI: 10.18653/v1/2020.coling-main.417

ContraCAT: Contrastive Coreference Analytical Templates for Machine Translation

Abstract: Recent high scores on pronoun translation using context-aware neural machine translation have suggested that current approaches work well. ContraPro is a notable example of a contrastive challenge set for English→German pronoun translation. The high scores achieved by transformer models may suggest that they are able to effectively model the complicated set of inferences required to carry out pronoun translation. This entails the ability to determine which entities could be referred to, identify which entity a…

Cited by 12 publications (13 citation statements) · References 24 publications
“…Müller et al. (2018) built a large-scale dataset for anaphoric pronoun resolution, Bawden et al. (2018) manually created a dataset for both pronoun resolution and lexical choice, and Voita et al. (2019) created a dataset that targets deixis, ellipsis and lexical cohesion. Stojanovski et al. (2020) showed through adversarial attacks that models that do well on other contrastive datasets rely on surface heuristics, and created a contrastive dataset to address this. In contrast, our CXMI metric is phenomenon-agnostic and can be measured with respect to all phenomena that require context in translation.…”
Section: Related Work
confidence: 99%
“…Such undesirable behaviour is facilitated by dataset biases that models are exposed to during training (Emelin et al., 2020). In their study of coreference, Stojanovski et al. (2020) indicate that gender and positional biases can influence model behavior. To verify whether this is the case for cross-lingual Winograd schemas, we examine how strongly pronoun gender and the relative antecedent position correlate with model preference.…”
Section: Results
confidence: 99%
“…Similarly, the study of coreference has a long tradition in machine translation. Several CoR datasets have been proposed in the past, including those of Guillou and Hardmeier (2016), Bawden et al. (2018), Müller et al. (2018), and Stojanovski et al. (2020). Among those, the dataset of Stojanovski et al. (2020) is most relevant to our work.…”
Section: Related Work
confidence: 99%
“…in a sequence-to-sequence task, human-written pairs (or pairs that are machine-generated to be deliberately different from the training distribution) may tell us more about the robustness of models outside the mode. For example, terminology-constrained or interactive applications depend on robustness against improbable contexts, and contrastive evaluation indicates that current NMT systems lack such robustness (Stojanovski et al., 2020). Similarly, syntactic evaluation of language models using randomly generated or nonsensical sentences (Gulordava et al., 2018; Warstadt et al., 2020) can be seen as a method to assess the robustness of a model under improbable input, rather than as an assessment of generative capabilities in general.…”
Section: Error Type
confidence: 99%