Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1) 2019
DOI: 10.18653/v1/w19-5354
The MuCoW Test Suite at WMT 2019: Automatically Harvested Multilingual Contrastive Word Sense Disambiguation Test Sets for Machine Translation

Abstract: Supervised Neural Machine Translation (NMT) systems currently achieve impressive translation quality for many language pairs. One of the key features of a correct translation is the ability to perform word sense disambiguation (WSD), i.e., to translate an ambiguous word with its correct sense. Existing evaluation benchmarks on the WSD capabilities of translation systems rely heavily on manual work and cover only a few language pairs and word types. We present MuCoW, a multilingual contrastive test suite that co…
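As a rough illustration of the contrastive WSD evaluation named in the title, the sketch below checks whether a translation model assigns a higher score to the reference translation than to contrastive variants in which the ambiguous word is rendered with a wrong sense. The scoring callback `score_pair`, the helper `contrastive_accuracy`, and the toy German-English homograph example are assumptions made for illustration only; the released MuCoW suite defines its own data format and evaluation scripts.

```python
# Minimal sketch of contrastive WSD scoring (assumed interface, not the
# official MuCoW tooling). `score_pair(src, tgt)` is expected to return a
# model score such as a length-normalized log-probability.
from typing import Callable, Iterable, List, Tuple

def contrastive_accuracy(
    examples: Iterable[Tuple[str, str, List[str]]],
    score_pair: Callable[[str, str], float],
) -> float:
    """Fraction of examples where the reference translation outscores
    every contrastive (wrong-sense) variant."""
    correct = total = 0
    for src, reference, variants in examples:
        ref_score = score_pair(src, reference)
        if all(ref_score > score_pair(src, v) for v in variants):
            correct += 1
        total += 1
    return correct / total if total else 0.0

# Hypothetical toy example with the German homograph "Schloss"
# (castle vs. lock); real MuCoW sentences come from parallel corpora.
examples = [(
    "Das Schloss liegt auf einem Hügel.",
    "The castle sits on a hill.",
    ["The lock sits on a hill."],
)]
# accuracy = contrastive_accuracy(examples, score_pair=my_model.score)  # model-specific scorer
```

The appeal of the contrastive setup is that it needs only a scoring interface, not decoding: any NMT system that can score a given source-target pair can be probed for sense errors.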


Cited by 28 publications (26 citation statements)
References 30 publications
“…Table 4 gives the best accuracy on the in-domain and out-of-domain test sets. The accuracy of both models drops substantially on the out-of-domain test set, which is consistent with the finding of Raganato et al. (2019). The drop for char-d7 is even bigger than that for bpe-d7, which indicates that CHAR models are not more robust to domain mismatch than BPE-based models when learning word senses.…”
Section: Robustness to Domain-mismatch (supporting)
confidence: 82%
“…For the WSD probing task, we use the FI-EN part of the MuCoW (Raganato et al., 2019) test set, which is a multilingual test suite for WSD in the WMT19 shared task. It has 2,117 annotated sentences.…”
Section: Data (mentioning)
confidence: 99%
“…In this respect, a model with predefined fixed patterns may struggle to encode global semantic features. To this end, we evaluate our models on two German-English WSD test suites, ContraWSD (Rios Gonzales et al., 2017) and MuCoW (Raganato et al., 2019). Table 6 shows the performance of our models on the WSD benchmarks.…”
Section: Word Sense Disambiguation (mentioning)
confidence: 99%
“…We next compile a parallel lexicon of homograph translations, prioritizing a high coverage of all possible senses. Similar to Raganato et al. (2019), we obtain sense-specific translations from crosslingual BabelNet (Navigli and Ponzetto, 2010) synsets. Since BabelNet entries vary in their granularity, we iteratively merge related synsets as long as they have at least three German translations in common or share at least one definition.…”
Section: Resource Collection (mentioning)
confidence: 99%
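The merging heuristic quoted above (merge related synsets as long as they share at least three German translations or at least one definition) reduces to a simple fixed-point loop. The `SenseCluster` container and the functions below are hypothetical stand-ins for BabelNet synset records, sketched only to make the stated criterion concrete.

```python
# Sketch of the iterative synset-merging criterion described in the excerpt
# above: keep merging clusters while they share >= 3 German translations or
# at least one definition. Data structures are illustrative, not BabelNet's API.
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class SenseCluster:
    translations: Set[str] = field(default_factory=set)  # German lemmas
    definitions: Set[str] = field(default_factory=set)   # gloss strings

def should_merge(a: SenseCluster, b: SenseCluster) -> bool:
    return len(a.translations & b.translations) >= 3 or bool(a.definitions & b.definitions)

def merge_clusters(clusters: List[SenseCluster]) -> List[SenseCluster]:
    """Merge any pair of clusters meeting the criterion until no pair does."""
    changed = True
    while changed:
        changed = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if should_merge(clusters[i], clusters[j]):
                    clusters[i].translations |= clusters[j].translations
                    clusters[i].definitions |= clusters[j].definitions
                    del clusters[j]
                    changed = True
                    break
            if changed:
                break
    return clusters

# Hypothetical usage: two castle-like senses sharing three German translations.
a = SenseCluster({"Schloss", "Burg", "Festung", "Kastell"}, {"a fortified building"})
b = SenseCluster({"Burg", "Festung", "Kastell"}, set())
merged = merge_clusters([a, b])  # -> one merged cluster
```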