Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
DOI: 10.18653/v1/2021.findings-acl.261

Exploring Unsupervised Pretraining Objectives for Machine Translation

Abstract: Unsupervised cross-lingual pretraining has achieved strong results in neural machine translation (NMT), by drastically reducing the need for large parallel data. Most approaches adapt masked-language modeling (MLM) to sequence-to-sequence architectures, by masking parts of the input and reconstructing them in the decoder. In this work, we systematically compare masking with alternative objectives that produce inputs resembling real (full) sentences, by reordering and replacing words based on their context. We …
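As a rough, hypothetical illustration of the input-noising objectives the abstract contrasts, the sketch below builds corrupted seq2seq pretraining inputs in three ways: masking tokens, replacing words (here drawn uniformly from a toy vocabulary, whereas the paper conditions replacements on context), and locally reordering words so the input still resembles a real, full sentence. Function names and parameters are illustrative, not the paper's implementation.

import random

MASK = "<mask>"

def mask_tokens(tokens, p=0.35, rng=None):
    # MLM-style noising: hide a fraction of tokens behind a mask symbol;
    # the seq2seq model is trained to reconstruct the original sentence
    # in the decoder.
    rng = rng or random.Random(0)
    return [MASK if rng.random() < p else t for t in tokens]

def replace_tokens(tokens, vocab, p=0.35, rng=None):
    # Replacement-style noising: swap a fraction of tokens for other words
    # (uniform choice from a toy vocabulary here, context-based in the paper),
    # so the corrupted input still looks like a full sentence.
    rng = rng or random.Random(0)
    return [rng.choice(vocab) if rng.random() < p else t for t in tokens]

def shuffle_locally(tokens, window=3, rng=None):
    # Reordering-style noising: permute tokens within small local windows,
    # keeping every original word present in the input.
    rng = rng or random.Random(0)
    out = list(tokens)
    for start in range(0, len(out), window):
        chunk = out[start:start + window]
        rng.shuffle(chunk)
        out[start:start + window] = chunk
    return out

if __name__ == "__main__":
    sentence = "the quick brown fox jumps over the lazy dog".split()
    toy_vocab = ["cat", "runs", "green", "slow", "table"]
    print("masked:   ", mask_tokens(sentence))
    print("replaced: ", replace_tokens(sentence, toy_vocab))
    print("reordered:", shuffle_locally(sentence))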

Cited by 3 publications (5 citation statements)
References 31 publications
“…By contrast, when we add word replacements in the encoder, it greatly reduces LitTER in both splits. This aligns with the findings of Baziotis et al (2021), who show that source-side word replacements make the decoder less prone to copying (or "trusting") the encoder.…”
Section: Targeted Evaluation (supporting)
confidence: 89%
“…Masking yields no effect on the zero split, but increases errors on the joint split. Baziotis et al (2021) show that masking promotes copying, which we speculate could lead to word-by-word translation and increase LitTER. Decoder-side word replacements yield a similar behaviour in terms of LitTER, which we hypothesize could push the decoder to rely more on the encoder, therefore encouraging word-by-word translation.…”
Section: Targeted Evaluation (mentioning)
confidence: 75%
“…We refer to this loss as encoder-based MLM loss (eMLM; Baziotis et al 2021). It trains the encoder to reconstruct input text representations while attending to multimodal information.…”
Section: Self-supervised Auxiliary Guidance (mentioning)
confidence: 99%
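To make the eMLM idea in the statement above concrete, here is a minimal, hypothetical PyTorch sketch (not the cited implementation, and with the multimodal attention omitted): the encoder's own hidden states are projected to the vocabulary and a cross-entropy loss is taken only at the masked positions, so the encoder, rather than the decoder, reconstructs the corrupted tokens. All names and shapes below are assumptions for illustration.

import torch
import torch.nn as nn

def encoder_mlm_loss(encoder_hidden, target_ids, mask_positions, vocab_proj):
    # encoder_hidden: (batch, seq_len, d_model) encoder outputs
    # target_ids:     (batch, seq_len) original token ids before corruption
    # mask_positions: (batch, seq_len) boolean mask of corrupted positions
    # vocab_proj:     nn.Linear(d_model, vocab_size) output projection
    logits = vocab_proj(encoder_hidden)          # (batch, seq_len, vocab)
    return nn.functional.cross_entropy(
        logits[mask_positions],                  # score only masked positions
        target_ids[mask_positions],
        reduction="mean",
    )

# Toy usage with random tensors standing in for a real encoder and batch.
if __name__ == "__main__":
    torch.manual_seed(0)
    batch, seq_len, d_model, vocab = 2, 6, 16, 100
    hidden = torch.randn(batch, seq_len, d_model)
    targets = torch.randint(0, vocab, (batch, seq_len))
    masked = torch.rand(batch, seq_len) < 0.3
    masked[0, 0] = True                          # ensure at least one masked position
    proj = nn.Linear(d_model, vocab)
    print(encoder_mlm_loss(hidden, targets, masked, proj))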
“…Ref. [29] explored many unsupervised pre-training objectives and systematically analyzed them in both supervised and unsupervised settings.…”
Section: Unsupervised Pre-training (mentioning)
confidence: 99%