Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022
DOI: 10.18653/v1/2022.acl-long.528

Prediction Difference Regularization against Perturbation for Neural Machine Translation

Abstract: Regularization methods applying input perturbation have drawn considerable attention and have been frequently explored for NMT tasks in recent years. Despite their simplicity and effectiveness, we argue that these methods are limited by the under-fitting of training data. In this paper, we utilize prediction difference for ground-truth tokens to analyze the fitting of token-level samples and find that under-fitting is almost as common as over-fitting. We introduce prediction difference regularization (PD-R), a …
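The abstract is truncated here. As a rough illustration of the idea it describes, a penalty on how much the model's predictions for ground-truth tokens change when the input is perturbed, a minimal PyTorch-style sketch follows. The function name, the perturbation, and the loss weighting are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of a prediction-difference-style regularizer (illustrative only).
# Idea: run the model on the original and a perturbed input, then penalize
# the gap between the probabilities the two passes assign to the reference
# (ground-truth) tokens, on top of the usual cross-entropy loss.
import torch
import torch.nn.functional as F

def pd_regularized_loss(model, src, tgt_in, tgt_out, perturb, alpha=1.0):
    """src/tgt_in/tgt_out: token-id tensors; perturb: a function returning a
    perturbed copy of src (e.g. word dropout); alpha: regularizer weight."""
    logits_clean = model(src, tgt_in)            # (batch, len, vocab)
    logits_pert = model(perturb(src), tgt_in)    # same shape, perturbed view

    # Standard token-level cross-entropy on both views.
    ce = F.cross_entropy(logits_clean.transpose(1, 2), tgt_out) + \
         F.cross_entropy(logits_pert.transpose(1, 2), tgt_out)

    # Prediction difference on the ground-truth tokens.
    p_clean = logits_clean.softmax(-1)
    p_pert = logits_pert.softmax(-1)
    gt_clean = p_clean.gather(-1, tgt_out.unsqueeze(-1)).squeeze(-1)
    gt_pert = p_pert.gather(-1, tgt_out.unsqueeze(-1)).squeeze(-1)
    pd_penalty = (gt_clean - gt_pert).abs().mean()

    return ce + alpha * pd_penalty
```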

Cited by 6 publications (3 citation statements) · References 23 publications
“…Since it is difficult to train an end-to-end ST model directly, some training techniques like pretraining (Weiss et al., 2017; Berard et al., 2018; Bansal et al., 2019; Stoian et al., 2020; Wang et al., 2020b; Dong et al., 2021a; Alinejad and Sarkar, 2020; Zheng et al., 2021b), multi-task learning (Le et al., 2020; Vydana et al., 2021; Tang et al., 2021b; Ye et al., 2021; Tang et al., 2021a), curriculum learning (Kano et al., 2017; Wang et al., 2020c), and meta-learning (Indurthi et al., 2020) have been applied. Recent work has introduced mixup on machine translation (Zhang et al., 2019b; Guo et al., 2022; Fang and Feng, 2022), sentence classification (Chen et al., 2020; Jindal et al., 2020; Sun et al., 2020), multilingual understanding, and speech recognition (Medennikov et al., 2018; Sun et al., 2021; Lam et al., 2021a; Meng et al., 2021), and obtained enhancements.…”
Section: Can the Final Model Still Perform MT Task? (mentioning)
confidence: 99%
“…Neural machine translation (NMT) (Bahdanau et al., 2014) has made great progress in recent years (Barrault et al., 2020; Guo et al., 2022). However, when the input text exceeds a single sentence, sentence-level NMT methods fail to capture discourse phenomena such as pronominal anaphora, lexical consistency, and document coherence.…”
Section: Introduction (mentioning)
confidence: 99%
“…The encoder "understands" the sentence in the source language and forms a fixed-dimensional floating-point vector from which the decoder generates a word-by-word translation in the target language. In its infancy, RNN [5], LSTM [6], GRU [7], and other structures were widely used as encoder and decoder networks in NMT [8]. In 2017, the Transformer [9] came out, which not only dramatically surpasses RNN-based neural networks in translation quality but also achieves higher training efficiency through parallelized training.…”
Section: Introduction (mentioning)
confidence: 99%
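For readers unfamiliar with the paradigm the quoted passage describes, a minimal sketch of a Transformer-based encoder-decoder is shown below, using PyTorch's stock nn.Transformer. The model size and vocabulary sizes are placeholder assumptions, not taken from any of the cited papers.

```python
# Minimal encoder-decoder sketch (illustrative only): the encoder reads the
# source sentence, the decoder generates target tokens one position at a time.
import torch
import torch.nn as nn

class TinyNMT(nn.Module):
    def __init__(self, src_vocab=8000, tgt_vocab=8000, d_model=256):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d_model)
        self.tgt_emb = nn.Embedding(tgt_vocab, d_model)
        self.transformer = nn.Transformer(d_model=d_model, batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Causal mask keeps the decoder from peeking at future target tokens.
        mask = self.transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        hidden = self.transformer(self.src_emb(src_ids),
                                  self.tgt_emb(tgt_ids),
                                  tgt_mask=mask)
        return self.out(hidden)  # (batch, tgt_len, tgt_vocab) logits
```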