2021
DOI: 10.1109/taslp.2020.3042006
Modeling Future Cost for Neural Machine Translation

Cited by 17 publications (6 citation statements)
References 29 publications
“…Alternatively L_Reg can be an objective from another task, known as multi-task learning. Translation-specific multi-task terms include a coverage term (Tu et al., 2016), a right-to-left translation objective (Zhang et al., 2019b), the 'future cost' of a partial translation (Duan et al., 2020), or a target language modelling objective (Gülcehre et al., 2015; Sriram et al., 2018; Stahlberg et al., 2018a). Another approach is dropout: randomly omitting a subset of parameters θ_dropout from optimization for a training batch (Hinton et al., 2012).…”
Section: Objective Function Regularization (mentioning)
confidence: 99%
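The combined objective quoted above can be sketched as a translation loss plus a weighted auxiliary term, with dropout active during training. Below is a minimal Python (PyTorch-style) illustration; the model interface returning auxiliary LM logits, the choice of a target language-modelling term as L_Reg, and the 0.5 weight are assumptions for illustration, not the configuration of any cited system.

```python
import torch.nn.functional as F

def regularized_loss(model, src, tgt_in, tgt_out, aux_weight=0.5):
    """Translation loss plus a weighted auxiliary (multi-task) regularizer.

    The auxiliary term here is a target-side language-modelling loss, standing in
    for any of the L_Reg choices quoted above (coverage, right-to-left translation,
    future cost, target LM). Dropout layers inside `model` provide the
    parameter-omission regularizer while training mode is on.
    """
    model.train()  # enables dropout: a random subset of units is dropped per batch
    # Assumed model interface: returns translation logits and auxiliary LM logits,
    # both of shape [batch, tgt_len, vocab].
    logits, aux_lm_logits = model(src, tgt_in)
    nmt_loss = F.cross_entropy(logits.flatten(0, 1), tgt_out.flatten())
    reg_loss = F.cross_entropy(aux_lm_logits.flatten(0, 1), tgt_out.flatten())
    return nmt_loss + aux_weight * reg_loss  # overall objective: L_task + weight * L_Reg
```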
“…Future Information Incorporation. Numerous works [7,11,12,27,39,42,45] have sought to exploit future information to boost performance in sequence-to-sequence learning. However, their modeling differs from ours.…”
Section: Related Work (mentioning)
confidence: 99%
“…[1,64] employ an extra teacher network, trained via knowledge distillation, to help the neural machine translation model capture global information. In [11,42,45], given the previous history, in addition to the current target they further predict future words: [11,42] predict one more step ahead, and [45] predicts the rest of the sequence. [39] considers only the current target to model future information, and [48,50,55] regularize with right-to-left generation, while we directly leverage the effective knowledge to enhance the modeling of future information.…”
Section: Related Work (mentioning)
confidence: 99%
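To make the "one more step ahead" variant in the excerpt above concrete, here is a rough Python (PyTorch) sketch in which a single decoder state feeds two output heads, one predicting the current token and one predicting the following token. The class name, head names, and equal loss weighting are illustrative assumptions rather than the actual design of [11] or [42].

```python
import torch.nn as nn
import torch.nn.functional as F

class TwoStepDecoderHead(nn.Module):
    """From each decoder state, predict the current token y_t and the next token y_{t+1}."""

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.current_head = nn.Linear(hidden_size, vocab_size)  # predicts y_t
        self.future_head = nn.Linear(hidden_size, vocab_size)   # predicts y_{t+1}

    def forward(self, decoder_states, tgt_out):
        # decoder_states: [batch, tgt_len, hidden]; tgt_out: [batch, tgt_len]
        cur_logits = self.current_head(decoder_states)
        fut_logits = self.future_head(decoder_states[:, :-1])   # last position has no next token
        loss_current = F.cross_entropy(cur_logits.flatten(0, 1), tgt_out.flatten())
        loss_future = F.cross_entropy(fut_logits.flatten(0, 1), tgt_out[:, 1:].flatten())
        return loss_current + loss_future  # joint loss over current and one-step-ahead words
```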
“…However, most of these works rely on multi-pass decoding or specially customized decoding algorithms, which leads to a significant increase in training and inference costs. Alternatively, the global context has also been modeled in the reverse direction by pairing the conventional left-to-right image captioning model with a right-to-left auxiliary model [11,48,50,55]. However, in these methods the modeling of the reverse context is still conditioned on the local context with a separate network, so they cannot sufficiently encourage the image captioning model to exploit a truly flexible global context.…”
Section: Introduction (mentioning)
confidence: 99%