2021
DOI: 10.1109/taslp.2020.3042006
Modeling Future Cost for Neural Machine Translation

Cited by 17 publications (6 citation statements)
References 29 publications
“…Alternatively L_Reg can be an objective from another task, known as multi-task learning. Translation-specific multi-task terms include a coverage term (Tu et al., 2016), a right-to-left translation objective (Zhang et al., 2019b), the 'future cost' of a partial translation (Duan et al., 2020), or a target language modelling objective (Gülcehre et al., 2015; Sriram et al., 2018; Stahlberg et al., 2018a). Another approach is dropout: randomly omitting a subset of parameters θ_dropout from optimization for a training batch (Hinton et al., 2012).…”
Section: Objective Function Regularization (mentioning)
confidence: 99%
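The combined objective quoted above can be sketched as a translation loss plus a weighted auxiliary term, with dropout active during training. Below is a minimal Python (PyTorch-style) illustration; the model interface returning auxiliary LM logits, the choice of a target language-modelling term as L_Reg, and the 0.5 weight are assumptions for illustration, not the configuration of any cited system.

```python
import torch.nn.functional as F

def regularized_loss(model, src, tgt_in, tgt_out, aux_weight=0.5):
    """Translation loss plus a weighted auxiliary (multi-task) regularizer.

    The auxiliary term here is a target-side language-modelling loss, standing in
    for any of the L_Reg choices quoted above (coverage, right-to-left translation,
    future cost, target LM). Dropout layers inside `model` provide the
    parameter-omission regularizer while training mode is on.
    """
    model.train()  # enables dropout: a random subset of units is dropped per batch
    # Assumed model interface: returns translation logits and auxiliary LM logits,
    # both of shape [batch, tgt_len, vocab].
    logits, aux_lm_logits = model(src, tgt_in)
    nmt_loss = F.cross_entropy(logits.flatten(0, 1), tgt_out.flatten())
    reg_loss = F.cross_entropy(aux_lm_logits.flatten(0, 1), tgt_out.flatten())
    return nmt_loss + aux_weight * reg_loss  # overall objective: L_task + weight * L_Reg
```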
“…Future Information Incorporation. Numerous works [7,11,12,27,39,42,45] have sought to exploit future information to boost performance in sequence-to-sequence learning. However, their modeling differs from ours.…”
Section: Related Work (mentioning)
confidence: 99%
“…[1,64] employ an extra teacher network, trained via knowledge distillation, to help the neural machine translation model capture global information. In [11,42,45], given the previous history, in addition to the current target they further predict future words: [11,42] predict one more step ahead, and [45] predicts the rest of the sequence. [39] considers only the current target to model future information, and [48,50,55] regularize with right-to-left generation, while we directly leverage the effective knowledge to enhance the modeling of future information.…”
Section: Related Work (mentioning)
confidence: 99%
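To make the "one more step ahead" variant in the excerpt above concrete, here is a rough Python (PyTorch) sketch in which a single decoder state feeds two output heads, one predicting the current token and one predicting the following token. The class name, head names, and equal loss weighting are illustrative assumptions rather than the actual design of [11] or [42].

```python
import torch.nn as nn
import torch.nn.functional as F

class TwoStepDecoderHead(nn.Module):
    """From each decoder state, predict the current token y_t and the next token y_{t+1}."""

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.current_head = nn.Linear(hidden_size, vocab_size)  # predicts y_t
        self.future_head = nn.Linear(hidden_size, vocab_size)   # predicts y_{t+1}

    def forward(self, decoder_states, tgt_out):
        # decoder_states: [batch, tgt_len, hidden]; tgt_out: [batch, tgt_len]
        cur_logits = self.current_head(decoder_states)
        fut_logits = self.future_head(decoder_states[:, :-1])   # last position has no next token
        loss_current = F.cross_entropy(cur_logits.flatten(0, 1), tgt_out.flatten())
        loss_future = F.cross_entropy(fut_logits.flatten(0, 1), tgt_out[:, 1:].flatten())
        return loss_current + loss_future  # joint loss over current and one-step-ahead words
```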
“…However, most of these works rely on multi-pass decoding or specially customized decoding algorithms, which leads to a significant increase in training and inference costs. Alternatively, the global context has also been modeled in the reverse direction by pairing the conventional left-to-right image captioning model with a right-to-left auxiliary model [11,48,50,55]. However, in these methods the modeling of the reverse context is still conditioned on the local context with a separate network, so they cannot sufficiently encourage the image captioning model to exploit a truly flexible global context.…”
Section: Introduction (mentioning)
confidence: 99%