Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017
DOI: 10.18653/v1/p17-1013

Deep Neural Machine Translation with Linear Associative Unit

Abstract: Deep Neural Networks (DNNs) have provably enhanced the state-of-the-art Neural Machine Translation (NMT) with their capability in modeling complex functions and capturing complex linguistic structures. However, NMT systems with deep architecture in their encoder or decoder RNNs often suffer from severe gradient diffusion due to the non-linear recurrent activations, which often make the optimization much more difficult. To address this problem we propose novel linear associative units (LAU) to reduce the gradient…
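The mechanism described in the abstract, a recurrent unit that adds a gated linear path from the input to the output so part of the signal does not have to pass through a saturating non-linearity, can be sketched roughly as follows. This is a minimal illustrative PyTorch cell, assuming a GRU-style update in which an extra gate mixes a purely linear transform of the input with the usual tanh candidate; the exact gating equations of the published LAU may differ, and all class and parameter names here (LAUSketchCell, linear_gate, linear_path) are placeholders.

```python
# Hedged sketch of a LAU-style recurrent cell (illustrative, not the paper's exact equations).
import torch
import torch.nn as nn

class LAUSketchCell(nn.Module):
    """GRU-like cell with an extra gated *linear* path from the input to the output."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.reset = nn.Linear(input_size + hidden_size, hidden_size)
        self.update = nn.Linear(input_size + hidden_size, hidden_size)
        self.linear_gate = nn.Linear(input_size + hidden_size, hidden_size)
        self.candidate = nn.Linear(input_size + hidden_size, hidden_size)
        self.linear_path = nn.Linear(input_size, hidden_size, bias=False)

    def forward(self, x, h_prev):
        xh = torch.cat([x, h_prev], dim=-1)
        r = torch.sigmoid(self.reset(xh))           # reset gate
        z = torch.sigmoid(self.update(xh))          # update gate
        g = torch.sigmoid(self.linear_gate(xh))     # mixes linear vs. non-linear path
        cand = torch.tanh(self.candidate(torch.cat([x, r * h_prev], dim=-1)))
        mixed = g * self.linear_path(x) + (1.0 - g) * cand
        return z * h_prev + (1.0 - z) * mixed       # new hidden state
```

In use, such a cell would be applied step by step over the source or target sequence inside the encoder or decoder RNN, exactly as a GRUCell would be; the gated linear term is what shortens the effective gradient path through a deep stack of such layers.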

Cited by 49 publications (42 citation statements) · References 27 publications

Citation statements (ordered by relevance):
“…This enables the top module to have direct access to both the low-level input signals from the word embedding and high-level information generated by the bottom module. Similar principles can be found in Wang et al. (2017); …”
Section: Approach (supporting, confidence: 72%)
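The "direct access" principle quoted above, where a top layer sees both the raw word embeddings and the bottom layer's output, is commonly implemented by concatenating the two signals before they enter the top layer. The sketch below is a hypothetical two-layer GRU encoder written only to illustrate that wiring; layer names and sizes are not taken from the cited papers.

```python
# Hedged sketch: the top layer consumes [word embedding ; bottom-layer output]
# at every position, so it sees both low- and high-level signals directly.
import torch
import torch.nn as nn

class TwoLevelEncoderSketch(nn.Module):
    def __init__(self, vocab_size: int, emb_size: int = 256, hidden_size: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_size)
        self.bottom = nn.GRU(emb_size, hidden_size, batch_first=True)
        # Top layer input size covers the concatenated embedding and bottom output.
        self.top = nn.GRU(emb_size + hidden_size, hidden_size, batch_first=True)

    def forward(self, tokens):                       # tokens: (batch, seq_len)
        emb = self.embed(tokens)                     # (batch, seq_len, emb_size)
        low, _ = self.bottom(emb)                    # (batch, seq_len, hidden_size)
        top_in = torch.cat([emb, low], dim=-1)       # direct access to both signals
        high, _ = self.top(top_in)
        return high
```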
“…Unlike the above translation task, the WMT14 English-French translation task provides a significantly larger dataset. The full training data contain approximately 36M sentence pairs, from which we used only 12M instances for experiments, following previous work (Jean et al., 2015; Gehring et al., 2017a; Luong et al., 2015b; Wang et al., 2017a). We show the results in Table 3.…”
Section: Results on English-French Translation (mentioning, confidence: 99%)
“…• Coverage (Wang et al., 2017): an attention-based NMT system enhanced with a coverage mechanism to handle the over-translation and under-translation problem.…”
Section: Discussion (mentioning, confidence: 99%)
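A coverage mechanism of the kind mentioned in this statement is typically realized by accumulating past attention weights and feeding that running sum back into the attention score, so the model can tell which source positions are already covered. The sketch below is a generic coverage-augmented additive attention step, not necessarily the cited system's exact formulation; all module and variable names are illustrative.

```python
# Hedged sketch of coverage-augmented additive attention.
import torch
import torch.nn as nn

class CoverageAttentionSketch(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.w_dec = nn.Linear(hidden_size, hidden_size, bias=False)
        self.w_enc = nn.Linear(hidden_size, hidden_size, bias=False)
        self.w_cov = nn.Linear(1, hidden_size, bias=False)
        self.v = nn.Linear(hidden_size, 1, bias=False)

    def forward(self, dec_state, enc_states, coverage):
        # dec_state: (batch, hidden); enc_states: (batch, src_len, hidden)
        # coverage: (batch, src_len), attention mass accumulated so far
        score = self.v(torch.tanh(
            self.w_dec(dec_state).unsqueeze(1)
            + self.w_enc(enc_states)
            + self.w_cov(coverage.unsqueeze(-1))
        )).squeeze(-1)                                # (batch, src_len)
        attn = torch.softmax(score, dim=-1)
        context = torch.bmm(attn.unsqueeze(1), enc_states).squeeze(1)
        coverage = coverage + attn                    # update coverage for the next step
        return context, attn, coverage
```

Penalizing positions with high accumulated coverage discourages re-translating them (over-translation), while positions with low coverage remain attractive, which mitigates under-translation.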
“…On this task, DeepLAU (Wang et al., 2017b) is chosen as the baseline and also used as the pretrained model. We list the translation performance of our models alongside some existing NMT systems, including (Gehring et al., 2017) and the Transformer (Vaswani et al., 2017), which have much deeper architectures with relatively more parameters.…”
Section: Results on English-German Translation (mentioning, confidence: 99%)