Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017
DOI: 10.18653/v1/p17-1013

Deep Neural Machine Translation with Linear Associative Unit

Abstract: Deep Neural Networks (DNNs) have provably enhanced the state-of-the-art Neural Machine Translation (NMT) with their capability in modeling complex functions and capturing complex linguistic structures. However, NMT systems with deep architecture in their encoder or decoder RNNs often suffer from severe gradient diffusion due to the non-linear recurrent activations, which often make the optimization much more difficult. To address this problem we propose novel linear associative units (LAU) to reduce the gradient…
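The mechanism described in the abstract, a recurrent unit that adds a gated linear path from the input to the output so part of the signal does not have to pass through a saturating non-linearity, can be sketched roughly as follows. This is a minimal illustrative PyTorch cell, assuming a GRU-style update in which an extra gate mixes a purely linear transform of the input with the usual tanh candidate; the exact gating equations of the published LAU may differ, and all class and parameter names here (LAUSketchCell, linear_gate, linear_path) are placeholders.

```python
# Hedged sketch of a LAU-style recurrent cell (illustrative, not the paper's exact equations).
import torch
import torch.nn as nn

class LAUSketchCell(nn.Module):
    """GRU-like cell with an extra gated *linear* path from the input to the output."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.reset = nn.Linear(input_size + hidden_size, hidden_size)
        self.update = nn.Linear(input_size + hidden_size, hidden_size)
        self.linear_gate = nn.Linear(input_size + hidden_size, hidden_size)
        self.candidate = nn.Linear(input_size + hidden_size, hidden_size)
        self.linear_path = nn.Linear(input_size, hidden_size, bias=False)

    def forward(self, x, h_prev):
        xh = torch.cat([x, h_prev], dim=-1)
        r = torch.sigmoid(self.reset(xh))           # reset gate
        z = torch.sigmoid(self.update(xh))          # update gate
        g = torch.sigmoid(self.linear_gate(xh))     # mixes linear vs. non-linear path
        cand = torch.tanh(self.candidate(torch.cat([x, r * h_prev], dim=-1)))
        mixed = g * self.linear_path(x) + (1.0 - g) * cand
        return z * h_prev + (1.0 - z) * mixed       # new hidden state
```

In use, such a cell would be applied step by step over the source or target sequence inside the encoder or decoder RNN, exactly as a GRUCell would be; the gated linear term is what shortens the effective gradient path through a deep stack of such layers.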

Cited by 49 publications (42 citation statements) · References 27 publications

Citation statements (ordered by relevance):
“…This enables the top module to have direct access to both the low-level input signals from the word embedding and high-level information generated by the bottom module. Similar principles can be found in Wang et al. (2017); …”
Section: Approach (supporting, confidence: 72%)
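The "direct access" principle quoted above, where a top layer sees both the raw word embeddings and the bottom layer's output, is commonly implemented by concatenating the two signals before they enter the top layer. The sketch below is a hypothetical two-layer GRU encoder written only to illustrate that wiring; layer names and sizes are not taken from the cited papers.

```python
# Hedged sketch: the top layer consumes [word embedding ; bottom-layer output]
# at every position, so it sees both low- and high-level signals directly.
import torch
import torch.nn as nn

class TwoLevelEncoderSketch(nn.Module):
    def __init__(self, vocab_size: int, emb_size: int = 256, hidden_size: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_size)
        self.bottom = nn.GRU(emb_size, hidden_size, batch_first=True)
        # Top layer input size covers the concatenated embedding and bottom output.
        self.top = nn.GRU(emb_size + hidden_size, hidden_size, batch_first=True)

    def forward(self, tokens):                       # tokens: (batch, seq_len)
        emb = self.embed(tokens)                     # (batch, seq_len, emb_size)
        low, _ = self.bottom(emb)                    # (batch, seq_len, hidden_size)
        top_in = torch.cat([emb, low], dim=-1)       # direct access to both signals
        high, _ = self.top(top_in)
        return high
```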
“…Unlike the above translation task, the WMT14 English-French translation task provides a significantly larger dataset. The full training data contain approximately 36M sentence pairs, from which we used only 12M instances for experiments, following previous work (Jean et al., 2015; Gehring et al., 2017a; Luong et al., 2015b; Wang et al., 2017a). We show the results in Table 3.…”
Section: Results on English-French Translation (mentioning, confidence: 99%)
“…• Coverage (Wang et al., 2017): an attention-based NMT system enhanced with a coverage mechanism to handle the over-translation and under-translation problem.…”
Section: Discussion (mentioning, confidence: 99%)
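A coverage mechanism of the kind mentioned in this statement is typically realized by accumulating past attention weights and feeding that running sum back into the attention score, so the model can tell which source positions are already covered. The sketch below is a generic coverage-augmented additive attention step, not necessarily the cited system's exact formulation; all module and variable names are illustrative.

```python
# Hedged sketch of coverage-augmented additive attention.
import torch
import torch.nn as nn

class CoverageAttentionSketch(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.w_dec = nn.Linear(hidden_size, hidden_size, bias=False)
        self.w_enc = nn.Linear(hidden_size, hidden_size, bias=False)
        self.w_cov = nn.Linear(1, hidden_size, bias=False)
        self.v = nn.Linear(hidden_size, 1, bias=False)

    def forward(self, dec_state, enc_states, coverage):
        # dec_state: (batch, hidden); enc_states: (batch, src_len, hidden)
        # coverage: (batch, src_len), attention mass accumulated so far
        score = self.v(torch.tanh(
            self.w_dec(dec_state).unsqueeze(1)
            + self.w_enc(enc_states)
            + self.w_cov(coverage.unsqueeze(-1))
        )).squeeze(-1)                                # (batch, src_len)
        attn = torch.softmax(score, dim=-1)
        context = torch.bmm(attn.unsqueeze(1), enc_states).squeeze(1)
        coverage = coverage + attn                    # update coverage for the next step
        return context, attn, coverage
```

Penalizing positions with high accumulated coverage discourages re-translating them (over-translation), while positions with low coverage remain attractive, which mitigates under-translation.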
“…On this task, DeepLAU (Wang et al., 2017b) is chosen as the baseline and also used as the pretrained model. We list the translation performance of our models alongside some existing NMT systems, including (Gehring et al., 2017) and the Transformer (Vaswani et al., 2017), which have much deeper architectures with relatively more parameters.…”
Section: Results on English-German Translation (mentioning, confidence: 99%)