2017
DOI: 10.1109/taslp.2017.2751420

A Context-Aware Recurrent Encoder for Neural Machine Translation

Cited by 62 publications (26 citation statements)
References 14 publications
“…In particular, our single model yields a detokenized BLEU score of 21.99. In order to show that the proposed model can be orthogonal to previous methods that improve LSTM/GRU-based NMT, we integrate a single-layer context-aware (CA) encoder (Zhang et al., 2017b) into our system. The ATR+CA system further reaches 22.7 BLEU, outperforming the winner system (Buck et al., 2014) by a substantial improvement of 2 BLEU points.…”
Section: Results on English-German Translation (mentioning)
confidence: 99%
“…"RL" and "WPM" is the reinforcement learning optimization and word piece model used in . "CA" is the context-aware recurrent encoder (Zhang et al, 2017b). "LAU" and "F-F" denote the linear associative unit and the fast-forward architecture proposed by Wang et al (2017a) and Zhou et al (2016) respectively.…”
Section: Trainingmentioning
confidence: 99%
“…Word embedding is a real-valued vector representation of words that captures both semantic and syntactic meaning learned from large unlabeled corpora. It is a powerful tool widely used in modern natural language processing (NLP) tasks, including semantic analysis [1], information retrieval [2], dependency parsing [3], [4], [5], question answering [6], [7] and machine translation [6], [8], [9]. Learning a high-quality representation is extremely important for these tasks, yet the question "what is a good word embedding model" remains an open problem.…”
Section: Introduction (mentioning)
confidence: 99%
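
As a rough illustration of the lookup this excerpt describes (the vocabulary, dimensionality, and values below are invented for the example and are not taken from any paper cited here), a word embedding is simply a trainable matrix indexed by word id:

import numpy as np

# Hypothetical toy setup: a 5-word vocabulary and 4-dimensional embeddings.
# In practice the vocabulary holds tens of thousands of words and the
# dimensionality is in the hundreds; the values here are random stand-ins
# for vectors that would be learned from a large unlabeled corpus.
vocab = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3, "mat": 4}
rng = np.random.default_rng(0)
embedding_matrix = rng.normal(scale=0.1, size=(len(vocab), 4))

def embed(sentence):
    """Map a whitespace-tokenized sentence to a sequence of word vectors."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in sentence.split()]
    return embedding_matrix[ids]          # shape: (sentence_length, 4)

vectors = embed("the cat sat")
print(vectors.shape)                      # (3, 4)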
“…Indeed, the model proposed by Choi et al. (2017) attempts to improve NMT by integrating context vectors associated with source words into the generation process during decoding. The model proposed by Zhang et al. (2017) is aware of previously attended words on the source side in order to better predict which words will be attended to in the future. The self-attentive residual decoder designed by Werlen et al. (2018) leverages contextual information from previously translated words on the target side.…”
Section: Results (mentioning)
confidence: 99%
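
The three models summarized in this excerpt differ in their details, but they share the idea of letting the attention or generation step see a summary of what has already been attended to or produced. Below is a minimal, generic sketch of that idea (a coverage-style attention score that also conditions on the accumulated attention history); the function name, weight shapes, and scoring form are illustrative assumptions, not the exact formulation of any of the cited papers.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def context_aware_attention(decoder_state, source_states, history, W_s, W_h, W_c, v):
    """One attention step whose scores also see the accumulated attention history.

    decoder_state: (d,)    current decoder hidden state
    source_states: (n, d)  encoder states for the n source words
    history:       (n,)    cumulative attention each source word has received so far
    W_s, W_h, W_c, v:      learned parameters (random placeholders in this sketch)
    """
    # Score each source position from the decoder state, the source state,
    # and how much attention that position has already received.
    scores = np.array([
        v @ np.tanh(W_s @ decoder_state + W_h @ h_j + W_c * history[j])
        for j, h_j in enumerate(source_states)
    ])
    weights = softmax(scores)                     # attention distribution
    context = weights @ source_states             # context vector for this step
    return context, weights, history + weights    # updated attention history

# Toy usage with random "learned" parameters.
d, n = 8, 5
rng = np.random.default_rng(1)
ctx, w, hist = context_aware_attention(
    rng.normal(size=d), rng.normal(size=(n, d)), np.zeros(n),
    rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=d), rng.normal(size=d),
)
print(w.round(3), hist.round(3))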