Proceedings of the First Conference on Machine Translation: Volume 1, Research Papers 2016
DOI: 10.18653/v1/w16-2206

Alignment-Based Neural Machine Translation

Abstract: Neural machine translation (NMT) has emerged recently as a promising statistical machine translation approach. In NMT, neural networks (NN) are directly used to produce translations, without relying on a pre-existing translation framework. In this work, we take a step towards bridging the gap between conventional word alignment models and NMT. We follow the hidden Markov model (HMM) approach that separates the alignment and lexical models. We propose a neural alignment model and combine it with a lexical neura…
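The abstract's HMM-style decomposition can be made explicit. Below is a sketch of the standard factorization into separate lexical and alignment models, with notation assumed here (target sentence e_1^I, source sentence f_1^J, alignment path b_1^I, where b_i is the source position aligned to target position i):

    % HMM-style factorization into a lexical model and an alignment model
    % (notation assumed: target e_1^I, source f_1^J, alignment path b_1^I)
    p(e_1^I \mid f_1^J) = \sum_{b_1^I} \prod_{i=1}^{I}
        \underbrace{p(e_i \mid b_1^i, e_1^{i-1}, f_1^J)}_{\text{lexical model}}
        \cdot
        \underbrace{p(b_i \mid b_1^{i-1}, e_1^{i-1}, f_1^J)}_{\text{alignment model}}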

Cited by 37 publications (42 citation statements). References 25 publications.

Citation statements, ordered by relevance:
“…W_m projects the features of x_k to a relatively low dimension for reducing computational overhead, and W_m projects the aggregated features back to the same dimension as y_q. For 2-d image data, we separately encode the x-axis relative position R^X_{k−q} and the y-axis relative position R^Y_{k−q}, and concatenate them to be the final encoding…”
Section: Transformer Attention (mentioning, confidence: 99%)
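The scheme in this quote can be made concrete. The Python sketch below encodes the x-axis and y-axis offsets between a key position k and a query position q separately and concatenates the two vectors, mirroring R^X_{k−q} and R^Y_{k−q}; the sinusoidal mapping and the even split of the dimension are assumptions, since the quoted passage does not specify how offsets are turned into vectors:

    import numpy as np

    def sinusoidal_encoding(delta, dim):
        """Map a scalar relative offset to a fixed vector (assumed scheme)."""
        freqs = 1.0 / (10000.0 ** (np.arange(0, dim, 2) / dim))
        angles = delta * freqs
        return np.concatenate([np.sin(angles), np.cos(angles)])

    def relative_position_encoding_2d(key_pos, query_pos, dim):
        """Encode x- and y-axis relative positions separately, then
        concatenate them, as the quoted passage describes."""
        dx = key_pos[0] - query_pos[0]  # x-axis relative position
        dy = key_pos[1] - query_pos[1]  # y-axis relative position
        return np.concatenate([
            sinusoidal_encoding(dx, dim // 2),  # R^X_{k-q}
            sinusoidal_encoding(dy, dim // 2),  # R^Y_{k-q}
        ])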
“…(2). Following (Alkhouli et al., 2016), the alignment model predicts the relative jump Δ_i = b_i − b_{i−1} from the previous source position b_{i−1} to the current source position b_i. This model has a bidirectional source encoder consisting of two recurrent layers (yellow), and a recurrent layer maintaining the target state (red).…”
Section: Recurrent Alignment Model (mentioning, confidence: 99%)
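As a rough illustration of the architecture this quote describes, the Python/PyTorch sketch below combines a bidirectional recurrent source encoder with a recurrent target state and classifies the relative jump Δ_i over a bounded window; the cell type (LSTM), layer sizes, and jump bound are assumptions, not details taken from the paper:

    import torch
    import torch.nn as nn

    class RecurrentAlignmentModel(nn.Module):
        """Sketch: bidirectional source encoder (two recurrent layers)
        plus a recurrent target state, predicting the relative jump
        delta_i = b_i - b_{i-1} as a class in [-max_jump, +max_jump]."""

        def __init__(self, src_vocab, tgt_vocab, dim=256, max_jump=10):
            super().__init__()
            self.src_emb = nn.Embedding(src_vocab, dim)
            self.tgt_emb = nn.Embedding(tgt_vocab, dim)
            # bidirectional source encoder with two recurrent layers
            self.encoder = nn.LSTM(dim, dim, num_layers=2,
                                   bidirectional=True, batch_first=True)
            # recurrent layer maintaining the target state
            self.decoder = nn.LSTM(dim, dim, batch_first=True)
            self.jump_out = nn.Linear(3 * dim, 2 * max_jump + 1)

        def forward(self, src, tgt_prefix, b_prev):
            enc, _ = self.encoder(self.src_emb(src))         # (B, J, 2*dim)
            dec, _ = self.decoder(self.tgt_emb(tgt_prefix))  # (B, i-1, dim)
            # encoder state at the previous aligned source position b_{i-1}
            src_state = enc[torch.arange(src.size(0)), b_prev]
            tgt_state = dec[:, -1]  # current target state
            logits = self.jump_out(torch.cat([src_state, tgt_state], dim=-1))
            return logits.log_softmax(dim=-1)  # distribution over delta_i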
“…Nowadays, the HMM is used with the IBM models to generate word alignments, which are needed to train phrase-based systems. Alkhouli et al. (2016) and Wang et al. (2017) apply the hidden Markov model decomposition using feed-forward lexical and alignment neural network models. In this work, we are interested in using more expressive models.…”
Section: Introduction (mentioning, confidence: 99%)
“…Our feed-forward alignment model has the same architecture (Figure 1) as the one proposed in (Alkhouli et al., 2016). Thus the alignment probability can be modeled by:…”
Section: Definition of Neural Network-based HMM (mentioning, confidence: 99%)
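The quoted equation is truncated here, but the shape of such a feed-forward alignment model can still be sketched: it predicts the jump Δ_i from a window of source words around the previous aligned position b_{i−1} together with the most recent target words. In the Python sketch below, the window size, target context length, and hidden layer are assumptions rather than the cited configuration:

    import torch
    import torch.nn as nn

    class FeedForwardAlignmentModel(nn.Module):
        """Sketch: predict delta_i from embedded source words around
        b_{i-1} and the previous target words, via one hidden layer."""

        def __init__(self, src_vocab, tgt_vocab, dim=128,
                     src_window=5, tgt_context=3, max_jump=10):
            super().__init__()
            self.src_emb = nn.Embedding(src_vocab, dim)
            self.tgt_emb = nn.Embedding(tgt_vocab, dim)
            in_dim = dim * (2 * src_window + 1 + tgt_context)
            self.hidden = nn.Sequential(nn.Linear(in_dim, dim), nn.Tanh())
            self.out = nn.Linear(dim, 2 * max_jump + 1)

        def forward(self, src_window_ids, tgt_context_ids):
            # src_window_ids: (B, 2*src_window+1) source words around b_{i-1}
            # tgt_context_ids: (B, tgt_context) previous target words
            feats = torch.cat([self.src_emb(src_window_ids).flatten(1),
                               self.tgt_emb(tgt_context_ids).flatten(1)],
                              dim=-1)
            return self.out(self.hidden(feats)).log_softmax(-1)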