Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017
DOI: 10.18653/v1/p17-1012
A Convolutional Encoder Model for Neural Machine Translation

Abstract: The prevalent approach to neural machine translation relies on bi-directional LSTMs to encode the source sentence. We present a faster and simpler architecture based on a succession of convolutional layers. This allows the source sentence to be encoded simultaneously, in contrast to recurrent networks, whose computation is constrained by temporal dependencies. On WMT'16 English-Romanian translation we achieve accuracy competitive with the state of the art, and on WMT'15 English-German we outperform several recently published results…
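
To illustrate why such an encoder is parallelizable, here is a minimal PyTorch-style sketch of a convolutional sentence encoder. It is a generic construction under assumed hyperparameters (embedding size, kernel width, depth, tanh nonlinearity, residual connections), not the paper's exact architecture.

```python
# Hypothetical sketch of a convolutional sentence encoder (illustrative, not the paper's model).
# Every source position is processed in parallel; the receptive field grows with depth.
import torch
import torch.nn as nn

class ConvEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, kernel=3, layers=6, max_len=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, emb_dim)
        self.pos_emb = nn.Embedding(max_len, emb_dim)   # position embeddings stand in for recurrence
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, emb_dim, kernel, padding=kernel // 2) for _ in range(layers)]
        )

    def forward(self, src):                              # src: (batch, src_len) token ids
        pos = torch.arange(src.size(1), device=src.device).unsqueeze(0)
        x = self.tok_emb(src) + self.pos_emb(pos)        # (batch, src_len, emb_dim)
        x = x.transpose(1, 2)                            # Conv1d expects (batch, channels, len)
        for conv in self.convs:
            x = torch.tanh(conv(x)) + x                  # residual connection keeps deep stacks trainable
        return x.transpose(1, 2)                         # (batch, src_len, emb_dim): one state per token
```

Because no layer depends on the output of a previous time step, all source positions are encoded in a single parallel pass, unlike a bi-directional LSTM, which must scan the sentence token by token.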

Cited by 426 publications (351 citation statements)
References 28 publications
“…Interestingly, the best performing model turned out to be nearly equivalent to the base model (described in Section 3.3), differing only in that it used 512-dimensional additive attention. While not the focus of this work, we were able to achieve further improvements by combining all of our insights into a single model, described in Table 7, which compares against RNNSearch (Jean et al., 2015), RNNSearch-LV (Jean et al., 2015), BPE (Sennrich et al., 2016b), BPE-Char (Chung et al., 2016), Deep-Att (Zhou et al., 2016), Luong (Luong et al., 2015a), Deep-Conv (Gehring et al., 2016), GNMT (Wu et al., 2016), and OpenNMT (Klein et al., 2017). Systems with an * do not have a public implementation.…”
Section: Final System Comparison (mentioning)
confidence: 99%
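
The quoted comparison singles out 512-dimensional additive attention. As a reference point, here is a minimal sketch of additive (Bahdanau-style) attention; only the 512-dimensional projection is taken from the quote, and the module and tensor names are illustrative.

```python
# Sketch of additive attention: score(s, h) = v^T tanh(W_s s + W_h h), softmax over source positions.
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    def __init__(self, dec_dim, enc_dim, attn_dim=512):   # 512-dim attention as in the quoted setup
        super().__init__()
        self.W_s = nn.Linear(dec_dim, attn_dim, bias=False)
        self.W_h = nn.Linear(enc_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, dec_state, enc_states):
        # dec_state: (batch, dec_dim); enc_states: (batch, src_len, enc_dim)
        scores = self.v(torch.tanh(self.W_s(dec_state).unsqueeze(1) + self.W_h(enc_states)))
        weights = torch.softmax(scores.squeeze(-1), dim=-1)               # (batch, src_len)
        context = torch.bmm(weights.unsqueeze(1), enc_states).squeeze(1)  # weighted sum of encoder states
        return context, weights
```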
“…Note that $u$ is usually deterministic with respect to $x_s$, and an accurate representation of the conditional distribution depends heavily on the decoder. In neural machine translation, the exact forms of encoder and decoder are specified using RNNs (Sutskever et al., 2014), CNNs (Gehring et al., 2016), and attention (Vaswani et al., 2017) as building blocks. The decoding distribution, $P^{\mathrm{dec}}_{\theta}(x_t \mid u)$, is typically modeled autoregressively.…”
Section: Encoder-Decoder Framework (mentioning)
confidence: 99%
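
The autoregressive modelling mentioned in the quote is conventionally written as the factorization below; the target length $T$ and the per-token notation $x_{t,i}$ are the usual conventions, not spelled out in the excerpt.

```latex
u = f_{\mathrm{enc}}(x_s), \qquad
P^{\mathrm{dec}}_{\theta}(x_t \mid u) = \prod_{i=1}^{T} P_{\theta}\!\left(x_{t,i} \mid x_{t,<i},\, u\right)
```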
“…Neural attention mechanism: Neural attention mechanisms have inspired many state-of-the-art models in several machine learning tasks, including image caption generation [22], machine translation [5, 19] and semantic role labeling [18]. Their effectiveness comes from making the model focus on the more important, detailed information while neglecting the useless information.…”
Section: Related Work (mentioning)
confidence: 99%