Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017
DOI: 10.18653/v1/p17-1012
A Convolutional Encoder Model for Neural Machine Translation

Abstract: The prevalent approach to neural machine translation relies on bi-directional LSTMs to encode the source sentence. We present a faster and simpler architecture based on a succession of convolutional layers. This allows the source sentence to be encoded simultaneously, in contrast to recurrent networks, whose computation is constrained by temporal dependencies. On WMT'16 English-Romanian translation we achieve accuracy competitive with the state of the art, and on WMT'15 English-German we outperform several recently published results…
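
To illustrate why such an encoder is parallelizable, here is a minimal PyTorch-style sketch of a convolutional sentence encoder. It is a generic construction under assumed hyperparameters (embedding size, kernel width, depth, tanh nonlinearity, residual connections), not the paper's exact architecture.

```python
# Hypothetical sketch of a convolutional sentence encoder (illustrative, not the paper's model).
# Every source position is processed in parallel; the receptive field grows with depth.
import torch
import torch.nn as nn

class ConvEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, kernel=3, layers=6, max_len=512):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, emb_dim)
        self.pos_emb = nn.Embedding(max_len, emb_dim)   # position embeddings stand in for recurrence
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, emb_dim, kernel, padding=kernel // 2) for _ in range(layers)]
        )

    def forward(self, src):                              # src: (batch, src_len) token ids
        pos = torch.arange(src.size(1), device=src.device).unsqueeze(0)
        x = self.tok_emb(src) + self.pos_emb(pos)        # (batch, src_len, emb_dim)
        x = x.transpose(1, 2)                            # Conv1d expects (batch, channels, len)
        for conv in self.convs:
            x = torch.tanh(conv(x)) + x                  # residual connection keeps deep stacks trainable
        return x.transpose(1, 2)                         # (batch, src_len, emb_dim): one state per token
```

Because no layer depends on the output of a previous time step, all source positions are encoded in a single parallel pass, unlike a bi-directional LSTM, which must scan the sentence token by token.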

Cited by 426 publications (351 citation statements)
References 28 publications
“…Interestingly, the best performing model turned out to be nearly equivalent to the base model (described in Section 3.3), differing only in that it used 512-dimensional additive attention. While not the focus of this work, we were able to achieve further improvements by combining all of our insights into a single model, described in Table 7, which compares against RNNSearch (Jean et al., 2015), RNNSearch-LV (Jean et al., 2015), BPE (Sennrich et al., 2016b), BPE-Char (Chung et al., 2016), Deep-Att (Zhou et al., 2016), Luong (Luong et al., 2015a), Deep-Conv (Gehring et al., 2016), GNMT (Wu et al., 2016), and OpenNMT (Klein et al., 2017). Systems with an * do not have a public implementation.…”
Section: Final System Comparison (mentioning)
confidence: 99%
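
The quoted comparison singles out 512-dimensional additive attention. As a reference point, here is a minimal sketch of additive (Bahdanau-style) attention; only the 512-dimensional projection is taken from the quote, and the module and tensor names are illustrative.

```python
# Sketch of additive attention: score(s, h) = v^T tanh(W_s s + W_h h), softmax over source positions.
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    def __init__(self, dec_dim, enc_dim, attn_dim=512):   # 512-dim attention as in the quoted setup
        super().__init__()
        self.W_s = nn.Linear(dec_dim, attn_dim, bias=False)
        self.W_h = nn.Linear(enc_dim, attn_dim, bias=False)
        self.v = nn.Linear(attn_dim, 1, bias=False)

    def forward(self, dec_state, enc_states):
        # dec_state: (batch, dec_dim); enc_states: (batch, src_len, enc_dim)
        scores = self.v(torch.tanh(self.W_s(dec_state).unsqueeze(1) + self.W_h(enc_states)))
        weights = torch.softmax(scores.squeeze(-1), dim=-1)               # (batch, src_len)
        context = torch.bmm(weights.unsqueeze(1), enc_states).squeeze(1)  # weighted sum of encoder states
        return context, weights
```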
“…Note that $u$ is usually deterministic with respect to $x_s$, and an accurate representation of the conditional distribution depends heavily on the decoder. In neural machine translation, the exact forms of encoder and decoder are specified using RNNs (Sutskever et al., 2014), CNNs (Gehring et al., 2016), and attention (Vaswani et al., 2017) as building blocks. The decoding distribution, $P^{\mathrm{dec}}_{\theta}(x_t \mid u)$, is typically modeled autoregressively.…”
Section: Encoder-Decoder Framework (mentioning)
confidence: 99%
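
The autoregressive modelling mentioned in the quote is conventionally written as the factorization below; the target length $T$ and the per-token notation $x_{t,i}$ are the usual conventions, not spelled out in the excerpt.

```latex
u = f_{\mathrm{enc}}(x_s), \qquad
P^{\mathrm{dec}}_{\theta}(x_t \mid u) = \prod_{i=1}^{T} P_{\theta}\!\left(x_{t,i} \mid x_{t,<i},\, u\right)
```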
“…Neural attention mechanism: Neural attention mechanisms have inspired many state-of-the-art models in several machine learning tasks, including image caption generation [22], machine translation [5, 19] and semantic role labeling [18]. Their effectiveness comes from making the model focus on the more important, detailed information while neglecting the useless information.…”
Section: Related Work (mentioning)
confidence: 99%