2017
DOI: 10.1515/pralin-2017-0005

Empirical Investigation of Optimization Algorithms in Neural Machine Translation

Abstract: Training neural networks is a non-convex and high-dimensional optimization problem. In this paper, we provide a comparative study of the most popular stochastic optimization techniques used to train neural networks. We evaluate the methods in terms of convergence speed, translation quality, and training stability. In addition, we investigate combinations that seek to improve optimization in terms of these aspects. We train state-of-the-art attention-based models and apply them to perform neural machine translation…
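The comparison described in the abstract can be illustrated with a short sketch. The snippet below is not from the paper: the toy model, data, and hyperparameter values are placeholders, and it merely shows how identically initialized copies of a model can be trained with the optimizers the study covers (SGD, Adagrad, Adadelta, Adam) so that their convergence behavior can be compared side by side.

```python
# Hedged sketch: compare optimizers on identically initialized copies of a toy model.
# Placeholder model/objective/hyperparameters, not the paper's NMT setup.
import copy
import torch
from torch import nn

base_model = nn.Linear(512, 512)  # stand-in for an attention-based NMT model


def make_optimizer(name, params):
    # Illustrative hyperparameter values, not the settings used in the paper.
    if name == "SGD":
        return torch.optim.SGD(params, lr=0.1)
    if name == "Adagrad":
        return torch.optim.Adagrad(params, lr=0.05)
    if name == "Adadelta":
        return torch.optim.Adadelta(params, rho=0.95)
    if name == "Adam":
        return torch.optim.Adam(params, lr=1e-3)
    raise ValueError(name)


def run(name, steps=200):
    model = copy.deepcopy(base_model)           # same initialization for every optimizer
    opt = make_optimizer(name, model.parameters())
    for _ in range(steps):
        x = torch.randn(64, 512)
        loss = ((model(x) - x) ** 2).mean()     # placeholder objective, not an NMT loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()                          # proxy for convergence speed


for name in ["SGD", "Adagrad", "Adadelta", "Adam"]:
    print(f"{name:8s} final loss: {run(name):.4f}")
```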

Cited by 18 publications (17 citation statements)
References 6 publications
“…We have three variants of our model, using: (i) only the source memory (S-NMT+src mem), (ii) only the target memory (S-NMT+trg mem), or [footnote 5:] In our initial experiments, we found SGD to be more effective than Adam/Adagrad; an observation also made by Bahar et al. (2017). [footnote 6:] For the document NMT model training, we did some preliminary experiments using different learning rates and used the scheme which converged to the best perplexity in the least number of epochs, while for sentence-level training we follow Cohn et al. (2016).…”
Section: Results (mentioning)
confidence: 91%
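The learning-rate selection described in footnote 6 above can be sketched as follows. The helper is hypothetical, not code from the cited work: `make_trainer`, `train_one_epoch`, and `validation_perplexity` are assumed stand-ins for the authors' training loop, and each candidate rate is assumed to start from a freshly initialized model.

```python
# Hedged sketch: pick the learning rate that reaches the lowest validation
# perplexity, breaking ties by the number of epochs needed to get there.
import math


def select_learning_rate(candidate_lrs, make_trainer, max_epochs=10):
    """`make_trainer(lr)` is a hypothetical factory returning
    (train_one_epoch, validation_perplexity) for a freshly initialized model."""
    best = None  # (perplexity, epochs_to_reach_it, lr)
    for lr in candidate_lrs:
        train_one_epoch, validation_perplexity = make_trainer(lr)
        best_ppl, best_epoch = math.inf, max_epochs
        for epoch in range(1, max_epochs + 1):
            train_one_epoch()
            ppl = validation_perplexity()
            if ppl < best_ppl:
                best_ppl, best_epoch = ppl, epoch
        if best is None or (best_ppl, best_epoch) < best[:2]:
            best = (best_ppl, best_epoch, lr)
    return best[2]
```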
“…Note that the baseline in this work is much stronger than in our prior work (>5 BLEU). This is due to multiple factors that have been recommended as best practices for neural MT and have been incorporated in the present baseline: deduplication of the training data, ensemble decoding using multiple random runs, use of Adam as the optimizer instead of AdaDelta (Bahar et al., 2017; Denkowski and Neubig, 2017), and checkpoint averaging (Bahar et al., 2017), as well as a more recent neural modeling toolkit.…”
Section: Neural MT System (mentioning)
confidence: 99%
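Checkpoint averaging, as cited above from Bahar et al. (2017), takes the element-wise mean of the parameters from the last few saved checkpoints before decoding. A minimal sketch, assuming plain PyTorch state dicts and hypothetical file names:

```python
# Hedged sketch of checkpoint averaging: parameter-wise mean over saved checkpoints.
# Assumes each file holds a plain state dict; adapt to the actual toolkit's format.
import torch


def average_checkpoints(paths):
    """Load several PyTorch state dicts and return their parameter-wise average."""
    avg = None
    for path in paths:
        state = torch.load(path, map_location="cpu")
        if avg is None:
            avg = {k: v.clone().float() for k, v in state.items()}
        else:
            for k, v in state.items():
                avg[k] += v.float()
    return {k: v / len(paths) for k, v in avg.items()}


# Usage (hypothetical file names):
# averaged = average_checkpoints(["ckpt_8.pt", "ckpt_9.pt", "ckpt_10.pt"])
# model.load_state_dict(averaged)
```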
“…So et al (2019) apply NAS to Transformer on NMT tasks. There is also work on empirically exploring hyperparameters and architectures of NMT systems (Bahar et al, 2017;Britz et al, 2017;Lim et al, 2018), though the focus is on finding general best-practice configurations. This differs from the goal of HPO, which aims to find the best configuration specific to a given dataset.…”
Section: Related Workmentioning
confidence: 99%