Proceedings of the Tenth Workshop on Statistical Machine Translation 2015
DOI: 10.18653/v1/w15-3014
Montreal Neural Machine Translation Systems for WMT’15

Abstract: Neural machine translation (NMT) systems have recently achieved results comparable to the state of the art on a few translation tasks, including English→French and English→German. The main purpose of the Montreal Institute for Learning Algorithms (MILA) submission to WMT'15 is to evaluate this new approach on a greater variety of language pairs. Furthermore, the human evaluation campaign may help us and the research community to better understand the behaviour of our systems. We use the RNNsearch architecture,…
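The abstract refers to the RNNsearch architecture, i.e. the attention-based encoder-decoder of Bahdanau et al. Below is a minimal sketch of the additive attention step at the heart of that architecture, using toy dimensions and NumPy only; the variable names (W_a, U_a, v_a) follow the usual notation and are illustrative assumptions, not the submission's actual code.

```python
# Minimal sketch of the additive (Bahdanau-style) attention step used by
# RNNsearch. Toy dimensions; illustrative only, not the MILA submission's code.
import numpy as np

def additive_attention(decoder_state, encoder_states, W_a, U_a, v_a):
    """Return the attention weights over source positions and the
    resulting context vector for one decoding step."""
    # e_j = v_a^T tanh(W_a s + U_a h_j)  for each source annotation h_j
    scores = np.tanh(decoder_state @ W_a.T + encoder_states @ U_a.T) @ v_a
    # softmax over source positions
    alphas = np.exp(scores - scores.max())
    alphas /= alphas.sum()
    # context vector c = sum_j alpha_j h_j
    context = alphas @ encoder_states
    return alphas, context

# toy example: 5 source positions, hidden size 4, attention size 3
rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(5, 4))   # annotations h_1..h_5
decoder_state = rng.normal(size=(4,))      # previous decoder state s
W_a, U_a, v_a = rng.normal(size=(3, 4)), rng.normal(size=(3, 4)), rng.normal(size=(3,))
alphas, context = additive_attention(decoder_state, encoder_states, W_a, U_a, v_a)
print(alphas.round(3), context.round(3))
```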

Cited by 120 publications (102 citation statements); references 11 publications.
“…In deep fusion, the controller parameters and output parameters are tuned on further parallel training data, but the language model parameters are fixed during the finetuning stage. Jean et al. (2015b) also report on experiments with reranking of NMT output with a 5-gram language model, but improvements are small (between 0.1 and 0.5 BLEU).…”
Section: Related Work (mentioning, confidence: 99%)
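The reranking experiment described in this excerpt combines the NMT score of each n-best hypothesis with an external language-model score. A minimal sketch of such log-linear rescoring is given below; the function name, the data layout, and the weight lm_weight are illustrative assumptions, not details reported by Jean et al. (2015b).

```python
# Hedged sketch of n-best reranking with an external language model:
# each hypothesis is rescored as  nmt_logprob + lm_weight * lm_logprob.
# The weight and data layout are made up for illustration.
def rerank(nbest, lm_weight=0.2):
    """nbest: list of (hypothesis, nmt_logprob, lm_logprob) triples.
    Returns hypotheses re-sorted by the combined score."""
    def combined_score(item):
        _, nmt_logprob, lm_logprob = item
        return nmt_logprob + lm_weight * lm_logprob
    return sorted(nbest, key=combined_score, reverse=True)

nbest = [
    ("the cat sat on the mat", -4.1, -9.8),
    ("the cat sits on the mat", -4.3, -8.2),
    ("cat the sat mat on", -5.0, -20.5),
]
for hyp, *_ in rerank(nbest):
    print(hyp)
```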
“…While this monotonicity condition is true for vanilla NMT (Eq. 3), it does not hold for methods like length normalization (Jean et al., 2015; Boulanger-Lewandowski et al., 2013) or word rewards (He et al., 2016): Length normalization gives an advantage to longer hypotheses by dividing the score by the sentence length, while a word reward directly violates monotonicity as it rewards each word with a positive value. In Sec.…”
Section: Exact Inference For Neural Models (mentioning, confidence: 99%)
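The excerpt contrasts the monotone score of vanilla NMT (a sum of per-token log-probabilities, which can only decrease as a hypothesis grows) with length normalization and word rewards. The toy numbers below are purely illustrative and show how both modifications can make a longer hypothesis score higher than its own prefix:

```python
# Illustrative sketch (not from the cited papers) of why length
# normalization and word rewards break monotonicity: extending a
# hypothesis can *raise* its score.
log_probs = [-1.0, -0.5, -0.4]   # per-token log-probabilities of one hypothesis

def vanilla_score(lp):
    return sum(lp)                      # always decreases as tokens are added

def length_normalized(lp):
    return sum(lp) / len(lp)            # dividing by length can favour longer prefixes

def word_reward(lp, reward=0.7):
    return sum(lp) + reward * len(lp)   # each token adds a positive bonus

for n in range(1, len(log_probs) + 1):
    prefix = log_probs[:n]
    print(n, round(vanilla_score(prefix), 2),
             round(length_normalized(prefix), 2),
             round(word_reward(prefix), 2))
# vanilla:     -1.0, -1.5, -1.9   (monotonically decreasing)
# normalized:  -1.0, -0.75, -0.63 (increases with length)
# word reward: -0.3, -0.1, 0.2    (increases whenever reward > -log p)
```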
“…We call this averaging averaged-best. On the other hand, building ensembles requires training several models, which is time-consuming; however, it is common to do that in NMT (Jean et al., 2015). Thus, an investigation is needed to discover whether either the model or the estimation of its parameters is weak.…”
Section: Introduction (mentioning, confidence: 99%)
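The excerpt distinguishes averaging the parameters of several models ("averaged-best") from ensembling, which combines the output distributions of independently trained models at each decoding step. The toy NumPy sketch below contrasts the two operations; the linear "models" are hypothetical stand-ins for full NMT systems.

```python
# Toy contrast between parameter averaging and ensembling.
# The "models" are just random weight matrices, not real NMT systems.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(1)
models = [rng.normal(size=(3, 4)) for _ in range(4)]   # 4 models, vocab=3, hidden=4
hidden = rng.normal(size=(4,))                         # shared input representation

# parameter averaging: one model whose weights are the mean of all checkpoints
averaged_weights = np.mean(models, axis=0)
p_averaged = softmax(averaged_weights @ hidden)

# ensembling: average the per-model output distributions instead
p_ensemble = np.mean([softmax(W @ hidden) for W in models], axis=0)

print(p_averaged.round(3), p_ensemble.round(3))
```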