Proceedings of the First Workshop on Neural Machine Translation 2017
DOI: 10.18653/v1/w17-3208
An Empirical Study of Mini-Batch Creation Strategies for Neural Machine Translation

Abstract: Training of neural machine translation (NMT) models usually uses mini-batches for efficiency purposes. During the mini-batched training process, it is necessary to pad shorter sentences in a mini-batch to be equal in length to the longest sentence therein for efficient computation. Previous work has noted that sorting the corpus based on the sentence length before making mini-batches reduces the amount of padding and increases the processing speed. However, despite the fact that mini-batch creation is an essent…
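To make the padding-and-sorting idea from the abstract concrete, here is a minimal Python sketch. It assumes sentences are lists of token ids; the helper name make_batches and the PAD id are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of length-sorted mini-batch creation with padding.
# All names (make_batches, PAD) are illustrative assumptions.

PAD = 0  # padding token id (assumption)

def make_batches(corpus, batch_size):
    """Sort sentences by length, slice into mini-batches, and pad each
    batch to the length of its longest sentence."""
    # Sorting by length keeps similarly sized sentences together,
    # which reduces the amount of padding per batch.
    sorted_corpus = sorted(corpus, key=len)
    batches = []
    for i in range(0, len(sorted_corpus), batch_size):
        batch = sorted_corpus[i:i + batch_size]
        max_len = max(len(sent) for sent in batch)
        padded = [sent + [PAD] * (max_len - len(sent)) for sent in batch]
        batches.append(padded)
    return batches

# Toy example: three sentences given as token-id lists.
print(make_batches([[1, 2], [3, 4, 5, 6], [7]], batch_size=2))
# -> [[[7, 0], [1, 2]], [[3, 4, 5, 6]]]
```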

Cited by 17 publications (14 citation statements). References 8 publications (5 reference statements). Citing publications span 2018 to 2023.

“…We use mini-batching that limits the number of words in the mini-batch instead of the number of sentences (Morishita et al., 2017). We limit the mini-batch size to 5000 words.…”
Section: Training Details (mentioning)
confidence: 99%
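This citing work caps each mini-batch by its total word count (5000 words) rather than by its sentence count. The sketch below is one plausible way to implement such a cap; the function name batch_by_words and its details are assumptions, not the cited authors' code.

```python
# Sketch of word-based mini-batching: a batch is closed once adding the
# next sentence would exceed a word budget, instead of counting sentences.
# Names and defaults are illustrative assumptions.

def batch_by_words(sentences, max_words=5000):
    batches, current, current_words = [], [], 0
    for sent in sentences:
        if current and current_words + len(sent) > max_words:
            batches.append(current)       # flush the full batch
            current, current_words = [], 0
        current.append(sent)
        current_words += len(sent)
    if current:
        batches.append(current)
    return batches
```

With a word budget, batches made of long sentences contain fewer sentences, so the number of token positions per batch, and hence the memory footprint, stays roughly constant.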
“…The result of the training is shown in Fig. 5, where the model was trained as a three-hidden-layer ANN with 150 hidden neurons in each layer. Due to the considerably large training set of 19,696 images, the training was carried out by randomly dividing the training set into mini-batches of 1,024 images to reduce computational cost [29]. Hence, the cross entropy in Fig. 5 appears to fluctuate because some mini-batches were slightly harder to predict than the others.…”
Section: ANN Training and Hold-out Cross-validation (mentioning)
confidence: 99%
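The quoted passage splits the training set at random into fixed-size mini-batches. A minimal sketch of that procedure follows, assuming the examples are held in a Python list; the helper random_minibatches is hypothetical, and only the default batch size of 1,024 mirrors the quote.

```python
import random

# Randomly divide a training set into fixed-size mini-batches
# (e.g. 1,024 examples each), as described in the quoted passage.
def random_minibatches(examples, batch_size=1024, seed=0):
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)  # random assignment to batches
    return [[examples[i] for i in indices[j:j + batch_size]]
            for j in range(0, len(indices), batch_size)]
```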
“…The currently published next-best-performing simple system, Parikh et al. (2016) at 86.3% accuracy, introduced the use of the attention mechanism for the NLI task, the way it is generally being used today. Morishita et al. (2017) explored the effect of mini-batching on the learning of Neural Machine Translation models, carrying out their experiments on two datasets (two language pairs). In particular, they studied the strategies of (1) sorting by length of the source sentence, (2) the target sentence, or (3) both, among other things.…”
Section: State-of-the-art NLI (mentioning)
confidence: 99%
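The three sorting strategies mentioned in this excerpt (source length, target length, or both) can be expressed as different sort keys over (source, target) sentence pairs. The sketch below is only an illustration under that reading, not the experimental setup of Morishita et al. (2017).

```python
# Illustration of the three sorting strategies discussed above for a
# parallel corpus of (source, target) token-id pairs. Names are assumptions.

SORT_KEYS = {
    "source": lambda pair: len(pair[0]),                  # (1) source length
    "target": lambda pair: len(pair[1]),                  # (2) target length
    "both":   lambda pair: (len(pair[0]), len(pair[1])),  # (3) source, then target
}

def sort_corpus(pairs, strategy="both"):
    """Sort a list of (source, target) pairs before slicing into mini-batches."""
    return sorted(pairs, key=SORT_KEYS[strategy])
```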