Proceedings of the First Workshop on Neural Machine Translation 2017
DOI: 10.18653/v1/w17-3208
An Empirical Study of Mini-Batch Creation Strategies for Neural Machine Translation

Abstract: Training of neural machine translation (NMT) models usually uses mini-batches for efficiency purposes. During the mini-batched training process, it is necessary to pad shorter sentences in a mini-batch to be equal in length to the longest sentence therein for efficient computation. Previous work has noted that sorting the corpus based on the sentence length before making mini-batches reduces the amount of padding and increases the processing speed. However, despite the fact that mini-batch creation is an essent…
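To make the padding-and-sorting idea from the abstract concrete, here is a minimal Python sketch. It assumes sentences are lists of token ids; the helper name make_batches and the PAD id are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of length-sorted mini-batch creation with padding.
# All names (make_batches, PAD) are illustrative assumptions.

PAD = 0  # padding token id (assumption)

def make_batches(corpus, batch_size):
    """Sort sentences by length, slice into mini-batches, and pad each
    batch to the length of its longest sentence."""
    # Sorting by length keeps similarly sized sentences together,
    # which reduces the amount of padding per batch.
    sorted_corpus = sorted(corpus, key=len)
    batches = []
    for i in range(0, len(sorted_corpus), batch_size):
        batch = sorted_corpus[i:i + batch_size]
        max_len = max(len(sent) for sent in batch)
        padded = [sent + [PAD] * (max_len - len(sent)) for sent in batch]
        batches.append(padded)
    return batches

# Toy example: three sentences given as token-id lists.
print(make_batches([[1, 2], [3, 4, 5, 6], [7]], batch_size=2))
# -> [[[7, 0], [1, 2]], [[3, 4, 5, 6]]]
```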

Cited by 17 publications (14 citation statements). References 8 publications (5 reference statements). Citing publications span 2018 to 2023.

“…We use mini-batching that limits the number of words in the mini-batch instead of the number of sentences (Morishita et al., 2017). We limit the mini-batch size to 5000 words.…”
Section: Training Details (mentioning)
confidence: 99%
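This citing work caps each mini-batch by its total word count (5000 words) rather than by its sentence count. The sketch below is one plausible way to implement such a cap; the function name batch_by_words and its details are assumptions, not the cited authors' code.

```python
# Sketch of word-based mini-batching: a batch is closed once adding the
# next sentence would exceed a word budget, instead of counting sentences.
# Names and defaults are illustrative assumptions.

def batch_by_words(sentences, max_words=5000):
    batches, current, current_words = [], [], 0
    for sent in sentences:
        if current and current_words + len(sent) > max_words:
            batches.append(current)       # flush the full batch
            current, current_words = [], 0
        current.append(sent)
        current_words += len(sent)
    if current:
        batches.append(current)
    return batches
```

With a word budget, batches made of long sentences contain fewer sentences, so the number of token positions per batch, and hence the memory footprint, stays roughly constant.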
“…The result of the training is shown in Fig. 5, where the model was trained as a three-hidden-layer ANN with 150 hidden neurons in each layer. Due to the considerably large training set of 19,696 images, the training was carried out by randomly dividing the training set into mini-batches of 1,024 images to reduce computational cost [29]. Hence, the cross entropy in Fig. 5 appears to fluctuate because some mini-batches were slightly harder to predict than the others.…”
Section: ANN Training and Hold-out Cross-validation (mentioning)
confidence: 99%
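The quoted passage splits the training set at random into fixed-size mini-batches. A minimal sketch of that procedure follows, assuming the examples are held in a Python list; the helper random_minibatches is hypothetical, and only the default batch size of 1,024 mirrors the quote.

```python
import random

# Randomly divide a training set into fixed-size mini-batches
# (e.g. 1,024 examples each), as described in the quoted passage.
def random_minibatches(examples, batch_size=1024, seed=0):
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)  # random assignment to batches
    return [[examples[i] for i in indices[j:j + batch_size]]
            for j in range(0, len(indices), batch_size)]
```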
“…The currently published next-best-performing simple system, Parikh et al. (2016) at 86.3% accuracy, introduced the use of the attention mechanism for the NLI task, the way it is generally being used today. Morishita et al. (2017) explored the effect of mini-batching on the learning of Neural Machine Translation models, carrying out their experiments on two datasets (two language pairs). In particular, they studied the strategies of (1) sorting by length of the source sentence, (2) the target sentence, or (3) both, among other things.…”
Section: State-of-the-art NLI (mentioning)
confidence: 99%
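The three sorting strategies mentioned in this excerpt (source length, target length, or both) can be expressed as different sort keys over (source, target) sentence pairs. The sketch below is only an illustration under that reading, not the experimental setup of Morishita et al. (2017).

```python
# Illustration of the three sorting strategies discussed above for a
# parallel corpus of (source, target) token-id pairs. Names are assumptions.

SORT_KEYS = {
    "source": lambda pair: len(pair[0]),                  # (1) source length
    "target": lambda pair: len(pair[1]),                  # (2) target length
    "both":   lambda pair: (len(pair[0]), len(pair[1])),  # (3) source, then target
}

def sort_corpus(pairs, strategy="both"):
    """Sort a list of (source, target) pairs before slicing into mini-batches."""
    return sorted(pairs, key=SORT_KEYS[strategy])
```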