2015
DOI: 10.48550/arxiv.1502.02478
Preprint

Efficient batchwise dropout training using submatrices

Abstract: Dropout is a popular technique for regularizing artificial neural networks. Dropout networks are generally trained by minibatch gradient descent with a dropout mask turning off some of the units; a different pattern of dropout is applied to every sample in the minibatch. We explore a very simple alternative to the dropout mask. Instead of masking dropped-out units by setting them to zero, we perform matrix multiplication using a submatrix of the weight matrix, so unneeded hidden units are never calculated. Performi…
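The core trick described in the abstract can be illustrated with a short NumPy sketch (our illustration, not code from the paper; the layer sizes and variable names are ours): ordinary dropout still computes every hidden unit and then masks it per sample, whereas batchwise dropout picks one set of kept units for the whole minibatch and multiplies by the corresponding submatrix, so dropped units are never computed.

```python
# Minimal sketch of the idea in the abstract (NumPy, names ours): instead of
# multiplying by the full weight matrix and zeroing dropped units afterwards,
# select the kept columns and multiply by that submatrix only.
import numpy as np

rng = np.random.default_rng(0)
batch, n_in, n_hidden, p_keep = 32, 100, 200, 0.5

W = rng.standard_normal((n_in, n_hidden)) * 0.01   # full weight matrix
x = rng.standard_normal((batch, n_in))              # one minibatch

# Ordinary (per-sample) dropout: every hidden unit is still computed,
# then a different Bernoulli mask is applied to every sample.
h_full = x @ W                                       # (batch, n_hidden)
mask = rng.random((batch, n_hidden)) < p_keep
h_dropout = h_full * mask / p_keep

# Batchwise dropout via a submatrix: one set of hidden units is dropped for
# the whole minibatch, so the dropped columns of W are never touched.
kept = np.sort(rng.choice(n_hidden, int(p_keep * n_hidden), replace=False))
h_batchwise = x @ W[:, kept] / p_keep                # (batch, len(kept))
```

In the submatrix version the matrix product only involves a p_keep fraction of the columns of W, which is where the computational saving comes from.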

Cited by 4 publications (5 citation statements)
References 3 publications
“…The reason the LSTM performs better with a high dropout rate is that it tends to overfit early in training: even if it could reach a high training accuracy, its validation (and therefore testing) accuracy would be weak. In this study there is a seasonal trend, and in order to have an LSTM model that is not overly simplistic (it therefore needs at least 200 units) and to train for as long as possible, generalization was achieved via high dropout [61].…”
Section: Discussion
confidence: 99%
“…ELFISH (Xu et al 2019) randomly removes neurons before training on slow devices at the beginning of a round. Graham, Reizenstein, and Robinson (2015) study the suitability of dropout (Srivastava et al 2014) to reduce resource requirements. They find that computations can only be saved if dropout is done in a structured way, i.e., the same neurons are dropped for all samples of a mini-batch.…”
Section: Related Work
confidence: 99%
“…To reduce the number of computations, the dropout pattern needs to show some regularity that still allows the use of vectorized dense matrix operations. This can be achieved by dropping contiguous parts of the computation (Graham, Reizenstein, and Robinson 2015). Modern NNs consist of many different layer types, such as convolutional, pooling, fully-connected, activation, or normalization layers.…”
Section: Dropout to Reduce Computations in Training
confidence: 99%
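The contiguity point in the statement above can be made concrete with a tiny sketch (ours, and deliberately simplified): once hidden units are relabelled by a fixed random permutation, "keep the first k units" selects a contiguous block of the weight matrix, which is still an ordinary dense array and needs no masking at all.

```python
# Sketch (ours): after relabelling hidden units by a fixed random permutation,
# keeping "the first k units" is a contiguous column slice, so the kept
# submatrix stays a dense block and the forward pass is one vectorized
# matrix multiply with no mask.
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hidden, p_keep = 100, 200, 0.5
k = int(p_keep * n_hidden)

W = rng.standard_normal((n_in, n_hidden))
x = rng.standard_normal((64, n_in))

h = x @ W[:, :k]   # contiguous submatrix: dense and BLAS-friendly
```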
“…Batchwise dropout Graham et al. (2015) is a version of dropout in which a single Bernoulli mask is used to discard neurons for every sample in the minibatch. Batchwise dropout therefore reduces the number of parameters used in the architecture quadratically in the fraction of kept neurons.…”
Section: Smart Committee Dropout
confidence: 99%
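The quadratic claim in the statement above can be checked with a one-line count (our notation, not from the cited paper): if a fraction $p$ of units is kept on both the input and output side of a fully connected layer with $n_{\text{in}} \times n_{\text{out}}$ weights, the submatrix used for a minibatch has

$$(p\,n_{\text{in}})\,(p\,n_{\text{out}}) \;=\; p^{2}\,n_{\text{in}}\,n_{\text{out}}$$

entries, i.e. only a $p^{2}$ fraction of the layer's parameters are touched.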
“…In Graham et al. (2015), a batchwise-dropout committee was already envisaged as future work, in order to speed up testing while averaging the prediction over the full network (Baldi & Sadowski 2014). The main advantage is that the committee has the same architecture as the full network, with members having some zero constraints on several connections.…”
Section: Smart Committee Dropout
confidence: 99%
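A minimal sketch of the committee idea in the statement above (our illustration; neither cited paper provides this code): each member keeps the full architecture but uses a fixed batchwise-dropout pattern, i.e. a fixed subset of hidden units, and the committee prediction averages the members' outputs.

```python
# Sketch (ours) of a batchwise-dropout committee for a one-hidden-layer net:
# each member restricts the same architecture to a fixed subset of hidden
# units (the "zero constraints" on connections), and the committee averages
# the members' predictions.
import numpy as np

rng = np.random.default_rng(2)
n_in, n_hidden, n_out = 20, 50, 3
p_keep, n_members = 0.5, 8

W1 = rng.standard_normal((n_in, n_hidden)) * 0.1
W2 = rng.standard_normal((n_hidden, n_out)) * 0.1

def member_predict(x, kept):
    # Forward pass of one member: only the kept hidden units are computed.
    h = np.maximum(x @ W1[:, kept], 0.0)   # ReLU on the kept units only
    return h @ W2[kept, :]

x = rng.standard_normal((4, n_in))
members = [rng.choice(n_hidden, int(p_keep * n_hidden), replace=False)
           for _ in range(n_members)]
committee = np.mean([member_predict(x, kept) for kept in members], axis=0)
```

Each member's forward pass only touches its submatrices of W1 and W2, while the averaging over members plays the role of the prediction averaging that the statement attributes to Baldi & Sadowski (2014).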