2014
DOI: 10.1016/j.artint.2014.02.004

The dropout learning algorithm

Abstract: Dropout is a recently introduced algorithm for training neural networks by randomly dropping units during training to prevent their co-adaptation. A mathematical analysis of some of the static and dynamic properties of dropout is provided using Bernoulli gating variables, general enough to accommodate dropout on units or connections, and with variable rates. The framework allows a complete analysis of the ensemble averaging properties of dropout in linear networks, which is useful to understand the non-linear c…
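As an illustrative sketch (not code from the paper), dropout expressed through Bernoulli gating variables might look like the following. The keep probabilities, the inverted 1/p rescaling convention, and the toy layer shapes are assumptions made for the example; the same gating can be applied either to units or to individual connections, with variable rates, as the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_units(x, keep_prob, training=True):
    """Bernoulli gating on units ("inverted" dropout): each unit is kept with
    probability keep_prob and rescaled by 1/keep_prob so the expected
    activation matches the test-time (no-dropout) forward pass."""
    if not training or keep_prob >= 1.0:
        return x
    gates = rng.binomial(1, keep_prob, size=x.shape)  # Bernoulli gating variables
    return x * gates / keep_prob

def dropout_connections(W, keep_prob, training=True):
    """The same Bernoulli gating applied to individual connections (weights)
    rather than units, which the general framework also covers."""
    if not training or keep_prob >= 1.0:
        return W
    gates = rng.binomial(1, keep_prob, size=W.shape)
    return W * gates / keep_prob

# Toy forward pass through one layer with unit dropout at keep probability 0.5.
x = rng.normal(size=(4, 8))   # a batch of 4 inputs with 8 features
W = rng.normal(size=(8, 3))   # weights of a single hidden layer
h = np.maximum(dropout_units(x, keep_prob=0.5) @ W, 0.0)
```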

Cited by 318 publications (252 citation statements)
References 35 publications
“…Learning rate decayed linearly from 0.01 to a final value, starting and finishing at a specified number of epochs. Dropout (in which nodes are randomly removed during training) with values of p from 0.0 to 0.5 was used at several combinations of layers to add regularization [37,38]. These networks had 9 fully connected hidden layers with rectified linear units [39,40].…”
Section: Feedforward Neural Network
Mentioning confidence: 99%
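A minimal sketch of the kind of network this excerpt describes, assuming made-up layer widths, dropout placement, epoch count, data, and optimizer details (the excerpt specifies only the 9 ReLU hidden layers, dropout rates between 0.0 and 0.5, and a learning rate decayed linearly from 0.01):

```python
import torch
import torch.nn as nn

# Hypothetical fully connected network: 9 hidden layers with ReLU and dropout
# after each hidden layer. Widths and the dropout rate are illustrative.
def make_mlp(n_in, n_out, width=128, n_hidden=9, p_drop=0.2):
    layers, d = [], n_in
    for _ in range(n_hidden):
        layers += [nn.Linear(d, width), nn.ReLU()]
        if p_drop > 0.0:                 # p in [0.0, 0.5] in the excerpt
            layers.append(nn.Dropout(p=p_drop))
        d = width
    layers.append(nn.Linear(d, n_out))
    return nn.Sequential(*layers)

model = make_mlp(n_in=20, n_out=2)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
# Linear decay of the learning rate from 0.01 toward a smaller final value.
sched = torch.optim.lr_scheduler.LinearLR(opt, start_factor=1.0,
                                          end_factor=0.1, total_iters=100)

x, y = torch.randn(32, 20), torch.randint(0, 2, (32,))
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()
    opt.step()
    sched.step()
```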
“…Dropout prevents neurons from co-adapting by randomly setting a fraction of them, governed by the dropout hyperparameter, to zero at each training iteration. This results in a model that can be interpreted as randomly sampling from an exponential number of similar networks [64], and creates more generalizable representations of the data.…”
Section: Appendix A: Neural Network Glossary
Mentioning confidence: 99%
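The ensemble interpretation is exact in the linear case analyzed in the paper: averaging a linear layer's output over dropout masks equals running the deterministic layer with the weights (equivalently the inputs) scaled by the keep probability. A quick numerical check, with toy shapes and a keep probability chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=8)        # a single input vector
W = rng.normal(size=(8, 3))   # one linear layer
p = 0.5                       # probability of keeping each input unit

# Monte Carlo average of the layer output over many dropout masks
# (classic, non-rescaled dropout during training).
masks = rng.binomial(1, p, size=(200_000, 8))
mc_average = ((x * masks) @ W).mean(axis=0)

# For a linear layer the ensemble average is exact: it equals the
# deterministic layer with the input scaled by p.
print(mc_average)
print(p * x @ W)              # agrees with the Monte Carlo estimate up to noise
```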
“…A tanh activation function was applied to the nodes in each layer, as well as an L2 regularizer with weight decay set to 0.001. Dropout [62-64] was also applied to each layer. While many variations of the network structure were investigated, a systematic hyperparameter tuning was not undertaken due to computational limitations.…”
Section: A Fully-connected Network
Mentioning confidence: 99%
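A hypothetical sketch combining the ingredients this excerpt lists, namely tanh activations, an L2 penalty with weight decay 0.001, and dropout on each layer; the layer sizes, dropout rate, optimizer, and data are assumptions made for the example, not the cited setup:

```python
import torch
import torch.nn as nn

# Small fully connected network: tanh activations with dropout after each
# hidden layer. Widths and the dropout rate are illustrative choices.
model = nn.Sequential(
    nn.Linear(16, 64), nn.Tanh(), nn.Dropout(p=0.3),
    nn.Linear(64, 64), nn.Tanh(), nn.Dropout(p=0.3),
    nn.Linear(64, 1),
)
# weight_decay adds the L2 regularization term to the parameter updates.
opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=0.001)

x, y = torch.randn(32, 16), torch.randn(32, 1)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
opt.step()
```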
“…While running stochastic gradient descent, dropout involves randomly sampling the set of features that are considered in any step of the algorithm. However, in contrast to randomized splitting in a forest, the effect of dropout training is fairly well understood: for example, in the case of single-layer models, dropout can be understood as a form of data-adaptive ridge-like regularization (Baldi and Sadowski 2014; Wager et al. 2013). Fleshing out the connections between dropout and random forests could provide new insights about the role of feature sampling in growing trees.…”
Section: Why Is Feature Sampling Helpful?
Mentioning confidence: 99%
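A quick numerical illustration of that ridge-like view (a sketch under assumptions, not the derivation from the cited papers): for a linear model with squared error and Bernoulli dropout on the inputs (with 1/p rescaling), averaging the loss over dropout masks adds a data-adaptive penalty of (1 - p)/p times the sum over features of w_j^2 multiplied by the feature's squared norm. The sizes, keep probability, and random data below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, p = 50, 5, 0.8                       # samples, features, keep probability
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
w = rng.normal(size=d)

# Monte Carlo: squared-error loss averaged over Bernoulli dropout masks on the
# inputs, with 1/p rescaling so the expected input is unchanged.
masks = rng.binomial(1, p, size=(20_000, n, d))
preds = (X * masks / p) @ w                # shape (20000, n)
mc_loss = ((y - preds) ** 2).sum(axis=1).mean()

# Closed form: ordinary squared error plus a data-adaptive ridge penalty,
# (1 - p)/p * sum_j w_j^2 * sum_i x_ij^2  (cf. Wager et al. 2013).
plain_loss = ((y - X @ w) ** 2).sum()
ridge_term = (1 - p) / p * np.sum(w ** 2 * (X ** 2).sum(axis=0))
print(mc_loss, plain_loss + ridge_term)    # agree up to Monte Carlo noise
```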