2015
DOI: 10.48550/arxiv.1502.02478
Preprint

Efficient batchwise dropout training using submatrices

Abstract: Dropout is a popular technique for regularizing artificial neural networks. Dropout networks are generally trained by minibatch gradient descent with a dropout mask turning off some of the units; a different pattern of dropout is applied to every sample in the minibatch. We explore a very simple alternative to the dropout mask. Instead of masking dropped-out units by setting them to zero, we perform matrix multiplication using a submatrix of the weight matrix, so unneeded hidden units are never calculated. Performi…
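The core trick described in the abstract can be illustrated with a short NumPy sketch (our illustration, not code from the paper; the layer sizes and variable names are ours): ordinary dropout still computes every hidden unit and then masks it per sample, whereas batchwise dropout picks one set of kept units for the whole minibatch and multiplies by the corresponding submatrix, so dropped units are never computed.

```python
# Minimal sketch of the idea in the abstract (NumPy, names ours): instead of
# multiplying by the full weight matrix and zeroing dropped units afterwards,
# select the kept columns and multiply by that submatrix only.
import numpy as np

rng = np.random.default_rng(0)
batch, n_in, n_hidden, p_keep = 32, 100, 200, 0.5

W = rng.standard_normal((n_in, n_hidden)) * 0.01   # full weight matrix
x = rng.standard_normal((batch, n_in))              # one minibatch

# Ordinary (per-sample) dropout: every hidden unit is still computed,
# then a different Bernoulli mask is applied to every sample.
h_full = x @ W                                       # (batch, n_hidden)
mask = rng.random((batch, n_hidden)) < p_keep
h_dropout = h_full * mask / p_keep

# Batchwise dropout via a submatrix: one set of hidden units is dropped for
# the whole minibatch, so the dropped columns of W are never touched.
kept = np.sort(rng.choice(n_hidden, int(p_keep * n_hidden), replace=False))
h_batchwise = x @ W[:, kept] / p_keep                # (batch, len(kept))
```

In the submatrix version the matrix product only involves a p_keep fraction of the columns of W, which is where the computational saving comes from.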

Cited by 4 publications (5 citation statements)
References 3 publications
“…The reason the LSTM performs better with a high dropout rate is that it tends to overfit early in training: even if it could reach a high training accuracy, its validation (and therefore testing) accuracy would be weak. In this study there is a seasonal trend, and in order to have an LSTM model that is not overly simplistic (it therefore needs at least 200 units) and to train for as long as possible, generalization was achieved via high dropout [61].…”
Section: Discussion
confidence: 99%
“…ELFISH (Xu et al 2019) randomly removes neurons before training on slow devices at the beginning of a round. Graham, Reizenstein, and Robinson (2015) study the suitability of dropout (Srivastava et al 2014) to reduce resource requirements. They find that computations can only be saved if dropout is done in a structured way, i.e., the same neurons are dropped for all samples of a mini-batch.…”
Section: Related Work
confidence: 99%
“…To reduce the number of computations, the dropout pattern needs to show some regularity that still allows the use of vectorized dense matrix operations. This can be achieved by dropping contiguous parts of the computation (Graham, Reizenstein, and Robinson 2015). Modern NNs consist of many different layer types, such as convolutional, pooling, fully-connected, activation, or normalization layers.…”
Section: Dropout to Reduce Computations in Training
confidence: 99%
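The contiguity point in the statement above can be made concrete with a tiny sketch (ours, and deliberately simplified): once hidden units are relabelled by a fixed random permutation, "keep the first k units" selects a contiguous block of the weight matrix, which is still an ordinary dense array and needs no masking at all.

```python
# Sketch (ours): after relabelling hidden units by a fixed random permutation,
# keeping "the first k units" is a contiguous column slice, so the kept
# submatrix stays a dense block and the forward pass is one vectorized
# matrix multiply with no mask.
import numpy as np

rng = np.random.default_rng(1)
n_in, n_hidden, p_keep = 100, 200, 0.5
k = int(p_keep * n_hidden)

W = rng.standard_normal((n_in, n_hidden))
x = rng.standard_normal((64, n_in))

h = x @ W[:, :k]   # contiguous submatrix: dense and BLAS-friendly
```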
“…Batchwise dropout Graham et al. (2015) is a version of dropout in which a single Bernoulli mask is used to discard neurons for every sample in the minibatch. Batchwise dropout therefore reduces the number of parameters used in the architecture quadratically in the fraction of kept neurons.…”
Section: Smart Committee Dropout
confidence: 99%
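The quadratic claim in the statement above can be checked with a one-line count (our notation, not from the cited paper): if a fraction $p$ of units is kept on both the input and output side of a fully connected layer with $n_{\text{in}} \times n_{\text{out}}$ weights, the submatrix used for a minibatch has

$$(p\,n_{\text{in}})\,(p\,n_{\text{out}}) \;=\; p^{2}\,n_{\text{in}}\,n_{\text{out}}$$

entries, i.e. only a $p^{2}$ fraction of the layer's parameters are touched.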
“…In Graham et al. (2015), a batchwise-dropout committee was already envisaged as future work, in order to speed up testing while averaging the prediction over the full network (Baldi & Sadowski 2014). The main advantage is that the committee has the same architecture as the full network, with members having some zero constraints on several connections.…”
Section: Smart Committee Dropout
confidence: 99%
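A minimal sketch of the committee idea in the statement above (our illustration; neither cited paper provides this code): each member keeps the full architecture but uses a fixed batchwise-dropout pattern, i.e. a fixed subset of hidden units, and the committee prediction averages the members' outputs.

```python
# Sketch (ours) of a batchwise-dropout committee for a one-hidden-layer net:
# each member restricts the same architecture to a fixed subset of hidden
# units (the "zero constraints" on connections), and the committee averages
# the members' predictions.
import numpy as np

rng = np.random.default_rng(2)
n_in, n_hidden, n_out = 20, 50, 3
p_keep, n_members = 0.5, 8

W1 = rng.standard_normal((n_in, n_hidden)) * 0.1
W2 = rng.standard_normal((n_hidden, n_out)) * 0.1

def member_predict(x, kept):
    # Forward pass of one member: only the kept hidden units are computed.
    h = np.maximum(x @ W1[:, kept], 0.0)   # ReLU on the kept units only
    return h @ W2[kept, :]

x = rng.standard_normal((4, n_in))
members = [rng.choice(n_hidden, int(p_keep * n_hidden), replace=False)
           for _ in range(n_members)]
committee = np.mean([member_predict(x, kept) for kept in members], axis=0)
```

Each member's forward pass only touches its submatrices of W1 and W2, while the averaging over members plays the role of the prediction averaging that the statement attributes to Baldi & Sadowski (2014).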