2018
DOI: 10.1162/neco_a_01088

Distributed Newton Methods for Deep Neural Networks

Abstract: Deep learning involves a difficult nonconvex optimization problem with a large number of weights between any two adjacent layers of a deep structure. To handle large data sets or complicated networks, distributed training is needed, but the calculation of function, gradient, and Hessian is expensive. In particular, the communication and the synchronization cost may become a bottleneck. In this letter, we focus on situations where the model is distributedly stored and propose a novel distributed Newton method f…
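The abstract notes that computing the Hessian is expensive for deep networks. As a hedged illustration of the general Newton-CG idea (a single-machine sketch, not the authors' distributed method), the code below approximately solves the damped Newton system (H + λI) d = -g with conjugate gradient, using only Hessian-vector products so the Hessian is never formed explicitly. The finite-difference product, the damping value, and the toy quadratic loss are all assumptions made for this example.

```python
import numpy as np

def newton_cg_step(grad_fn, w, lam=1e-2, cg_iters=10, eps=1e-6):
    """One damped Newton step: approximately solve (H + lam*I) d = -g
    by conjugate gradient, using Hessian-vector products so the
    Hessian matrix is never built explicitly."""
    g = grad_fn(w)

    def hvp(v):
        # Finite-difference approximation of H @ v (an assumption here;
        # autodiff would normally supply exact Hessian-vector products).
        return (grad_fn(w + eps * v) - g) / eps + lam * v

    d = np.zeros_like(w)   # CG iterate, starts at zero
    r = -g - hvp(d)        # residual of (H + lam*I) d = -g
    p = r.copy()
    rs = r @ r
    for _ in range(cg_iters):
        Hp = hvp(p)
        alpha = rs / (p @ Hp)
        d += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < 1e-8:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return w + d

# Usage on a toy quadratic loss f(w) = 0.5 * w^T A w - b^T w,
# whose gradient is A w - b; the Newton step lands near the minimizer.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad = lambda w: A @ w - b
print(newton_cg_step(grad, np.zeros(2)))  # close to [0.2, 0.4]
```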

Cited by 19 publications (16 citation statements).
References 29 publications (40 reference statements).
“…9 Sensorless 48 11 1.00 Based on phase current measurements of an electric motor, predict different error conditions (Paschke et al., 2013). We use the transformations from Wang et al. (2018).…”
Section: S. No
Citation type: mentioning (confidence: 99%)
“…As the loss function is a non-linear function the consequence is that it is difficult to find a training algorithm for achieving minimum value. Some of the algorithms used to find the minimum value of the loss function are: Gradient descent [19], Newton's method [20], Conjugate gradient [21], Quasi Newton [22], Levenberg Marquardt [23].…”
Section: Training
Citation type: mentioning (confidence: 99%)
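The snippet above lists first- and second-order optimizers for minimizing a nonlinear loss. As a hedged toy comparison (the scalar loss, step size, and iteration count are assumptions for illustration, not taken from any cited paper), the sketch below contrasts plain gradient descent with Newton's method, which rescales the gradient by local curvature:

```python
import numpy as np

# Toy smooth convex "loss": f(w) = log(1 + exp(w)) - 0.8 * w
f_grad = lambda w: 1.0 / (1.0 + np.exp(-w)) - 0.8       # f'(w)
f_hess = lambda w: np.exp(-w) / (1.0 + np.exp(-w))**2   # f''(w)

w_gd, w_nt = 0.0, 0.0
for _ in range(20):
    w_gd -= 0.5 * f_grad(w_gd)            # gradient descent, fixed step
    w_nt -= f_grad(w_nt) / f_hess(w_nt)   # Newton: divide by curvature

# Minimizer: f'(w*) = 0  =>  sigmoid(w*) = 0.8  =>  w* = log(4)
print(w_gd, w_nt, np.log(4.0))
```

On this example Newton's method reaches the minimizer w* = log 4 within a few iterations, while fixed-step gradient descent approaches it noticeably more slowly; that gap is what motivates the quasi-Newton and Levenberg-Marquardt variants listed above.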
“…The selected datasets reflect some of the data properties present in real applications, i.e., small or medium size datasets represented by both small and large number of features. From these datasets, seven are taken from the UCI machine learning repository (Lichman, 2013), and nine are obtained from recent studies (Anguita et al., 2013; Johnson & Xie, 2013; Schmeier, Jankovic & Bajic, 2011; Singh et al., 2002; Soufan et al., 2015a; Tsanas et al., 2014; Wang et al., 2016; Yeh & Lien, 2009). Table 1 shows the summary information for these datasets.…”
Section: Datasets
Citation type: mentioning (confidence: 99%)