2018
DOI: 10.1162/neco_a_01088

Distributed Newton Methods for Deep Neural Networks

Abstract: Deep learning involves a difficult nonconvex optimization problem with a large number of weights between any two adjacent layers of a deep structure. To handle large data sets or complicated networks, distributed training is needed, but the calculation of function, gradient, and Hessian is expensive. In particular, the communication and the synchronization cost may become a bottleneck. In this letter, we focus on situations where the model is distributedly stored and propose a novel distributed Newton method f…
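The abstract notes that computing the Hessian is expensive for deep networks. As a hedged illustration of the general Newton-CG idea (a single-machine sketch, not the authors' distributed method), the code below approximately solves the damped Newton system (H + λI) d = -g with conjugate gradient, using only Hessian-vector products so the Hessian is never formed explicitly. The finite-difference product, the damping value, and the toy quadratic loss are all assumptions made for this example.

```python
import numpy as np

def newton_cg_step(grad_fn, w, lam=1e-2, cg_iters=10, eps=1e-6):
    """One damped Newton step: approximately solve (H + lam*I) d = -g
    by conjugate gradient, using Hessian-vector products so the
    Hessian matrix is never built explicitly."""
    g = grad_fn(w)

    def hvp(v):
        # Finite-difference approximation of H @ v (an assumption here;
        # autodiff would normally supply exact Hessian-vector products).
        return (grad_fn(w + eps * v) - g) / eps + lam * v

    d = np.zeros_like(w)   # CG iterate, starts at zero
    r = -g - hvp(d)        # residual of (H + lam*I) d = -g
    p = r.copy()
    rs = r @ r
    for _ in range(cg_iters):
        Hp = hvp(p)
        alpha = rs / (p @ Hp)
        d += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        if np.sqrt(rs_new) < 1e-8:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return w + d

# Usage on a toy quadratic loss f(w) = 0.5 * w^T A w - b^T w,
# whose gradient is A w - b; the Newton step lands near the minimizer.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad = lambda w: A @ w - b
print(newton_cg_step(grad, np.zeros(2)))  # close to [0.2, 0.4]
```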

Cited by 19 publications (16 citation statements).
References 29 publications (40 reference statements).
“…9 Sensorless 48 11 1.00 Based on phase current measurements of an electric motor, predict different error conditions (Paschke et al., 2013). We use the transformations from Wang et al. (2018).…”
Section: S. No
Citation type: mentioning (confidence: 99%)
“…As the loss function is a non-linear function the consequence is that it is difficult to find a training algorithm for achieving minimum value. Some of the algorithms used to find the minimum value of the loss function are: Gradient descent [19], Newton's method [20], Conjugate gradient [21], Quasi Newton [22], Levenberg Marquardt [23].…”
Section: Training
Citation type: mentioning (confidence: 99%)
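The snippet above lists first- and second-order optimizers for minimizing a nonlinear loss. As a hedged toy comparison (the scalar loss, step size, and iteration count are assumptions for illustration, not taken from any cited paper), the sketch below contrasts plain gradient descent with Newton's method, which rescales the gradient by local curvature:

```python
import numpy as np

# Toy smooth convex "loss": f(w) = log(1 + exp(w)) - 0.8 * w
f_grad = lambda w: 1.0 / (1.0 + np.exp(-w)) - 0.8       # f'(w)
f_hess = lambda w: np.exp(-w) / (1.0 + np.exp(-w))**2   # f''(w)

w_gd, w_nt = 0.0, 0.0
for _ in range(20):
    w_gd -= 0.5 * f_grad(w_gd)            # gradient descent, fixed step
    w_nt -= f_grad(w_nt) / f_hess(w_nt)   # Newton: divide by curvature

# Minimizer: f'(w*) = 0  =>  sigmoid(w*) = 0.8  =>  w* = log(4)
print(w_gd, w_nt, np.log(4.0))
```

On this example Newton's method reaches the minimizer w* = log 4 within a few iterations, while fixed-step gradient descent approaches it noticeably more slowly; that gap is what motivates the quasi-Newton and Levenberg-Marquardt variants listed above.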
“…The selected datasets reflect some of the data properties present in real applications, i.e., small or medium size datasets represented by both small and large number of features. From these datasets, seven are taken from the UCI machine learning repository (Lichman, 2013), and nine are obtained from recent studies (Anguita et al., 2013; Johnson & Xie, 2013; Schmeier, Jankovic & Bajic, 2011; Singh et al., 2002; Soufan et al., 2015a; Tsanas et al., 2014; Wang et al., 2016; Yeh & Lien, 2009). Table 1 shows the summary information for these datasets.…”
Section: Datasets
Citation type: mentioning (confidence: 99%)