2020
DOI: 10.1145/3368271

Newton Methods for Convolutional Neural Networks

Abstract: Deep learning involves a difficult non-convex optimization problem, which is often solved by stochastic gradient (SG) methods. While SG is usually effective, it may not be robust in some situations. Recently, Newton methods have been investigated as an alternative optimization technique, but nearly all existing studies consider only fully-connected feedforward neural networks. They do not investigate other types of networks such as Convolutional Neural Networks (CNN), which are more commonly used in deep-learn…

Cited by 9 publications (16 citation statements)
References 24 publications (26 reference statements)
“…These typically require the use of direct or iterative solvers to compute the step towards optimality, for example, solving with the Hessian matrix in a Newton method. As a result, the applicability of Newton-type schemes for the computation of W^(j) and b^(j) has received more attention, with a strong focus on exploiting the structure of the Hessian matrix [35,51,65,239,283,287]. The computational complexity of neural networks is challenging on many levels.…”
Section: Numerical Linear Algebra in Deep Learning
confidence: 99%
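
As background for the statement above: the "step towards optimality" in a Newton method comes from solving a linear system in the Hessian. A minimal sketch in standard notation (ours, not taken from the cited works):

```latex
% Newton step for minimizing a training loss f(w):
% solve the Hessian system, then update the weights.
H_k \, d_k = -\nabla f(w_k), \qquad w_{k+1} = w_k + \alpha_k d_k
% In large networks H_k is never formed explicitly; the system is
% solved inexactly with an iterative method such as conjugate
% gradient, which only needs Hessian-vector products H_k v.
```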
“…Hessian-vector products have a use case in training deep neural networks, also known as Hessian-free optimization. Recently, Newton methods have been investigated as an alternative optimization technique, but nearly all existing studies consider only fully-connected feed-forward neural networks [19]. Newton methods for CNNs involve complicated operations; because of this, few researchers have conducted a thorough investigation.…”
Section: Introduction
confidence: 99%
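
To make the Hessian-vector product concrete, here is a minimal matrix-free sketch in JAX (illustrative only: the toy loss, the unit step length, and the CG tolerance are our assumptions, not code from the cited paper or from [19]):

```python
import jax
import jax.numpy as jnp
from jax.scipy.sparse.linalg import cg

# Toy differentiable loss standing in for a network's training loss.
def loss(w):
    return jnp.sum(jnp.tanh(w) ** 2) + 0.5 * jnp.sum(w ** 2)

def hvp(w, v):
    # Pearlmutter trick: forward-over-reverse differentiation yields
    # H @ v without ever materializing the Hessian H.
    return jax.jvp(jax.grad(loss), (w,), (v,))[1]

w = jnp.ones(5)
g = jax.grad(loss)(w)

# Inexact "Hessian-free" Newton step: solve H d = -g with matrix-free CG.
d, _ = cg(lambda v: hvp(w, v), -g, tol=1e-6)
w_next = w + d  # unit step length, for illustration only
```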
“…Newton methods for CNNs involve complicated operations; because of this, few researchers have conducted a thorough investigation. One of the major works in this direction is the introduction of Newton methods for the optimization of CNNs [19]. There are many reasons to work further in this direction.…”
Section: Introduction
confidence: 99%