2020
DOI: 10.1137/19m1247620

Layer-Parallel Training of Deep Residual Neural Networks

Abstract: Residual neural networks (ResNets) are a promising class of deep neural networks that have shown excellent performance for a number of learning tasks, e.g., image classification and recognition. Mathematically, ResNet architectures can be interpreted as forward Euler discretizations of a nonlinear initial value problem whose time-dependent control variables represent the weights of the neural network. Hence, training a ResNet can be cast as an optimal control problem of the associated dynamical system. For sim…
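The forward Euler interpretation in the abstract is easy to make concrete. The sketch below is an illustration rather than code from the paper; the tanh layer dynamics, the step size h, and the state dimension are assumptions chosen for brevity. It shows that stacking residual blocks u_{k+1} = u_k + h f(u_k, θ_k) is exactly an explicit Euler discretization of du/dt = f(u(t), θ(t)), so the layer index plays the role of time and the weights act as time-dependent controls.

```python
import numpy as np

def f(u, W, b):
    # Layer dynamics f(u, theta); a tanh perceptron is an assumed, illustrative choice.
    return np.tanh(W @ u + b)

def resnet_forward(u0, weights, biases, h=0.1):
    # Forward pass through a ResNet = explicit (forward) Euler for du/dt = f(u, theta(t)):
    # each residual block computes u_{k+1} = u_k + h * f(u_k, W_k, b_k).
    u = u0
    for W, b in zip(weights, biases):
        u = u + h * f(u, W, b)   # skip connection + scaled residual = one Euler step
    return u

# Toy setup: 8 layers ("time steps") acting on a 4-dimensional state.
rng = np.random.default_rng(0)
layers, dim = 8, 4
weights = [0.1 * rng.standard_normal((dim, dim)) for _ in range(layers)]
biases = [np.zeros(dim) for _ in range(layers)]
u_final = resnet_forward(rng.standard_normal(dim), weights, biases)
```

Training then amounts to choosing the controls (W_k, b_k) that minimize a loss on the final state, which is the optimal control viewpoint the abstract describes.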

Cited by 78 publications (66 citation statements)
References 42 publications
“…The connection of neural network architectures to optimal control problems as introduced in [120,125] makes heavy use of PDE-constrained optimization techniques including efficient matrix vector products. This topic has also recently received more attention from within the machine learning community [52] and promises to be a very interesting field for combining traditional methods from numerical analysis with deep learning.…”
Section: Numerical Linear Algebra in Deep Learning (mentioning)
confidence: 99%
“…Very recently, Yalla and Engquist [40] showed the promise of using a machine-learned model as the coarse propagator for test problems. Going the other way, Schroder [32] and Günther et al. [14] recently showed that parallel-in-time integration can be used to speed up the process of training neural networks.…”
Section: Related Work (mentioning)
confidence: 99%
“…In the most recent period, researchers have explored distributed training methods along various other dimensions to accelerate the training process. Günther et al. [35] provide a proof-of-concept for layer-parallel training of ResNets and demonstrate two options to benefit from the layer-parallel approach. Recently, independent work on PipeDream [11] proposed a distributed pipeline system for DNN training.…”
Section: Related Work (mentioning)
confidence: 99%
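To make the parallel-in-time idea behind these citations tangible, the sketch below runs a simplified, serial parareal iteration over the layer ("time") axis. It is only a two-level analogue of the multigrid-in-time approach used for layer-parallel training, and the toy dynamics f, the interval partition, and the iteration count are assumptions chosen for illustration. The expensive fine propagations inside each sweep are independent across intervals and would run concurrently, one chunk of layers per processor, in an actual layer-parallel setting.

```python
import numpy as np

def f(u):
    # Autonomous toy layer dynamics, an assumed stand-in for f(u, theta).
    return np.tanh(u)

def fine(u, t0, t1, n_steps=16):
    # Fine propagator F: many small forward Euler steps (many ResNet layers).
    h = (t1 - t0) / n_steps
    for _ in range(n_steps):
        u = u + h * f(u)
    return u

def coarse(u, t0, t1):
    # Coarse propagator G: a single large Euler step over the whole interval.
    return u + (t1 - t0) * f(u)

def parareal(u0, t_grid, n_iters=3):
    # Parareal update: u_{n+1} <- G(u_n^new) + F(u_n^old) - G(u_n^old).
    # The F evaluations over the intervals are independent and would run in
    # parallel across processors; here they are executed serially for clarity.
    N = len(t_grid) - 1
    u = [u0]
    for n in range(N):                       # initial coarse sweep
        u.append(coarse(u[n], t_grid[n], t_grid[n + 1]))
    for _ in range(n_iters):
        F_old = [fine(u[n], t_grid[n], t_grid[n + 1]) for n in range(N)]    # parallelizable
        G_old = [coarse(u[n], t_grid[n], t_grid[n + 1]) for n in range(N)]
        u_new = [u0]
        for n in range(N):                   # sequential, cheap coarse correction
            u_new.append(coarse(u_new[n], t_grid[n], t_grid[n + 1]) + F_old[n] - G_old[n])
        u = u_new
    return u

t_grid = np.linspace(0.0, 1.0, 5)            # 4 coarse intervals ("layer chunks")
states = parareal(np.array([0.5, -0.2]), t_grid)
```

The coarse sweep stays sequential but cheap, and after a few iterations the parareal states approach the serial fine propagation, which is what allows layer-parallel schemes to trade a modest amount of extra work for concurrency across the depth of the network.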