2017
DOI: 10.48550/arxiv.1704.04932
Preprint

Deep Relaxation: partial differential equations for optimizing deep neural networks

Cited by 15 publications (32 citation statements): 1 supporting, 31 mentioning, 0 contrasting.
References 0 publications.
“…3 that the best validation accuracy (72.9%) of the proposed method is higher than the one obtained with classical backpropagation (71.7%). Such a positive effect of proximal smoothing on the generalization capabilities of deep networks is consistent with the observations of Chaudhari et al. (2017b). Finally, the accuracies on the test set after 50 epochs are 70.7% for ProxProp and 69.6% for BackProp, which suggests that the proposed algorithm can lead to better generalization.…”
Section: ProxProp as a First-Order Oracle (supporting)
confidence: 85%
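
The "proximal smoothing" credited in this excerpt refers to replacing an explicit gradient step with an implicit (proximal) one. Below is a minimal sketch of that generic idea on a toy quadratic objective, not the ProxProp algorithm itself; the function name `prox_step` and all hyperparameters are illustrative assumptions.

```python
import numpy as np

def prox_step(grad_f, v, tau=0.5, inner_iters=20, inner_lr=0.1):
    """Approximate prox_{tau f}(v) = argmin_x f(x) + ||x - v||^2 / (2 tau)
    by a few inner gradient-descent iterations. This is an implicit update,
    in contrast to the explicit step v - tau * grad_f(v)."""
    x = v.copy()
    for _ in range(inner_iters):
        # gradient of the proximal subproblem: grad_f(x) + (x - v) / tau
        x = x - inner_lr * (grad_f(x) + (x - v) / tau)
    return x

v = np.ones(3)
explicit = v - 0.5 * v                  # explicit gradient step on f(x) = 0.5 ||x||^2
implicit = prox_step(lambda x: x, v)    # proximal step; converges to v / (1 + tau)
```

For this quadratic the proximal step lands at v / (1 + tau), a damped version of the explicit step; that implicit damping is one informal way to read the smoothing effect the excerpt describes.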
“…The continuous-time point of view used in this paper gives access to general principles that govern SGD; such analyses are increasingly becoming popular (Wibisono et al., 2016; Chaudhari et al., 2017b). However, in practice, deep networks are trained for only a few epochs with discrete-time updates.…”
Section: Discussion (mentioning)
confidence: 99%
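
The continuous-time view this excerpt refers to treats SGD as a discretization of a stochastic differential equation, dx = -∇f(x) dt + σ dW. A minimal sketch of that correspondence on a toy quadratic follows; the function name and parameters are illustrative, not taken from either cited paper.

```python
import numpy as np

def sgd_as_euler_maruyama(grad_f, x0, lr=0.1, noise_std=0.05, steps=200, seed=0):
    """Run SGD viewed as an Euler-Maruyama discretization of the SDE
    dx = -grad_f(x) dt + noise_std dW, with step size dt = lr."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x -= lr * grad_f(x)                                       # drift: gradient descent
        x += noise_std * np.sqrt(lr) * rng.standard_normal(x.shape)  # diffusion: noise model
    return x

# toy quadratic f(x) = 0.5 * ||x||^2, whose gradient is x
print(sgd_as_euler_maruyama(lambda x: x, x0=[1.0, 1.0]))
```

As the excerpt notes, the discrepancy between this idealized continuous dynamics and the few discrete epochs used in practice is exactly where such analyses can break down.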
“…Equivalence of Entropy-SGD and Elastic-SGD. The resemblance of (6) and (7) is not a coincidence: the authors in Chaudhari et al. (2017) proved that Elastic-SGD is equivalent to Entropy-SGD if the y_k updates converge quickly, i.e., if the sub-objective of (6a),…”
Section: Background and Related Work (mentioning)
confidence: 99%
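
For context on this excerpt: Entropy-SGD (Chaudhari et al.) estimates the gradient of a local-entropy objective with an inner SGLD loop, and its outer update γ(x − ⟨x′⟩) resembles Elastic-SGD's attraction toward the average of worker replicas when the inner iterates equilibrate quickly. Below is a minimal single-worker sketch of one Entropy-SGD outer step; hyperparameter values are illustrative assumptions.

```python
import numpy as np

def entropy_sgd_step(grad_f, x, eta=0.1, gamma=1.0, L=20,
                     sgld_lr=0.01, noise_std=0.1, seed=0):
    """One outer Entropy-SGD step: an inner SGLD loop samples from the Gibbs
    measure of f(x') + (gamma/2)||x - x'||^2 and averages the iterates; the
    outer variable then moves along the local-entropy gradient gamma*(x - mu)."""
    rng = np.random.default_rng(seed)
    xp = x.copy()   # inner variable x'
    mu = x.copy()   # running mean of inner iterates (the role of the y_k average)
    for i in range(1, L + 1):
        drift = grad_f(xp) + gamma * (xp - x)   # gradient of the inner sub-objective
        xp = xp - sgld_lr * drift + noise_std * np.sqrt(sgld_lr) * rng.standard_normal(xp.shape)
        mu = mu + (xp - mu) / i                 # online average
    return x - eta * gamma * (x - mu)           # outer local-entropy update

x = np.ones(3)
for _ in range(50):
    x = entropy_sgd_step(lambda z: z, x)  # toy quadratic f(z) = 0.5 ||z||^2
```

The equivalence the excerpt cites hinges on the inner loop converging fast enough that the running mean mu is a good proxy for the Gibbs average, which in Elastic-SGD is played by the average over parallel workers.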

Parle: parallelizing stochastic gradient descent
Chaudhari, Baldassi, Zecchina et al., 2017. Preprint. (Self-citation)