2021
DOI: 10.48550/arxiv.2110.01765
Preprint

Rapid training of deep neural networks without skip connections or normalization layers using Deep Kernel Shaping

Abstract: Using an extended and formalized version of the Q/C map analysis of Poole et al. (2016), along with Neural Tangent Kernel theory, we identify the main pathologies present in deep networks that prevent them from training fast and generalizing to unseen data, and show how these can be avoided by carefully controlling the "shape" of the network's initialization-time kernel function. We then develop a method called Deep Kernel Shaping (DKS), which accomplishes this using a combination of precise parameter initializ…
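The Q/C map analysis referenced in the abstract tracks two scalar statistics of a wide network's initialization-time kernel across depth: q (the expected squared preactivation magnitude) and c (the correlation between preactivations for two different inputs). Below is a minimal Monte Carlo sketch of those maps in the spirit of Poole et al. (2016), which DKS extends. It is illustrative only, not the paper's code; the function names, sampling scheme, and default parameters are assumptions.

```python
import numpy as np

# Sketch of the Q/C map iteration for a wide fully-connected network at
# initialization (after Poole et al., 2016). Not the DKS implementation.

def q_map(q, phi, sigma_w2=1.0, sigma_b2=0.0, n_samples=100_000, seed=0):
    """One step of the variance map: q' = sw2 * E[phi(sqrt(q) z)^2] + sb2."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_samples)
    return sigma_w2 * np.mean(phi(np.sqrt(q) * z) ** 2) + sigma_b2

def c_map(c, q, phi, sigma_w2=1.0, sigma_b2=0.0, n_samples=100_000, seed=0):
    """One step of the correlation map, using correlated Gaussian samples."""
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal(n_samples)
    z2 = c * z1 + np.sqrt(1.0 - c ** 2) * rng.standard_normal(n_samples)
    cov = sigma_w2 * np.mean(phi(np.sqrt(q) * z1) * phi(np.sqrt(q) * z2)) + sigma_b2
    return cov / q_map(q, phi, sigma_w2, sigma_b2, n_samples, seed)

# Iterating the maps shows how the kernel degenerates with depth: for tanh
# at this init, q collapses and c drifts toward 1 for all input pairs --
# the kind of pathology DKS is designed to avoid.
q, c = 1.0, 0.5
for layer in range(50):
    c = c_map(c, q, np.tanh)
    q = q_map(q, np.tanh)
print(f"after 50 layers: q ~ {q:.4f}, c ~ {c:.4f}")
```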

Cited by 6 publications (18 citation statements)
Citation types: 0 supporting, 18 mentioning, 0 contrasting
References 32 publications
“…In summary, based on our experiments on different initializations and activation functions and our findings on ResNet-18, NNPK seems to hint at the choices that are known to also work well for fully trained models (Gotmare et al., 2018; Shah et al., 2016; Martens et al., 2021).…”
Section: Effect of Activation Function (mentioning)
confidence: 57%
“…First, the NRF seems to improve when skip connections are used, for both combinations of initialization and activation function. However, the performance of the trained model using the adjustments proposed by Martens et al. (2021) seems to improve without the skip connections. This observation shows that there are cases in which NNPK may not reflect the performance of the final trained model (although the results may vary when using data augmentation and other types of regularization).…”
Section: Does Skip Connection Improve NRF? (mentioning)
confidence: 96%