2022
DOI: 10.48550/arxiv.2206.10011

When Does Re-initialization Work?

Abstract: Re-initializing a neural network during training has been observed to improve generalization in recent works. Yet it is neither widely adopted in deep learning practice nor is it often used in state-of-the-art training protocols. This raises the question of when re-initialization works, and whether it should be used together with regularization techniques such as data augmentation, weight decay and learning rate schedules. In this work, we conduct an extensive empirical comparison of standard training with a s…

Cited by 1 publication (2 citation statements)
References 13 publications

Citation statements:
“…Thus, in our settings, using shrink-perturb is the best method to inherit the weights. The default parameters of shrink-perturb from [44] ([0.4, 0.1]) worked well in PBT-NAS without any tuning.…”
Section: Shrink-Perturb Is the Superior Way of Weight Inheritance (citation type: mentioning)
confidence: 99%
“…In [44], shrink-perturb was found to benefit performance, thus raising the question if using it gives PBT-NAS an unfair advantage that is not related to NAS. In order to test this, we added shrink-perturb to random search.…”
Section: Shrink-Perturb Is the Superior Way of Weight Inheritance (citation type: mentioning)
confidence: 99%
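For context, the shrink-perturb recipe quoted above blends inherited weights with a freshly initialized copy of the network: theta <- shrink * theta_inherited + perturb * theta_fresh, with the default parameters [0.4, 0.1] read here as shrink = 0.4 and perturb = 0.1. Below is a minimal PyTorch sketch under that reading; the `make_fresh_model` helper and the exact parameter mapping are illustrative assumptions, not taken from the cited implementations.

```python
import torch

def shrink_perturb(model, make_fresh_model, shrink=0.4, perturb=0.1):
    """Shrink-perturb weight inheritance (sketch):
    theta <- shrink * theta_inherited + perturb * theta_fresh,
    where theta_fresh comes from a freshly initialized copy of the network.
    The 0.4 / 0.1 defaults mirror the values quoted above."""
    fresh = make_fresh_model()  # assumed helper: same architecture, new random init
    with torch.no_grad():
        for p_inherited, p_fresh in zip(model.parameters(), fresh.parameters()):
            # Scale the inherited weight down, then mix in a small
            # fraction of the fresh initialization.
            p_inherited.mul_(shrink).add_(p_fresh, alpha=perturb)
    return model
```

In a weight-inheritance setting of the kind described in the citing paper, a child network would first copy the parent's weights (e.g. via `load_state_dict`) and then have `shrink_perturb` applied before training resumes; whether this exact sequence matches the cited code is an assumption of this sketch.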