2022
DOI: 10.48550/arxiv.2206.10011

When Does Re-initialization Work?

Abstract: Re-initializing a neural network during training has been observed to improve generalization in recent works. Yet it is neither widely adopted in deep learning practice nor is it often used in state-of-the-art training protocols. This raises the question of when re-initialization works, and whether it should be used together with regularization techniques such as data augmentation, weight decay and learning rate schedules. In this work, we conduct an extensive empirical comparison of standard training with a s…

Cited by 1 publication (2 citation statements)
References 13 publications

Citation statements:
“…Thus, in our settings, using shrink-perturb is the best method to inherit the weights. The default parameters of shrink-perturb from [44] ([0.4, 0.1]) worked well in PBT-NAS without any tuning.…”
Section: Shrink-Perturb Is the Superior Way of Weight Inheritance (citation type: mentioning)
confidence: 99%
“…In [44], shrink-perturb was found to benefit performance, thus raising the question if using it gives PBT-NAS an unfair advantage that is not related to NAS. In order to test this, we added shrink-perturb to random search.…”
Section: Shrink-Perturb Is the Superior Way of Weight Inheritance (citation type: mentioning)
confidence: 99%
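For context, the shrink-perturb recipe quoted above blends inherited weights with a freshly initialized copy of the network: theta <- shrink * theta_inherited + perturb * theta_fresh, with the default parameters [0.4, 0.1] read here as shrink = 0.4 and perturb = 0.1. Below is a minimal PyTorch sketch under that reading; the `make_fresh_model` helper and the exact parameter mapping are illustrative assumptions, not taken from the cited implementations.

```python
import torch

def shrink_perturb(model, make_fresh_model, shrink=0.4, perturb=0.1):
    """Shrink-perturb weight inheritance (sketch):
    theta <- shrink * theta_inherited + perturb * theta_fresh,
    where theta_fresh comes from a freshly initialized copy of the network.
    The 0.4 / 0.1 defaults mirror the values quoted above."""
    fresh = make_fresh_model()  # assumed helper: same architecture, new random init
    with torch.no_grad():
        for p_inherited, p_fresh in zip(model.parameters(), fresh.parameters()):
            # Scale the inherited weight down, then mix in a small
            # fraction of the fresh initialization.
            p_inherited.mul_(shrink).add_(p_fresh, alpha=perturb)
    return model
```

In a weight-inheritance setting of the kind described in the citing paper, a child network would first copy the parent's weights (e.g. via `load_state_dict`) and then have `shrink_perturb` applied before training resumes; whether this exact sequence matches the cited code is an assumption of this sketch.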