A Generalized Framework for Population Based Training

Population Based Training (PBT) is a recent approach that jointly optimizes neural network weights and hyperparameters by periodically copying the weights of the best performers and mutating hyperparameters during training. Previous PBT implementations have been synchronized glass-box systems. We propose a general, black-box PBT framework that distributes many asynchronous "trials" (a small number of training steps with warm-starting) across a cluster, coordinated by the PBT controller. The black-box design makes no assumptions about model architectures, loss functions or training procedures. Our system supports dynamic hyperparameter schedules to optimize both differentiable and non-differentiable metrics. We apply our system to train a state-of-the-art WaveNet generative model for human voice synthesis. We show that our PBT system achieves better accuracy and faster convergence compared to existing methods, given the same computational resources.

KEYWORDS: neural networks, black-box optimization, population based training, speech synthesis, evolutionary algorithms, WaveNet.
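The copy-and-mutate loop described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's system: the member layout (`weights`, `hparams`, `score`), the function name `pbt_step`, the bottom-quartile cutoff, and the 0.8/1.2 perturbation factors are all hypothetical, and the sketch is synchronous and single-process, whereas the abstract describes asynchronous trials coordinated by a controller.

```python
import copy
import random


def pbt_step(population, rng, perturb=(0.8, 1.2), bottom_frac=0.25):
    """One exploit/explore step over a list of member dicts.

    Each member is a dict with hypothetical keys:
      "weights": any copyable object, "hparams": dict of floats,
      "score": float, where higher is better.
    """
    ranked = sorted(population, key=lambda m: m["score"], reverse=True)
    cutoff = max(1, int(len(ranked) * bottom_frac))
    top, bottom = ranked[:cutoff], ranked[-cutoff:]
    for member in bottom:
        parent = rng.choice(top)
        # Exploit: warm-start the weak member from a strong member's weights.
        member["weights"] = copy.deepcopy(parent["weights"])
        # Explore: mutate the parent's hyperparameters by a random factor.
        member["hparams"] = {
            name: value * rng.choice(perturb)
            for name, value in parent["hparams"].items()
        }
    return population


if __name__ == "__main__":
    rng = random.Random(0)
    population = [
        {"weights": [float(i)], "hparams": {"lr": 0.1}, "score": float(i)}
        for i in range(4)
    ]
    pbt_step(population, rng)
    for member in population:
        print(member)
```

In a full training loop each member would train for a few steps and be re-scored between calls; the sketch leaves the stored `score` untouched, so it is stale for any member that was just overwritten.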


Rapid training of deep neural networks without skip connections or normalization layers using Deep Kernel Shaping

Martens, Ballard, Desjardins et al. 2021 (preprint)


Using an extended and formalized version of the Q/C map analysis of Poole et al. (2016), along with Neural Tangent Kernel theory, we identify the main pathologies present in deep networks that prevent them from training fast and generalizing to unseen data, and show how these can be avoided by carefully controlling the "shape" of the network's initialization-time kernel function. We then develop a method called Deep Kernel Shaping (DKS), which accomplishes this using a combination of precise parameter initialization, activation function transformations, and small architectural tweaks, all of which preserve the model class. In our experiments we show that DKS enables SGD training of residual networks without normalization layers on ImageNet and CIFAR-10 classification tasks at speeds comparable to standard ResNetV2 and Wide-ResNet models, with only a small decrease in generalization performance. When using K-FAC as the optimizer, we achieve similar results for networks without skip connections. Our results apply to a large variety of activation functions, including those which traditionally perform very badly, such as the logistic sigmoid. In addition to DKS, we contribute a detailed analysis of skip connections, normalization layers, special activation functions like ReLU and SELU, and various initialization schemes, explaining their effectiveness as alternative (and ultimately incomplete) ways of "shaping" the network's initialization-time kernel.
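As a rough illustration of the kind of quantity a Q map tracks, the sketch below evaluates a variance map of the form q ↦ E[φ(√q·z)²] with z ~ N(0, 1), using Gauss-Hermite quadrature, and iterates it across layers. This is a simplified reading in the style of Poole et al. (2016), assuming unit weight variance and no bias term; it is not the DKS method, and the names `q_map` and `num_nodes` are hypothetical.

```python
import numpy as np


def q_map(phi, q, num_nodes=64):
    """Approximate E[phi(sqrt(q) * z)**2] for z ~ N(0, 1).

    Uses Gauss-Hermite quadrature for the weight exp(-x**2 / 2); the
    weights sum to sqrt(2 * pi), so dividing by that gives an expectation.
    """
    nodes, weights = np.polynomial.hermite_e.hermegauss(num_nodes)
    values = phi(np.sqrt(q) * nodes) ** 2
    return float(np.sum(weights * values) / np.sqrt(2.0 * np.pi))


if __name__ == "__main__":
    # Iterate the map over depth to see how the pre-activation variance
    # evolves under this simplified setup (no weight rescaling).
    for name, phi in [("tanh", np.tanh), ("relu", lambda x: np.maximum(x, 0.0))]:
        q = 1.0
        trajectory = [q]
        for _ in range(5):
            q = q_map(phi, q)
            trajectory.append(q)
        print(name, trajectory)
```

Under these assumptions the variance shrinks with depth for both activations, which is one simple way to see why the abstract stresses controlling the initialization-time kernel rather than leaving it to the raw activation function.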


Open-Ended Learning Leads to Generally Capable Agents

Team, Stooke, Mahajan et al. 2021 (preprint)


Boat

Dalibard, Schaarschmidt, Yoneki 2017



Valentin Dalibard

Grandmaster level in StarCraft II using multi-agent reinforcement learning

A Generalized Framework for Population Based Training

Rapid training of deep neural networks without skip connections or normalization layers using Deep Kernel Shaping

Open-Ended Learning Leads to Generally Capable Agents

Boat
