2019 | Preprint
DOI: 10.48550/arxiv.1905.01067

Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask

Abstract: The recent "Lottery Ticket Hypothesis" paper by Frankle & Carbin showed that a simple approach to creating sparse networks (keep the large weights) results in models that are trainable from scratch, but only when starting from the same initial weights. The performance of these networks often exceeds that of the non-sparse base model, but for reasons that were not well understood. In this paper we study the three critical components of the Lottery Ticket (LT) algorithm, showing that each may be varied…
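The procedure the abstract describes (keep the largest-magnitude trained weights, rewind the survivors to their original initial values) is compact enough to sketch. The following is a minimal NumPy illustration, not the authors' code; the array shapes and the `lottery_ticket_mask` helper are placeholders invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder tensors standing in for one layer of a real network:
# w_init = weights at initialization, w_trained = the same weights after training.
w_init = rng.normal(size=(256, 128)).astype(np.float32)
w_trained = w_init + rng.normal(scale=0.1, size=w_init.shape).astype(np.float32)

def lottery_ticket_mask(trained, sparsity=0.9):
    """The LT pruning criterion: keep the largest-magnitude trained weights."""
    threshold = np.quantile(np.abs(trained), sparsity)
    return (np.abs(trained) >= threshold).astype(np.float32)

mask = lottery_ticket_mask(w_trained)

# The "winning ticket": pruned weights are zeroed, and the surviving weights
# are rewound to their ORIGINAL initialization values before retraining.
w_ticket = mask * w_init
print(f"kept {mask.mean():.1%} of the weights")
```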

Cited by 49 publications (42 citation statements) | References 6 publications (8 reference statements)

Citation statements, ordered by relevance:

“…The number of sub-networks, or potential winning tickets, dwindles rapidly if we decrease the size of the full network. To the best of our knowledge, the methods used for finding winning tickets [17,67] have not yet been explored in the case where the optimization method is ES, much less in the context of indirect encoding. Our results hint that the Lottery Ticket Hypothesis might also hold in the indirect-encoding setting that we employ here.…”
Section: Discussion
confidence: 99%
“…Frankle et al. [8] use larger architectures by relaxing the restriction of reverting the weights to their initial values. Zhou et al. [48] show that the difference between the initial weights and the fine-tuned weights can serve as another pruning criterion. Proxies for determining lottery tickets in a data-efficient way have also been studied extensively in recent work.…”
Section: Related Work
confidence: 99%
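The criterion attributed to Zhou et al. [48] above (ranking weights by how far they moved from initialization rather than by final magnitude) fits in a few lines. A hedged sketch: the function name and the quantile-based thresholding are assumptions made for illustration, not the cited paper's implementation.

```python
import numpy as np

def movement_mask(w_init, w_final, sparsity=0.9):
    """Score each weight by |w_final - w_init| (distance travelled during
    training) and prune the `sparsity` fraction that moved the least."""
    score = np.abs(w_final - w_init)
    threshold = np.quantile(score, sparsity)
    return (score >= threshold).astype(np.float32)
```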
“…Once we finish training a spatially sparse convnet with dense weights, we prune some of the weights while preserving the test accuracy. In this paper, we utilize several pruning methods that belong to the family of magnitude-based pruning [7,11,20,48]. Specifically, we use three pruning methods with different pruning criteria and apply the criteria to the entire set of weights (global) or per layer (local).…”
Section: Network Pruning
confidence: 99%
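The global-versus-local distinction in this excerpt is straightforward to make concrete: a global criterion computes one magnitude threshold across all layers, while a local criterion thresholds each layer separately. A minimal NumPy sketch, assuming the weights arrive as a plain list of per-layer arrays (names illustrative):

```python
import numpy as np

def global_magnitude_masks(layer_weights, sparsity=0.9):
    """One threshold over ALL layers' weights (the 'global' criterion)."""
    all_mags = np.concatenate([np.abs(w).ravel() for w in layer_weights])
    threshold = np.quantile(all_mags, sparsity)
    return [(np.abs(w) >= threshold).astype(np.float32) for w in layer_weights]

def local_magnitude_masks(layer_weights, sparsity=0.9):
    """A separate threshold per layer (the 'local' criterion)."""
    return [(np.abs(w) >= np.quantile(np.abs(w), sparsity)).astype(np.float32)
            for w in layer_weights]
```

Global thresholding lets some layers end up much sparser than others; local thresholding guarantees every layer retains the same fraction of its weights.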
“…Lottery Tickets. Lottery Tickets [7,8,41] are an interesting phenomenon: if we reset the "salient weights" (trained weights with large magnitude) back to their values before optimization but after initialization, prune the other weights (often > 90% of the total), and retrain the model, the test performance is the same or better; if we instead reinitialize the salient weights, the test performance is much worse. In our theory, the salient weights are those lucky regions ($E_{j3}$ and $E_{j4}$ in Fig.…”
Section: Introduction
confidence: 99%
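The rewind-versus-reinitialize contrast in this last excerpt can also be written out directly. A minimal sketch with random placeholder tensors standing in for a real network's weights; in the cited experiments, the rewound variant retrains well while the reinitialized one does not.

```python
import numpy as np

rng = np.random.default_rng(0)
w_init = rng.normal(size=(512, 256)).astype(np.float32)  # before optimization
w_trained = w_init + rng.normal(scale=0.1, size=w_init.shape).astype(np.float32)

# Salient weights: trained weights with large magnitude (often < 10% survive).
mask = (np.abs(w_trained) >= np.quantile(np.abs(w_trained), 0.9)).astype(np.float32)

# Variant 1: rewind salient weights to their pre-training values.
w_rewound = mask * w_init

# Variant 2: reinitialize salient weights from a fresh random draw.
w_reinit = mask * rng.normal(size=w_init.shape).astype(np.float32)
```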