2020
DOI: 10.48550/arxiv.2006.05467
Preprint

Pruning neural networks without any data by iteratively conserving synaptic flow

Abstract: Pruning the parameters of deep neural networks has generated intense interest due to potential savings in time, memory and energy both during training and at test time. Recent works have identified, through an expensive sequence of training and pruning cycles, the existence of winning lottery tickets or sparse trainable subnetworks at initialization. This raises a foundational question: can we identify highly sparse trainable subnetworks at initialization, without ever training, or indeed without ever looking …

Cited by 56 publications (76 citation statements) | References 13 publications
“…Gradient-based methods infer the statistical performance of a network by leveraging the gradient information at initialization, which can be easily obtained using an automated differentiation tool in today's ML frameworks, such as PyTorch (Paszke et al., 2017) and TensorFlow (Abadi et al., 2016). For example, Snip and Grasp use a mini-batch of training samples and their gradients to calculate their metrics, while Synflow (Tanaka et al., 2020) is sample-free. An alternative stream of work (Turner et al., 2019; Theis et al., 2018) uses approximated second-order gradients, known as the empirical Fisher Information Matrix (FIM), at a random initialization point to infer the performance of a network.…”
Section: Related Work (mentioning)
confidence: 99%
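
For context on the quoted statement, the sample-free scoring attributed to Synflow can be sketched in PyTorch (one of the frameworks the citing authors name): temporarily replace every weight by its absolute value, push an all-ones input through the network, backpropagate the summed output, and score each weight by |weight x gradient|. The code below is a minimal, single-iteration sketch under those assumptions, not the authors' full procedure (which prunes iteratively while conserving synaptic flow); the helper names and the small MLP are illustrative placeholders.

import torch
import torch.nn as nn

@torch.no_grad()
def linearize(model):
    # Replace each tensor in the state dict by its absolute value in place,
    # keeping the signs so the original weights can be restored afterwards.
    signs = {}
    for name, param in model.state_dict().items():
        signs[name] = torch.sign(param)
        param.abs_()
    return signs

@torch.no_grad()
def restore(model, signs):
    for name, param in model.state_dict().items():
        param.mul_(signs[name])

def synflow_scores(model, input_shape):
    # Single iteration of a data-free saliency score: forward an all-ones
    # input through the |weights| network, backprop the summed output, and
    # score each parameter by |parameter * gradient|.
    signs = linearize(model)
    ones = torch.ones(1, *input_shape)  # constant input, no training data
    model(ones).sum().backward()
    scores = {name: (p.grad * p).abs().detach()
              for name, p in model.named_parameters() if p.grad is not None}
    model.zero_grad()
    restore(model, signs)
    return scores

# Illustrative example: score a small MLP at random initialization.
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
scores = synflow_scores(model, input_shape=(1, 28, 28))

Because the input is a constant tensor of ones and no loss or labels are involved, the score can be computed without touching any training data, which is what "sample-free" refers to in the statement above.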
“…The current practice for efficient MPI is gradient-based methods that leverage the gradient information of a network at initialization to infer its predictive performance (Tanaka et al., 2020). Compared to directly measuring the accuracy of candidate networks on a training dataset, gradient-based methods are computationally more efficient since they only require evaluating … We provide a new perspective to view the overall optimization landscape of a network as a combination of sample-wise optimization landscapes.…”
Section: Introduction (mentioning)
confidence: 99%
“…SNIP [20] aims to find performant subnetworks with a few mini-batch iterations. GraSP [39] and SynFlow [36] suggest that analyzing gradient-flow between layers enables identifying lottery tickets with a small set of training data or even without data.…”
Section: Related Work (mentioning)
confidence: 99%
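
By contrast with the data-free score above, the SNIP-style connection sensitivity mentioned in the quoted statement needs at least one mini-batch of training samples. Below is a minimal sketch assuming a cross-entropy task loss and the common |weight x loss-gradient| form of the score (the original SNIP derives this via gradients with respect to auxiliary gate variables); the model and the random batch are placeholders, not taken from the cited works.

import torch
import torch.nn as nn
import torch.nn.functional as F

def snip_scores(model, inputs, targets):
    # Connection sensitivity from a single mini-batch: one forward/backward
    # pass of the task loss, then score each weight by |weight * gradient|.
    model.zero_grad()
    loss = F.cross_entropy(model(inputs), targets)
    loss.backward()
    scores = {name: (p.grad * p).abs().detach()
              for name, p in model.named_parameters() if p.grad is not None}
    model.zero_grad()
    return scores

# Illustrative example: a random mini-batch stands in for real training samples.
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
x = torch.randn(64, 1, 28, 28)
y = torch.randint(0, 10, (64,))
scores = snip_scores(model, x, y)

GraSP follows the same one-batch pattern but, as the statement notes, targets gradient flow: it scores weights using a Hessian-gradient product rather than the loss gradient alone.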
“…Many insightful studies [Morcos et al., 2019, Orseau et al., 2020, Frankle et al., 2019, 2020, Malach et al., 2020, Pensia et al., 2020] have been carried out to analyze these tickets, but it remains difficult to generalize to large models due to training cost. In an attempt, follow-up works [Liu and Zenke, 2020, Tanaka et al., 2020] show that one can find tickets without training labels. We draw inspiration from one of them, Liu and Zenke [2020], which uses the NTK to avoid using labels in sparsifying networks.…”
Section: Related Work (mentioning)
confidence: 99%
“…A huge number of studies have been carried out to analyze these tickets both empirically and theoretically: Morcos et al. [2019] proposed to use one generalized lottery ticket for all vision benchmarks and obtained results comparable to the specialized lottery tickets; Frankle et al. [2019] improved the stability of the lottery tickets by iterative pruning; Frankle et al. [2020] found that subnetworks reach full accuracy only if they are stable against SGD noise during training; Orseau et al. [2020] provided a logarithmic upper bound on the number of parameters it takes for the optimal sub-networks to exist; Pensia et al. [2020] suggested a way to construct the lottery ticket by solving the subset sum problem, which is a proof by construction for the strong lottery ticket hypothesis. Furthermore, follow-up works [Liu and Zenke, 2020, Tanaka et al., 2020] show that we can find tickets without any training labels.…”
Section: M2 Lottery Ticket Hypothesis (mentioning)
confidence: 99%