2020
DOI: 10.48550/arxiv.2006.05467
Preprint

Pruning neural networks without any data by iteratively conserving synaptic flow

Abstract: Pruning the parameters of deep neural networks has generated intense interest due to potential savings in time, memory and energy both during training and at test time. Recent works have identified, through an expensive sequence of training and pruning cycles, the existence of winning lottery tickets or sparse trainable subnetworks at initialization. This raises a foundational question: can we identify highly sparse trainable subnetworks at initialization, without ever training, or indeed without ever looking …

Cited by 56 publications (76 citation statements) | References 13 publications
“…Gradient-based methods infer the statistical performance of a network by leveraging the gradient information at initialization, which can be easily obtained using an automated differentiation tool in today's ML frameworks, such as PyTorch (Paszke et al., 2017) and TensorFlow (Abadi et al., 2016). For example, Snip and Grasp use a mini-batch of training samples and their gradients to calculate their metrics, while Synflow (Tanaka et al., 2020) is sample-free. An alternative stream of work (Turner et al., 2019; Theis et al., 2018) uses approximated second-order gradients, known as the empirical Fisher Information Matrix (FIM), at a random initialization point to infer the performance of a network.…”
Section: Related Work (mentioning)
confidence: 99%
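
For context on the quoted statement, the sample-free scoring attributed to Synflow can be sketched in PyTorch (one of the frameworks the citing authors name): temporarily replace every weight by its absolute value, push an all-ones input through the network, backpropagate the summed output, and score each weight by |weight x gradient|. The code below is a minimal, single-iteration sketch under those assumptions, not the authors' full procedure (which prunes iteratively while conserving synaptic flow); the helper names and the small MLP are illustrative placeholders.

import torch
import torch.nn as nn

@torch.no_grad()
def linearize(model):
    # Replace each tensor in the state dict by its absolute value in place,
    # keeping the signs so the original weights can be restored afterwards.
    signs = {}
    for name, param in model.state_dict().items():
        signs[name] = torch.sign(param)
        param.abs_()
    return signs

@torch.no_grad()
def restore(model, signs):
    for name, param in model.state_dict().items():
        param.mul_(signs[name])

def synflow_scores(model, input_shape):
    # Single iteration of a data-free saliency score: forward an all-ones
    # input through the |weights| network, backprop the summed output, and
    # score each parameter by |parameter * gradient|.
    signs = linearize(model)
    ones = torch.ones(1, *input_shape)  # constant input, no training data
    model(ones).sum().backward()
    scores = {name: (p.grad * p).abs().detach()
              for name, p in model.named_parameters() if p.grad is not None}
    model.zero_grad()
    restore(model, signs)
    return scores

# Illustrative example: score a small MLP at random initialization.
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
scores = synflow_scores(model, input_shape=(1, 28, 28))

Because the input is a constant tensor of ones and no loss or labels are involved, the score can be computed without touching any training data, which is what "sample-free" refers to in the statement above.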
“…The current practice for efficient MPI is gradient-based methods that leverage the gradient information of a network at initialization to infer its predictive performance (Tanaka et al., 2020). Compared to directly measuring the accuracy of candidate networks on a training dataset, gradient-based methods are computationally more efficient since they only require evaluating … We provide a new perspective to view the overall optimization landscape of a network as a combination of sample-wise optimization landscapes.…”
Section: Introduction (mentioning)
confidence: 99%
“…SNIP [20] aims to find performant subnetworks with a few mini-batch iterations. GraSP [39] and SynFlow [36] suggest that analyzing gradient-flow between layers enables identifying lottery tickets with a small set of training data or even without data.…”
Section: Related Work (mentioning)
confidence: 99%
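
By contrast with the data-free score above, the SNIP-style connection sensitivity mentioned in the quoted statement needs at least one mini-batch of training samples. Below is a minimal sketch assuming a cross-entropy task loss and the common |weight x loss-gradient| form of the score (the original SNIP derives this via gradients with respect to auxiliary gate variables); the model and the random batch are placeholders, not taken from the cited works.

import torch
import torch.nn as nn
import torch.nn.functional as F

def snip_scores(model, inputs, targets):
    # Connection sensitivity from a single mini-batch: one forward/backward
    # pass of the task loss, then score each weight by |weight * gradient|.
    model.zero_grad()
    loss = F.cross_entropy(model(inputs), targets)
    loss.backward()
    scores = {name: (p.grad * p).abs().detach()
              for name, p in model.named_parameters() if p.grad is not None}
    model.zero_grad()
    return scores

# Illustrative example: a random mini-batch stands in for real training samples.
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
x = torch.randn(64, 1, 28, 28)
y = torch.randint(0, 10, (64,))
scores = snip_scores(model, x, y)

GraSP follows the same one-batch pattern but, as the statement notes, targets gradient flow: it scores weights using a Hessian-gradient product rather than the loss gradient alone.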
“…Many insightful studies [Morcos et al., 2019, Orseau et al., 2020, Frankle et al., 2019, 2020, Malach et al., 2020, Pensia et al., 2020] have been carried out to analyze these tickets, but it remains difficult to generalize to large models due to training cost. In an attempt, follow-up works [Liu and Zenke, 2020, Tanaka et al., 2020] show that one can find tickets without training labels. We draw inspiration from one of them, Liu and Zenke [2020], which uses the NTK to avoid using labels in sparsifying networks.…”
Section: Related Work (mentioning)
confidence: 99%
“…A huge number of studies have been carried out to analyze these tickets both empirically and theoretically: Morcos et al. [2019] proposed to use one generalized lottery ticket for all vision benchmarks and obtained results comparable to the specialized lottery tickets; Frankle et al. [2019] improved the stability of the lottery tickets by iterative pruning; Frankle et al. [2020] found that subnetworks reach full accuracy only if they are stable against SGD noise during training; Orseau et al. [2020] provided a logarithmic upper bound on the number of parameters it takes for the optimal sub-networks to exist; Pensia et al. [2020] suggested a way to construct the lottery ticket by solving the subset sum problem, which is a proof by construction for the strong lottery ticket hypothesis. Furthermore, follow-up works [Liu and Zenke, 2020, Tanaka et al., 2020] show that we can find tickets without any training labels.…”
Section: M2 Lottery Ticket Hypothesis (mentioning)
confidence: 99%