2020
DOI: 10.48550/arxiv.2006.07990
Preprint

Optimal Lottery Tickets via SubsetSum: Logarithmic Over-Parameterization is Sufficient

Ankit Pensia,
Shashank Rajput,
Alliot Nagle
et al.

Abstract: The strong lottery ticket hypothesis (LTH) postulates that one can approximate any target neural network by only pruning the weights of a sufficiently over-parameterized random network. A recent work by Malach et al. [1] establishes the first theoretical analysis for the strong LTH: one can provably approximate a neural network of width $d$ and depth $\ell$ by pruning a random one that is a factor $O(d^4 \ell^2)$ wider and twice as deep. This polynomial over-parameterization requirement is at odds with recent experiment…
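
The subset-sum reduction at the heart of the paper is easy to see on a single weight. The sketch below is illustrative only, not the authors' code; the function name, sample count, and target value are assumptions for the demo. It brute-forces a subset of n random weights whose sum approximates a target weight; the paper's analysis shows that n = O(log(1/ε)) such samples suffice with high probability to land within ε of any target in [-1, 1]. Pruning the over-parameterized network then amounts to keeping exactly the weights in that subset.

```python
import itertools
import random

def best_subset_sum(samples, target):
    """Brute-force the subset of `samples` whose sum is closest to `target`.

    Exponential in len(samples); only meant for the small n used here.
    """
    best_subset, best_err = (), abs(target)  # baseline: the empty subset
    for r in range(1, len(samples) + 1):
        for subset in itertools.combinations(samples, r):
            err = abs(sum(subset) - target)
            if err < best_err:
                best_subset, best_err = subset, err
    return best_subset, best_err

random.seed(0)
n = 16  # the paper's analysis needs only O(log(1/eps)) random samples
samples = [random.uniform(-1, 1) for _ in range(n)]
target = 0.731  # an arbitrary "weight" in [-1, 1] to approximate

subset, err = best_subset_sum(samples, target)
print(f"approximated {target:+.3f} with error {err:.1e} using {len(subset)} of {n} samples")
```

The brute-force search is for illustration only: the strong LTH result needs the existence of a good subset inside the random network, not an efficient procedure for finding one.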

Cited by 6 publications (6 citation statements)
References 22 publications
“…Lottery tickets Frankle & Carbin [30] are a set of small sub-networks derived from a larger dense network, which outperform their parent networks in convergence speed and potentially in generalization. A large number of studies have analyzed these tickets both empirically and theoretically: Morcos et al. [75] proposed using a single generalized lottery ticket for all vision benchmarks and obtained results comparable to the specialized lottery tickets; Frankle et al. [31] improve the stability of lottery tickets via iterative pruning; Frankle et al. [32] found that subnetworks reach full accuracy only if they are stable against SGD noise during training; Orseau et al. [78] provide a logarithmic upper bound on the number of parameters needed for the optimal sub-networks to exist; Pensia et al. [81] suggest a way to construct the lottery ticket by solving the subset sum problem, which constitutes a proof by construction of the strong lottery ticket hypothesis. Furthermore, follow-up works [68, 102, 96] show that we can find tickets without any training labels.…”
Section: A Extended Related Work
confidence: 99%
“…Provable Pruning. Empirical pruning research inspired the development of theoretical foundations for network pruning, including sensitivity-based analysis, coreset methodologies (Mussay et al., 2019; Baykal et al., 2018), and pruning analyses of random networks (Malach et al., 2020; Orseau et al., 2020; Pensia et al., 2020; Ramanujan et al., 2019). Later work analyzed the generalization of pruned networks (Zhang et al., 2021) and the amount of dense-network pre-training needed to obtain high-performing sub-networks (Wolfe et al., 2021).…”
Section: Related Work
confidence: 99%
“…Lottery tickets [Frankle and Carbin, 2018] are a set of small sub-networks derived from a larger dense network, which outperform their parent networks. Many insightful studies [Morcos et al., 2019, Orseau et al., 2020, Frankle et al., 2019, 2020, Malach et al., 2020, Pensia et al., 2020] have analyzed these tickets, but it remains difficult to generalize to large models due to training cost. Toward this end, follow-up works [Liu and Zenke, 2020, Tanaka et al., 2020] show that one can find tickets without training labels.…”
Section: Related Work
confidence: 99%
“…Lottery tickets Frankle and Carbin [2018] are a set of small sub-networks derived from a larger dense network, which outperform their parent networks in convergence speed and potentially in generalization. A large number of studies have analyzed these tickets both empirically and theoretically: Morcos et al. [2019] proposed using a single generalized lottery ticket for all vision benchmarks and obtained results comparable to the specialized lottery tickets; Frankle et al. [2019] improve the stability of lottery tickets via iterative pruning; Frankle et al. [2020] found that subnetworks reach full accuracy only if they are stable against SGD noise during training; Orseau et al. [2020] provide a logarithmic upper bound on the number of parameters needed for the optimal sub-networks to exist; Pensia et al. [2020] suggest a way to construct the lottery ticket by solving the subset sum problem, which constitutes a proof by construction of the strong lottery ticket hypothesis. Furthermore, follow-up works [Liu and Zenke, 2020, Tanaka et al., 2020] show that we can find tickets without any training labels.…”
Section: M2 Lottery Ticket Hypothesis
confidence: 99%