2019 | Preprint
DOI: 10.48550/arxiv.1905.01067

Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask

Abstract: The recent "Lottery Ticket Hypothesis" paper by Frankle & Carbin showed that a simple approach to creating sparse networks (keep the large weights) results in models that are trainable from scratch, but only when starting from the same initial weights. The performance of these networks often exceeds that of the non-sparse base model, but for reasons that were not well understood. In this paper we study the three critical components of the Lottery Ticket (LT) algorithm, showing that each may be varied…
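The procedure the abstract describes (keep the largest-magnitude trained weights, rewind the survivors to their original initial values) is compact enough to sketch. The following is a minimal NumPy illustration, not the authors' code; the array shapes and the `lottery_ticket_mask` helper are placeholders invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder tensors standing in for one layer of a real network:
# w_init = weights at initialization, w_trained = the same weights after training.
w_init = rng.normal(size=(256, 128)).astype(np.float32)
w_trained = w_init + rng.normal(scale=0.1, size=w_init.shape).astype(np.float32)

def lottery_ticket_mask(trained, sparsity=0.9):
    """The LT pruning criterion: keep the largest-magnitude trained weights."""
    threshold = np.quantile(np.abs(trained), sparsity)
    return (np.abs(trained) >= threshold).astype(np.float32)

mask = lottery_ticket_mask(w_trained)

# The "winning ticket": pruned weights are zeroed, and the surviving weights
# are rewound to their ORIGINAL initialization values before retraining.
w_ticket = mask * w_init
print(f"kept {mask.mean():.1%} of the weights")
```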

Cited by 49 publications (42 citation statements) | References 6 publications (8 reference statements)

Citation statements, ordered by relevance:

“…The number of sub-networks, or potential winning tickets, dwindles rapidly if we decrease the size of the full network. To the best of our knowledge, the methods used for finding winning tickets [17,67] have not yet been explored in the case where the optimization method is ES, much less in the context of indirect encoding. Our results hint that the Lottery Ticket Hypothesis might also hold in the indirect-encoding setting that we employ here.…”
Section: Discussion
confidence: 99%
“…Frankle et al. [8] use larger architectures by relaxing the restriction of reverting the weights to their initial values. Zhou et al. [48] show that the difference between the initial weights and the fine-tuned weights can serve as another pruning criterion. Proxies for determining lottery tickets in a data-efficient way have also been studied extensively in recent work.…”
Section: Related Work
confidence: 99%
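The criterion attributed to Zhou et al. [48] above (ranking weights by how far they moved from initialization rather than by final magnitude) fits in a few lines. A hedged sketch: the function name and the quantile-based thresholding are assumptions made for illustration, not the cited paper's implementation.

```python
import numpy as np

def movement_mask(w_init, w_final, sparsity=0.9):
    """Score each weight by |w_final - w_init| (distance travelled during
    training) and prune the `sparsity` fraction that moved the least."""
    score = np.abs(w_final - w_init)
    threshold = np.quantile(score, sparsity)
    return (score >= threshold).astype(np.float32)
```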
“…Once we finish training a spatially sparse convnet with dense weights, we prune some of the weights while preserving the test accuracy. In this paper, we utilize several pruning methods that belong to the family of magnitude-based pruning [7,11,20,48]. Specifically, we use three pruning methods with different pruning criteria and apply the criteria to the entire set of weights (global) or per layer (local).…”
Section: Network Pruning
confidence: 99%
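The global-versus-local distinction in this excerpt is straightforward to make concrete: a global criterion computes one magnitude threshold across all layers, while a local criterion thresholds each layer separately. A minimal NumPy sketch, assuming the weights arrive as a plain list of per-layer arrays (names illustrative):

```python
import numpy as np

def global_magnitude_masks(layer_weights, sparsity=0.9):
    """One threshold over ALL layers' weights (the 'global' criterion)."""
    all_mags = np.concatenate([np.abs(w).ravel() for w in layer_weights])
    threshold = np.quantile(all_mags, sparsity)
    return [(np.abs(w) >= threshold).astype(np.float32) for w in layer_weights]

def local_magnitude_masks(layer_weights, sparsity=0.9):
    """A separate threshold per layer (the 'local' criterion)."""
    return [(np.abs(w) >= np.quantile(np.abs(w), sparsity)).astype(np.float32)
            for w in layer_weights]
```

Global thresholding lets some layers end up much sparser than others; local thresholding guarantees every layer retains the same fraction of its weights.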
“…Lottery Tickets. Lottery Tickets [7,8,41] are an interesting phenomenon: if we reset the "salient weights" (trained weights with large magnitude) back to their values before optimization but after initialization, prune the other weights (often > 90% of the total), and retrain the model, the test performance is the same or better; if we instead reinitialize the salient weights, the test performance is much worse. In our theory, the salient weights are those lucky regions ($E_{j3}$ and $E_{j4}$ in Fig.…”
Section: Introduction
confidence: 99%
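The rewind-versus-reinitialize contrast in this last excerpt can also be written out directly. A minimal sketch with random placeholder tensors standing in for a real network's weights; in the cited experiments, the rewound variant retrains well while the reinitialized one does not.

```python
import numpy as np

rng = np.random.default_rng(0)
w_init = rng.normal(size=(512, 256)).astype(np.float32)  # before optimization
w_trained = w_init + rng.normal(scale=0.1, size=w_init.shape).astype(np.float32)

# Salient weights: trained weights with large magnitude (often < 10% survive).
mask = (np.abs(w_trained) >= np.quantile(np.abs(w_trained), 0.9)).astype(np.float32)

# Variant 1: rewind salient weights to their pre-training values.
w_rewound = mask * w_init

# Variant 2: reinitialize salient weights from a fresh random draw.
w_reinit = mask * rng.normal(size=w_init.shape).astype(np.float32)
```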