2020
DOI: 10.48550/arxiv.2005.03454
Preprint

Successfully Applying the Stabilized Lottery Ticket Hypothesis to the Transformer Architecture

Abstract: Sparse models require less memory for storage and enable faster inference by reducing the necessary number of FLOPs. This is relevant both for time-critical and on-device computations using neural networks. The stabilized lottery ticket hypothesis states that networks can be pruned after none or few training iterations, using a mask computed based on the unpruned converged model. On the transformer architecture and the WMT 2014 English→German and English→French tasks, we show that stabilized lottery ticket p…
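
The mask-then-rewind procedure described in the abstract can be illustrated with a minimal sketch, assuming PyTorch, a global magnitude criterion over weight matrices, and hypothetical `converged_model` / `early_checkpoint` objects; this is an illustration of the general idea, not the paper's actual implementation.

```python
import torch

def magnitude_masks(model, sparsity=0.6):
    """Binary masks that keep the largest-magnitude weights of a converged
    model; `sparsity` is the global fraction of weight-matrix entries to prune."""
    flat = torch.cat([p.detach().abs().flatten()
                      for p in model.parameters() if p.dim() > 1])
    k = int(sparsity * flat.numel())
    threshold = torch.kthvalue(flat, k).values if k > 0 else flat.new_tensor(-1.0)
    return {name: (p.detach().abs() > threshold).float()
            for name, p in model.named_parameters() if p.dim() > 1}

def apply_masks(model, masks):
    """Zero out pruned weights; re-apply after every optimizer step so the
    pruned positions stay at zero during retraining."""
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name].to(p.device))

# Usage sketch: the mask comes from the *converged* model, the weights are
# rewound to an early checkpoint ("none or few" training iterations), and the
# resulting sparse subnetwork is retrained.
masks = magnitude_masks(converged_model, sparsity=0.6)   # converged_model: assumption
model.load_state_dict(early_checkpoint)                  # early_checkpoint: assumption
apply_masks(model, masks)
# ... continue training, calling apply_masks(model, masks) after each step ...
```

Rewinding the surviving weights to an early checkpoint, rather than all the way back to the random initialization, is what distinguishes the stabilized variant from the original lottery ticket procedure.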

Cited by 4 publications (5 citation statements); references 2 publications.

“…(d) Success of LTs. While it is exciting to see widespread applicability of LTs in different domains (Brix, Bahar, and Ney 2020; Li et al. 2020; Venkatesh et al. 2020), the results presented in this paper suggest this success may be due to the underlying pruning algorithm (and transfer learning) rather than LT initializations themselves.…”
Section: Methods (mentioning)
confidence: 73%
“…In the context of compression approaches applied to NLP models, pruning embodies the idea of masking out weights that have low magnitude and do not contribute much to the output. The core intent of pruning is to first train a massive neural network, then mask out weights, and finally reach a sparse sub-network that does the heavy lifting of the full network (Brix et al., 2020) [123]. Behnke and Heafield [124] applied pruning to neural networks to speed up inference, using group lasso regularization to prune entire attention heads and feed-forward connections, speeding the model up by 51%.…”
Section: Pruning (mentioning)
confidence: 99%
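
As a rough illustration of the head-level pruning mentioned above, one can score each attention head by the norm of its slice of the output projection and mask the weakest heads. This is a hypothetical sketch (the tensor layout, head count, and keep ratio are assumptions), not the group-lasso training procedure of Behnke and Heafield.

```python
import torch

def head_scores(w_o, num_heads):
    """Score each head by the L2 norm of its column block in the output
    projection W_O (shape [d_model, d_model], columns grouped per head;
    this layout is an assumption for illustration)."""
    d_model = w_o.shape[1]
    head_dim = d_model // num_heads
    per_head = w_o.view(w_o.shape[0], num_heads, head_dim)
    return per_head.pow(2).sum(dim=(0, 2)).sqrt()

def head_mask(w_o, num_heads, keep_ratio=0.5):
    """Keep the highest-scoring heads and return a {0,1} mask over heads."""
    scores = head_scores(w_o, num_heads)
    k = max(1, int(keep_ratio * num_heads))
    mask = torch.zeros(num_heads)
    mask[scores.topk(k).indices] = 1.0
    return mask

# Example with random weights: 8 heads, d_model = 512, keep half the heads.
mask = head_mask(torch.randn(512, 512), num_heads=8, keep_ratio=0.5)
```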
“…Although IMP methods [84,2,18,5,46] [24,82,51]. Based on the IMP technique, researchers have found evidence of the LTH in various applications, including visual recognition [24], natural language processing [4,7,53], reinforcement learning [69,81], generative models [31], low-cost neural network ensembling [46], and improved robustness [8]. Although the LTH has been actively explored in the ANN domain, the LTH for SNNs is rarely studied.…”
Section: Lottery Ticket Hypothesis (mentioning)
confidence: 99%
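
For reference, the IMP procedure behind these results alternates training, global magnitude pruning, and rewinding. The sketch below reuses the magnitude_masks/apply_masks helpers from the sketch after the abstract; `train_for_a_while`, the round count, and the final sparsity are hypothetical placeholders, not a real API.

```python
import copy

def iterative_magnitude_pruning(model, train_for_a_while, rounds=5, final_sparsity=0.8):
    """Sketch of iterative magnitude pruning (IMP): train, prune a growing
    fraction of the smallest-magnitude weights, rewind the survivors, repeat."""
    init_state = copy.deepcopy(model.state_dict())      # weights to rewind to
    masks = None
    for r in range(1, rounds + 1):
        train_for_a_while(model, masks)                  # hypothetical training step
        sparsity = final_sparsity * r / rounds           # prune more each round
        masks = magnitude_masks(model, sparsity)
        model.load_state_dict(init_state)                # rewind surviving weights
        apply_masks(model, masks)
    return model, masks
```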
“…The discovered sub-networks and their corresponding initialization parameters are referred to as winning tickets. Based on the LTH, a line of work has successfully shown the existence of winning tickets across various tasks such as standard recognition tasks [80,22,24], reinforcement learning [69,81], natural language processing [4,7,53], and generative models [31]. Along the same line, our primary research objective is to investigate the existence of winning tickets in deep SNNs, which have a different type of neuronal dynamics from common ANNs.…”
Section: Introduction (mentioning)
confidence: 99%