Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP 2020
DOI: 10.18653/v1/2020.blackboxnlp-1.19
Dissecting Lottery Ticket Transformers: Structural and Behavioral Study of Sparse Neural Machine Translation

Abstract: Recent work on the lottery ticket hypothesis has produced highly sparse Transformers for NMT while maintaining BLEU. However, it is unclear how such pruning techniques affect a model's learned representations. By probing Transformers as increasingly many low-magnitude weights are pruned away, we find that complex semantic information is the first to be degraded. Analysis of internal activations reveals that higher layers diverge most over the course of pruning, gradually becoming less complex than their dense counterparts…
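The pruning the abstract describes removes the lowest-magnitude weights at increasing sparsity levels. A minimal sketch of such magnitude pruning (function name and tie-breaking behavior are illustrative, not taken from the paper):

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out roughly the fraction `sparsity` of weights with the
    smallest absolute value, leaving the rest untouched."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude serves as the pruning threshold;
    # ties at the threshold are pruned as well.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask
```

Probing then compares the representations of such pruned models at several sparsity levels against the dense baseline.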

Cited by 4 publications (3 citation statements).
References 27 publications (29 reference statements).
“…LTH (Frankle and Carbin, 2018) has been widely explored in various applications of deep learning (Brix et al., 2020; Movva and Zhao, 2020; Girish et al., 2020). Most existing results focus on finding unstructured winning tickets via iterative magnitude pruning and rewinding in randomly initialized networks (Frankle et al., 2019; Renda et al., 2020), where each ticket is a single parameter.…”
Section: Structured and Unstructured LTHs
confidence: 99%
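The iterative magnitude pruning with rewinding that this citation refers to (Frankle et al., 2019) alternates training, pruning, and resetting the surviving weights to their initial values. A simplified sketch, where `train_step` stands in for a full training run and all names are illustrative:

```python
import numpy as np

def iterative_magnitude_prune(w_init, train_step, rounds=3, prune_frac=0.2):
    """Sketch of iterative magnitude pruning with weight rewinding:
    each round trains the masked network, prunes the smallest-magnitude
    surviving weights, then rewinds the survivors to their initial values."""
    mask = np.ones_like(w_init, dtype=bool)
    w = w_init.copy()
    for _ in range(rounds):
        w = train_step(w) * mask            # "train" the masked network
        alive = np.abs(w[mask])
        k = int(prune_frac * alive.size)    # prune a fraction per round
        if k:
            thresh = np.sort(alive)[k - 1]
            mask &= np.abs(w) > thresh
        w = w_init * mask                   # rewind survivors to init
    return mask, w
```

The returned mask defines the "winning ticket": the sparse subnetwork that is retrained from (near-)initial weights.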
“…The existence of such a collection of tickets, which is usually referred to as "winning tickets", indicates the potential of training a smaller network to achieve the full model's performance. LTH has been widely explored across various fields of deep learning (Frankle et al., 2019; You et al., 2019; Brix et al., 2020; Movva and Zhao, 2020; Girish et al., 2020).…”
Section: Introduction
confidence: 99%
“…However, this method also requires the pre-trained model in order to prune the initial model. Researchers have proposed methods that perform pruning on the untrained model [19][20][21][22]. Wang et al. [21] added a scalar 'gate value' to measure the effectiveness of each filter in the initial model.…”
Section: Related Work
confidence: 99%
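The gate-value idea in this citation scores each filter with a scalar and discards the low-scoring ones before training. A hypothetical sketch of that selection step (the function, parameter names, and keep-fraction policy are assumptions for illustration, not the method of Wang et al.):

```python
import numpy as np

def prune_filters_by_gate(filters: np.ndarray, gates: np.ndarray,
                          keep_frac: float = 0.5):
    """Keep the top `keep_frac` fraction of filters, ranked by the
    magnitude of their associated scalar gate values."""
    n_keep = max(1, int(keep_frac * len(gates)))
    order = np.argsort(np.abs(gates))[::-1]   # largest gates first
    keep = np.sort(order[:n_keep])            # preserve original order
    return filters[keep], keep
```

Because the gates are evaluated on the initial model, this style of pruning avoids the full pre-training pass that magnitude-based methods require.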