Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining 2022
DOI: 10.1145/3534678.3539260

Learned Token Pruning for Transformers

Cited by 36 publications (19 citation statements) · References 4 publications
“…Pruning methods can also differ in the way that token reduction is applied. In fixed rate pruning (Goyal et al., 2020; Rao et al., 2021; Bolya et al., 2023; Liang et al., 2022; Xu et al., 2022) a predefined number of tokens is removed per layer, while in adaptive approaches (Kim et al., 2022; Yin et al., 2021) the tokens are pruned dynamically based on the input.…”
Section: Token Pruning
confidence: 99%
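The distinction drawn in the statement above can be made concrete with a small sketch. The following PyTorch fragment is purely illustrative and is not code from any of the cited papers; the `scores` tensor, `keep_k` budget, and `threshold` are assumptions standing in for whatever importance measure and pruning budget a given method uses.

```python
import torch

def fixed_rate_prune(tokens, scores, keep_k):
    # Fixed-rate pruning: keep the keep_k highest-scoring tokens in every
    # input, so the amount of computation removed is predetermined.
    # tokens: (batch, n, dim), scores: (batch, n)
    idx = scores.topk(keep_k, dim=1).indices                   # (batch, keep_k)
    idx = idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1))
    return torch.gather(tokens, 1, idx)                        # (batch, keep_k, dim)

def adaptive_prune(tokens, scores, threshold):
    # Adaptive pruning: keep every token whose score exceeds a threshold,
    # so the number of surviving tokens varies with the input.
    keep = scores > threshold                                  # (batch, n) boolean mask
    return [t[m] for t, m in zip(tokens, keep)]                # ragged list per example
```

The practical difference is visible in the return types: the fixed-rate variant always yields a rectangular batch, while the adaptive variant yields per-example sequences of different lengths that need masking or padding downstream.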
“…Our learned thresholds approach is conceptually similar to learned token pruning as introduced in Kim et al. (2022). In each transformer block an importance score is calculated for every token x_i, i ∈ {1, ..., n}, where n = hw is the number of tokens.…”
Section: Learned Thresholds Pruning
confidence: 99%
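The mechanism described in this statement (a per-token importance score compared against a per-layer learned threshold, in the spirit of Kim et al. (2022)) might be sketched as follows. This is a simplified illustration under assumptions: the attention-based scoring rule, the initial threshold, and the temperature are placeholders, and the full training procedure of the cited papers is omitted.

```python
import torch

def token_importance(attn_probs):
    # attn_probs: (batch, heads, n_query, n_key) attention probabilities.
    # Score each token by the average attention it receives across heads
    # and query positions (one common choice of importance score).
    return attn_probs.mean(dim=(1, 2))                      # (batch, n_key)

class LearnedThresholdPruner(torch.nn.Module):
    def __init__(self, init_threshold=0.01, temperature=0.01):
        super().__init__()
        # One learnable threshold per transformer layer (illustrative init).
        self.threshold = torch.nn.Parameter(torch.tensor(init_threshold))
        self.temperature = temperature

    def forward(self, scores, hard=False):
        if hard:
            # Inference: hard keep/drop decision per token.
            return (scores >= self.threshold).float()
        # Training: soft, differentiable mask so the threshold receives gradients.
        return torch.sigmoid((scores - self.threshold) / self.temperature)
```

The soft mask is what makes the threshold learnable end-to-end; at inference it is replaced by the hard comparison so pruned tokens can actually be dropped from the sequence.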
“…Tang et al. [28] present a top-down, layer-by-layer patch slimming algorithm to reduce the computational cost of pre-trained Vision Transformers. The core strategy of these algorithms and other similar works [11, 13, 19] is to discard redundant tokens and thereby reduce the computational complexity of the model.…”
Section: Related Work
confidence: 99%
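The computational argument behind discarding tokens is that self-attention cost grows quadratically with the number of tokens, while the feed-forward block grows linearly, so tokens dropped early compound into large savings. A rough cost model with illustrative sizes (not figures from the paper or any of the citing works):

```python
def layer_flops(n, d, ffn_mult=4):
    # Rough per-layer cost: self-attention ~ O(n^2 * d), MLP ~ O(n * d^2).
    attn = 2 * n * n * d                  # QK^T scores and attention-weighted sum of V
    ffn = 2 * n * d * (ffn_mult * d)      # two dense layers of the feed-forward block
    return attn + ffn

d = 768                                    # illustrative hidden size
full = layer_flops(n=512, d=d)
half = layer_flops(n=256, d=d)             # half the tokens kept
print(f"per-layer FLOPs with half the tokens: {half / full:.0%} of the original")
```

In this toy example the attention term shrinks quadratically while the MLP term shrinks only linearly, so keeping half the tokens leaves a bit under half of the original per-layer cost.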