2020
DOI: 10.48550/arxiv.2010.05300
Preprint

Glance and Focus: a Dynamic Approach to Reducing Spatial Redundancy in Image Classification

Abstract: The accuracy of deep convolutional neural networks (CNNs) generally improves when fueled with high resolution images. However, this often comes at a high computational cost and high memory footprint. Inspired by the fact that not all regions in an image are task-relevant, we propose a novel framework that performs efficient image classification by processing a sequence of relatively small inputs, which are strategically selected from the original image with reinforcement learning. Such a dynamic decision proce…
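The abstract describes sequential inference over small, strategically chosen crops with an early exit once the prediction is confident. A minimal sketch of that control flow is below; the `classifier` and `locate_patch` callables are hypothetical stand-ins (the paper learns the patch-selection policy with reinforcement learning, which is not reproduced here).

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D logit vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def glance_and_focus(image, classifier, locate_patch,
                     patch_size=96, max_steps=4, threshold=0.9):
    """Sketch of sequential 'glance then focus' inference.

    classifier(img) -> logits is a hypothetical per-input classifier;
    locate_patch(img, step) -> (y, x) is a hypothetical policy that
    picks the top-left corner of the next region to inspect.
    """
    logits = classifier(image)            # cheap initial glance
    for step in range(1, max_steps):
        probs = softmax(logits / step)    # mean of logits seen so far
        if probs.max() >= threshold:      # confident enough: stop early
            break
        y, x = locate_patch(image, step)  # policy selects next region
        patch = image[y:y + patch_size, x:x + patch_size]
        logits = logits + classifier(patch)  # accumulate evidence
    return int(np.argmax(logits))
```

The early-exit check is what yields the adaptive computational cost: easy images terminate after the glance, while harder ones trigger additional focused crops.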

Cited by 16 publications (6 citation statements) · References 38 publications
“…There have been a great number of works that help models focus on important features and adapt to geometric variations in computer vision. All related works fall into two categories: attention-based methods [1,13,14,27] and offset-based methods [5,8,28,34,35]. We mainly review the offset-based ones.…”
Section: Deformable-related Work
confidence: 99%
“…Unlike our task, the region proposal network uses supervision from bounding-box annotations. In the image classification task, there are also works that explicitly learn the positions of important regions for better performance [8] or faster inference [28]. The learning process is merely guided by the cross-entropy loss and the final accuracy.…”
Section: Deformable-related Work
confidence: 99%
“…However, computational efficiency is critical in real-world scenarios, where executed computation translates into power consumption or carbon emissions. Many works have tried to reduce the computational cost of CNNs via neural architecture search [10,16,25,54,57], knowledge distillation [20,55], dynamic routing [4,13,43,51,56] and pruning [15,18], but how to accelerate the ViT model has rarely been explored.…”
Section: Model Compression
confidence: 99%
“…These adaptive computation strategies help speed up the inference of convolutional neural networks (CNNs) [32], recurrent neural networks (RNNs) [14], and self-attention based methods (BERT) [24]. Besides model-level adaptation, others further extend this idea to data-level adaptation, either by reducing spatial redundancy [56] or by focusing on key areas [52]. However, those methods are limited by the convolutional structure, where only 2D data can be taken as input.…”
Section: Related Work
confidence: 99%