2018
DOI: 10.1007/978-3-030-01240-3_4

Learning to Zoom: A Saliency-Based Sampling Layer for Neural Networks

Abstract: We introduce a saliency-based distortion layer for convolutional neural networks that helps to improve the spatial sampling of input data for a given task. Our differentiable layer can be added as a preprocessing block to existing task networks and trained altogether in an end-to-end fashion. The effect of the layer is to efficiently estimate how to sample from the original data in order to boost task performance. For example, for an image classification task in which the original data might range in size up t…
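To make the abstract's idea concrete, here is a minimal sketch (in PyTorch, assumed rather than taken from the authors' released code) of how a saliency map can be turned into a differentiable, non-uniform sampling grid: each output location samples the input at the saliency-weighted average of coordinates within a Gaussian neighbourhood, so salient regions are effectively magnified. Because the warp ends in `grid_sample`, the whole block is differentiable and can be prepended to a task network and trained end-to-end.

```python
# Minimal sketch of a saliency-based sampling layer (illustrative, not the paper's exact code).
import torch
import torch.nn.functional as F


def gaussian_kernel(size: int = 9, sigma: float = 3.0) -> torch.Tensor:
    ax = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    g = torch.exp(-(ax ** 2) / (2 * sigma ** 2))
    k = torch.outer(g, g)
    return (k / k.sum()).view(1, 1, size, size)


def saliency_sample(image: torch.Tensor, saliency: torch.Tensor,
                    kernel_size: int = 9, sigma: float = 3.0) -> torch.Tensor:
    """image: (B, C, H, W); saliency: (B, 1, H, W), non-negative."""
    b, _, h, w = saliency.shape
    k = gaussian_kernel(kernel_size, sigma).to(saliency.device)
    pad = kernel_size // 2

    # Normalised coordinate maps in [-1, 1], shaped (1, 1, H, W).
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    xs = xs.view(1, 1, h, w).to(saliency.device)
    ys = ys.view(1, 1, h, w).to(saliency.device)

    # Saliency-weighted average of coordinates in each neighbourhood:
    # high-saliency regions pull sampling locations towards themselves.
    denom = F.conv2d(saliency, k, padding=pad) + 1e-6
    grid_x = F.conv2d(saliency * xs, k, padding=pad) / denom
    grid_y = F.conv2d(saliency * ys, k, padding=pad) / denom

    grid = torch.stack([grid_x.squeeze(1), grid_y.squeeze(1)], dim=-1)  # (B, H, W, 2)
    return F.grid_sample(image, grid, align_corners=True)


# Example: magnify a hypothetical salient region of a 1x3x64x64 image.
img = torch.rand(1, 3, 64, 64)
sal = torch.full((1, 1, 64, 64), 0.05)   # small uniform floor keeps the grid near identity
sal[..., 24:40, 24:40] = 1.0             # hypothetical salient region to magnify
warped = saliency_sample(img, sal)
print(warped.shape)                      # torch.Size([1, 3, 64, 64])
```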

Cited by 126 publications (110 citation statements) · References 28 publications
“…However, a 448 input increases the computational cost (i.e., FLOPs) by four times compared to a 224 input. SSN [22] obtains better results than DT-RAM [19], and our TASN can further obtain a 2.9% relative improvement. Such improvements mainly come from two aspects: 1) a better sampling mechanism considering spatial distortion (1.2%), and 2) a better fine-grained detail optimizing strategy (1.7%).…”
Section: Evaluation and Analysis on CUB-200-2011 (mentioning)
confidence: 81%
“…But without explicit guidance, it is hard to learn non-uniform sampling parameters for sophisticated tasks such as fine-grained recognition, so they finally learned two parts without non-uniform sampling. SSN [22] first proposed using saliency maps as guidance for non-uniform sampling and obtained significant improvements. Different from them, our attention sampler 1) conducts non-uniform sampling based on trilinear attention maps, and 2) decomposes attention maps into two dimensions to reduce spatial distortion effects.…”
Section: Related Work (mentioning)
confidence: 99%
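The axis-wise decomposition this citing work describes can be sketched as follows (a hypothetical illustration, not the TASN code): the 2D attention map is reduced to per-row and per-column marginals, and each image axis is resampled by the inverse CDF of its marginal, so the x and y distortions are independent.

```python
# Illustrative axis-wise non-uniform sampling from a 2D attention map.
import torch
import torch.nn.functional as F


def axis_coords(marginal: torch.Tensor, out_size: int) -> torch.Tensor:
    """marginal: (N,) non-negative weights along one axis -> (out_size,) coords in [-1, 1]."""
    pdf = marginal + 1e-6
    pdf = pdf / pdf.sum()
    cdf = torch.cumsum(pdf, dim=0)                       # monotone in (0, 1]
    targets = torch.linspace(0, 1, out_size)
    idx = torch.searchsorted(cdf, targets).clamp(max=len(cdf) - 1)
    return idx.float() / (len(cdf) - 1) * 2 - 1          # map indices to [-1, 1]


def attention_resample(image: torch.Tensor, attn: torch.Tensor, out_hw=(224, 224)) -> torch.Tensor:
    """image: (1, C, H, W); attn: (H, W) non-negative attention map."""
    xs = axis_coords(attn.sum(dim=0), out_hw[1])          # column marginal -> x coordinates
    ys = axis_coords(attn.sum(dim=1), out_hw[0])          # row marginal    -> y coordinates
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
    grid = torch.stack([grid_x, grid_y], dim=-1).unsqueeze(0)   # (1, H_out, W_out, 2)
    return F.grid_sample(image, grid, align_corners=True)


img = torch.rand(1, 3, 448, 448)
attn = torch.rand(448, 448)             # hypothetical attention map
zoomed = attention_resample(img, attn)
print(zoomed.shape)                     # torch.Size([1, 3, 224, 224])
```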
“…Spatial Transformer Networks [28,42] learn spatial transformations (warping) of the CNN input. They explore different parameterizations for the spatial transformation, including affine, projective, and spline transforms [28] or specially designed saliency-based layers [42]. Their focus is to undo different data distortions or to "zoom in" on salient regions, while our approach focuses on efficient downsampling that retains as much information around semantic boundaries as possible.…”
Section: Prior Work (mentioning)
confidence: 99%
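For comparison with the saliency-based layer, the affine parameterization mentioned for Spatial Transformer Networks can be written with PyTorch's built-in grid utilities; the localisation network below is a placeholder, not any paper's exact architecture.

```python
# Minimal affine spatial-transformer sketch using affine_grid + grid_sample.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AffineSTN(nn.Module):
    def __init__(self):
        super().__init__()
        # Tiny localisation net predicting the 6 affine parameters.
        self.loc = nn.Sequential(
            nn.Conv2d(3, 8, 7, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(8, 6),
        )
        # Initialise to the identity transform so training starts stable.
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x):
        theta = self.loc(x).view(-1, 2, 3)                    # (B, 2, 3) affine matrices
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)    # warped input for the task net


stn = AffineSTN()
out = stn(torch.rand(2, 3, 64, 64))
print(out.shape)                                              # torch.Size([2, 3, 64, 64])
```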
“…The work in [1] has demonstrated the advantages of foveated image processing with regard to computational efficiency (but did not address CNNs). In recent models of visual saliency using CNNs, images have been applied to networks using a foveal transform [2,8]. However, those works did not investigate image size reduction and frame-rate speed-up, which are of critical importance for embedded systems.…”
Section: arXiv:1908.09000v1 [cs.CV] 15 Aug 2019 (mentioning)
confidence: 99%
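A foveal transform of the kind referenced here can be sketched as a radially warped sampling grid that keeps the fixation point at high resolution while producing a small output; the power-law warp below is an assumed, illustrative choice rather than the transform used in the cited work.

```python
# Illustrative foveated downsampling: dense sampling at the centre, coarse in the periphery.
import torch
import torch.nn.functional as F


def foveate(image: torch.Tensor, out_size: int = 112, power: float = 2.0) -> torch.Tensor:
    """image: (B, C, H, W). power > 1 magnifies the centre and compresses the edges."""
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, out_size),
                            torch.linspace(-1, 1, out_size), indexing="ij")
    r = torch.sqrt(xs ** 2 + ys ** 2).clamp(min=1e-6)
    scale = (r / r.max()) ** (power - 1)                  # small near the centre, ~1 at the corners
    grid = torch.stack([xs * scale, ys * scale], dim=-1)  # (H_out, W_out, 2)
    grid = grid.unsqueeze(0).expand(image.size(0), -1, -1, -1)
    return F.grid_sample(image, grid, align_corners=True)


frame = torch.rand(1, 3, 448, 448)
small = foveate(frame)                  # 448x448 frame -> 112x112 foveated view
print(small.shape)                      # torch.Size([1, 3, 112, 112])
```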