Proceedings of the 24th ACM International Conference on Multimedia 2016
DOI: 10.1145/2964284.2964316

Transform-Invariant Convolutional Neural Networks for Image Classification and Search

Abstract: Convolutional neural networks (CNNs) have achieved state-of-the-art results on many visual recognition tasks. However, current CNN models still exhibit a poor ability to be invariant to spatial transformations of images. Intuitively, with sufficient layers and parameters, hierarchical combinations of convolution (matrix multiplication and nonlinear activation) and pooling operations should be able to learn a robust mapping from transformed input images to transform-invariant representations. In this paper, we p…
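The abstract's premise, that stacked convolution and pooling are only weakly invariant to spatial transformations in practice, is easy to probe empirically. Below is a minimal sketch (an illustration, not the paper's method; the two-layer network, the random input, and the shift size are all assumptions) that compares a small conv/pool stack's features for an image and a translated copy of it.

```python
import torch
import torch.nn as nn

# A small conv + pool stack, standing in for the hierarchical combinations
# of convolution and pooling discussed in the abstract. The architecture
# and the 4-pixel shift are illustrative assumptions.
net = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
)

x = torch.randn(1, 1, 32, 32)               # a random stand-in "image"
x_shift = torch.roll(x, shifts=4, dims=-1)  # translate 4 pixels to the right

with torch.no_grad():
    f, f_shift = net(x), net(x_shift)

# Cosine similarity of the flattened feature maps: 1.0 would mean perfect
# translation invariance; conv/pool stacks typically fall short of that.
sim = torch.nn.functional.cosine_similarity(
    f.flatten(), f_shift.flatten(), dim=0)
print(f"feature similarity under 4-px shift: {sim.item():.3f}")
```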

Cited by 30 publications (23 citation statements). References 32 publications.

Citation statements (ordered by relevance):
“…For example, using the invariance metric of Goodfellow et al (2009), Shang et al (2016, their Figure 4c) averaged over multiple types of invariance (e.g., translation, rotation) and over all units within a layer and found a weak, non-monotonic increase in invariance across layers in a CNN similar to AlexNet. Using the same metric but different stimuli, Shen et al (2016) found no increase and no systematic trend in invariance across layers of their implementation of AlexNet (their Figure 5). Although Güçlü and van Gerven (2015) plot an invariance metric against CNN layer, their metric is the half-width of a response profile, and thus it is unlike our TI selectivity metric.…”
Section: Discussion
confidence: 91%
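For readers unfamiliar with the metric these statements refer to, Goodfellow et al (2009) score a unit as invariant when stimuli that strongly activate it keep activating it after transformation. The sketch below captures that idea in simplified form; the threshold rule, the synthetic responses, and the function name are my assumptions, not taken from any of the cited papers.

```python
import numpy as np

def invariance_score(resp_orig, resp_trans, threshold):
    """Simplified invariance score in the spirit of Goodfellow et al (2009).

    resp_orig:  (n_stimuli,) unit responses to the base stimuli
    resp_trans: (n_stimuli, n_transforms) responses to transformed versions
    threshold:  firing threshold chosen so the unit is selective

    Local firing rate: fraction of transformed versions of the unit's
    top-activating stimuli that still exceed the threshold.
    Global firing rate: fraction of all transformed inputs exceeding it.
    The score is their ratio; higher means more invariant.
    """
    top = resp_orig >= threshold                    # stimuli the unit "prefers"
    local = (resp_trans[top] >= threshold).mean()   # robustness on preferred set
    global_ = (resp_trans >= threshold).mean()      # baseline firing rate
    return local / max(global_, 1e-12)

# Toy example with synthetic responses (assumed data, for illustration only).
rng = np.random.default_rng(0)
resp_orig = rng.random(1000)
resp_trans = resp_orig[:, None] * rng.uniform(0.5, 1.0, (1000, 8))
print(invariance_score(resp_orig, resp_trans, threshold=0.9))
```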
“…Although other studies have examined translation invariance and related properties (rotation and reflection invariance) in artificial networks (Ranzato et al, 2007; Goodfellow et al, 2009; Lenc and Vedaldi, 2014; Zeiler and Fergus, 2013; Fawzi and Frossard, 2015; Güçlü and van Gerven, 2015; Shang et al, 2016; Shen et al, 2016; Tsai and Cox, 2015), we are unaware of any study that has quantitatively documented a steady layer-to-layer increase of translation-invariant form selectivity, measured for single units, across layers throughout a network like AlexNet. For example, using the invariance metric of Goodfellow et al (2009), Shang et al (2016, their Figure 4c) averaged over multiple types of invariance (e.g., translation, rotation) and over all units within a layer and found a weak, non-monotonic increase in invariance across layers in a CNN similar to AlexNet.…”
Section: Discussion
confidence: 94%
“…As previous research [32,15,23] has pointed out, DCNN features are not invariant to large image transformations, such as scaling and rotation. While scaling has been handled in the original SiamFC tracker, the rotation of the target object is not considered.…”
Section: Angle Estimation
confidence: 85%
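As context for the angle-estimation discussion above, one common way to handle the rotation that a correlation tracker ignores is to score several rotated copies of the template against the search patch and keep the best match. The sketch below shows that brute-force idea with plain normalized correlation; it is a generic illustration under assumed names and an assumed angle grid, not the cited tracker's actual algorithm.

```python
import numpy as np
from scipy.ndimage import rotate

def estimate_angle(template, patch, angles=range(-30, 31, 5)):
    """Brute-force angle estimation: rotate the template over a small grid
    of candidate angles and keep the one whose normalized correlation with
    the (same-sized) search patch is highest. Illustrative only."""
    best_angle, best_score = 0, -np.inf
    p = (patch - patch.mean()) / (patch.std() + 1e-8)
    for a in angles:
        rt = rotate(template, a, reshape=False)       # rotated template
        rt = (rt - rt.mean()) / (rt.std() + 1e-8)     # zero-mean, unit-std
        score = float((rt * p).sum())                 # correlation at center
        if score > best_score:
            best_angle, best_score = a, score
    return best_angle

# Toy check: recover a known 15-degree rotation of a random template
# (assumed data, for illustration only).
rng = np.random.default_rng(1)
tmpl = rng.random((32, 32))
print(estimate_angle(tmpl, rotate(tmpl, 15, reshape=False)))  # prints 15
```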
“…Generally speaking, a certain range of background information is beneficial for tracking, but context that contains distracting objects can affect the quality of response maps. Second, the CNN features [19, 20] are not invariant to large deformations, such as scale variations, rotation and occlusion. Therefore, Siamese-based trackers cannot handle such complex geometric transformations well.…”
Section: Introduction
confidence: 99%