Proceedings of the 24th ACM International Conference on Multimedia 2016
DOI: 10.1145/2964284.2964316

Transform-Invariant Convolutional Neural Networks for Image Classification and Search

Abstract: Convolutional neural networks (CNNs) have achieved state-of-the-art results on many visual recognition tasks. However, current CNN models still exhibit a poor ability to be invariant to spatial transformations of images. Intuitively, with sufficient layers and parameters, hierarchical combinations of convolution (matrix multiplication and nonlinear activation) and pooling operations should be able to learn a robust mapping from transformed input images to transform-invariant representations. In this paper, we p…
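The abstract's premise, that stacked convolution and pooling are only weakly invariant to spatial transformations in practice, is easy to probe empirically. Below is a minimal sketch (an illustration, not the paper's method; the two-layer network, the random input, and the shift size are all assumptions) that compares a small conv/pool stack's features for an image and a translated copy of it.

```python
import torch
import torch.nn as nn

# A small conv + pool stack, standing in for the hierarchical combinations
# of convolution and pooling discussed in the abstract. The architecture
# and the 4-pixel shift are illustrative assumptions.
net = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
)

x = torch.randn(1, 1, 32, 32)               # a random stand-in "image"
x_shift = torch.roll(x, shifts=4, dims=-1)  # translate 4 pixels to the right

with torch.no_grad():
    f, f_shift = net(x), net(x_shift)

# Cosine similarity of the flattened feature maps: 1.0 would mean perfect
# translation invariance; conv/pool stacks typically fall short of that.
sim = torch.nn.functional.cosine_similarity(
    f.flatten(), f_shift.flatten(), dim=0)
print(f"feature similarity under 4-px shift: {sim.item():.3f}")
```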

Cited by 30 publications (23 citation statements). References 32 publications.

Citation statements (ordered by relevance):
“…For example, using the invariance metric of Goodfellow et al (2009), Shang et al (2016, their Figure 4c) averaged over multiple types of invariance (e.g., translation, rotation) and over all units within a layer and found a weak, non-monotonic increase in invariance across layers in a CNN similar to AlexNet. Using the same metric but different stimuli, Shen et al (2016) found no increase and no systematic trend in invariance across layers of their implementation of AlexNet (their Figure 5). Although Güçlü and van Gerven (2015) plot an invariance metric against CNN layer, their metric is the half-width of a response profile, and thus it is unlike our TI selectivity metric.…”
Section: Discussion
confidence: 91%
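For readers unfamiliar with the metric these statements refer to, Goodfellow et al (2009) score a unit as invariant when stimuli that strongly activate it keep activating it after transformation. The sketch below captures that idea in simplified form; the threshold rule, the synthetic responses, and the function name are my assumptions, not taken from any of the cited papers.

```python
import numpy as np

def invariance_score(resp_orig, resp_trans, threshold):
    """Simplified invariance score in the spirit of Goodfellow et al (2009).

    resp_orig:  (n_stimuli,) unit responses to the base stimuli
    resp_trans: (n_stimuli, n_transforms) responses to transformed versions
    threshold:  firing threshold chosen so the unit is selective

    Local firing rate: fraction of transformed versions of the unit's
    top-activating stimuli that still exceed the threshold.
    Global firing rate: fraction of all transformed inputs exceeding it.
    The score is their ratio; higher means more invariant.
    """
    top = resp_orig >= threshold                    # stimuli the unit "prefers"
    local = (resp_trans[top] >= threshold).mean()   # robustness on preferred set
    global_ = (resp_trans >= threshold).mean()      # baseline firing rate
    return local / max(global_, 1e-12)

# Toy example with synthetic responses (assumed data, for illustration only).
rng = np.random.default_rng(0)
resp_orig = rng.random(1000)
resp_trans = resp_orig[:, None] * rng.uniform(0.5, 1.0, (1000, 8))
print(invariance_score(resp_orig, resp_trans, threshold=0.9))
```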
“…Although other studies have examined translation invariance and related properties (rotation and reflection invariance) in artificial networks (Ranzato et al, 2007; Goodfellow et al, 2009; Lenc and Vedaldi, 2014; Zeiler and Fergus, 2013; Fawzi and Frossard, 2015; Güçlü and van Gerven, 2015; Shang et al, 2016; Shen et al, 2016; Tsai and Cox, 2015), we are unaware of any study that has quantitatively documented a steady layer-to-layer increase of translation-invariant form selectivity, measured for single units, across layers throughout a network like AlexNet. For example, using the invariance metric of Goodfellow et al (2009), Shang et al (2016, their Figure 4c) averaged over multiple types of invariance (e.g., translation, rotation) and over all units within a layer and found a weak, non-monotonic increase in invariance across layers in a CNN similar to AlexNet.…”
Section: Discussion
confidence: 94%
“…As previous research [32,15,23] has pointed out, DCNN features are not invariant to large image transformations, such as scaling and rotation. While scaling has been handled in the original SiamFC tracker, the rotation of the target object is not considered.…”
Section: Angle Estimation
confidence: 85%
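As context for the angle-estimation discussion above, one common way to handle the rotation that a correlation tracker ignores is to score several rotated copies of the template against the search patch and keep the best match. The sketch below shows that brute-force idea with plain normalized correlation; it is a generic illustration under assumed names and an assumed angle grid, not the cited tracker's actual algorithm.

```python
import numpy as np
from scipy.ndimage import rotate

def estimate_angle(template, patch, angles=range(-30, 31, 5)):
    """Brute-force angle estimation: rotate the template over a small grid
    of candidate angles and keep the one whose normalized correlation with
    the (same-sized) search patch is highest. Illustrative only."""
    best_angle, best_score = 0, -np.inf
    p = (patch - patch.mean()) / (patch.std() + 1e-8)
    for a in angles:
        rt = rotate(template, a, reshape=False)       # rotated template
        rt = (rt - rt.mean()) / (rt.std() + 1e-8)     # zero-mean, unit-std
        score = float((rt * p).sum())                 # correlation at center
        if score > best_score:
            best_angle, best_score = a, score
    return best_angle

# Toy check: recover a known 15-degree rotation of a random template
# (assumed data, for illustration only).
rng = np.random.default_rng(1)
tmpl = rng.random((32, 32))
print(estimate_angle(tmpl, rotate(tmpl, 15, reshape=False)))  # prints 15
```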
“…Generally speaking, a certain range of background information is beneficial for tracking, but context that contains distracting objects can affect the quality of response maps. Second, the CNN features [19, 20] are not invariant to large deformations, such as scale variations, rotation and occlusion. Therefore, Siamese-based trackers cannot handle such complex geometric transformations well.…”
Section: Introduction
confidence: 99%