2017
DOI: 10.1109/tmm.2017.2713410

HNIP: Compact Deep Invariant Representations for Video Matching, Localization, and Retrieval

Cited by 40 publications (48 citation statements)
References 41 publications
“…It has been pointed out in [5], [7], [9] that the CNN model is sensitive to reversal. Meanwhile, Gong et al [7] have proved that large-amplitude rotation could also lead to significant performance degradation in CNN models.…”
Section: The Orientation-invariant Representation For CNN Models (mentioning)
confidence: 99%
“…Recently, with the availability of both large-scale publicly available image datasets and high-performance processors, deep feature models and in particular Convolutional Neural Network (CNN), are viewed as the state-of-the-art in numerous visual tasks [5], [6]. However, many papers in the literature reveal that both deep and handcrafted representation models are sensitive to orientation (reversal or rotation) deformations [5], [7]- [9]. This in turn leads to an overall limitation of their performance in visual processing tasks.…”
Section: Introduction (mentioning)
confidence: 99%
“…Video retrieval aims to recover spatial-temporal locations of topics of interest from a large video corpora. Unlike typical approaches [1]- [3] that require exemplar videos, we focus on retrieval of activity that matches a user's description, or analyst or user described semantic activity (ADSA) query, from surveillance videos. Surveillance videos pose two unique issues: (a) wide query diversity; (b) the presence of many unrelated, co-occurring activities that share common components.…”
Section: Introduction (mentioning)
confidence: 99%
“…However, these CNN descriptors with different poolings are less invariant to geometric transformations like rotation and scale changes [4]. To further improve the discriminability of deep descriptors, the authors in work [10] proposed a nested invariance pooling (NIP) method to derive compact deep global descriptors. The authors also showed that the combination of hybrid pooling operations via NIP (HNIP) that incorporated CNN and the global descriptors of CDVS can significantly boost the visual search performance.…”
Section: Introduction (mentioning)
confidence: 99%
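
The nested pooling idea mentioned in the statement above can be illustrated with a small sketch. This is not the implementation from the cited work: the choice of pooling operator at each level (root-mean-square over spatial positions, average over scales, max over rotations), the input layout, and the names `power_mean` and `nested_pool` are all assumptions made for illustration only.

```python
import math


def power_mean(values, p):
    """Generalized mean of non-negative magnitudes.

    p = 1 gives average pooling, p = 2 a root-mean-square pooling,
    and p = math.inf gives max pooling.
    """
    if math.isinf(p):
        return max(abs(v) for v in values)
    return (sum(abs(v) ** p for v in values) / len(values)) ** (1.0 / p)


def nested_pool(responses, p_translation=2, p_scale=1, p_rotation=math.inf):
    """Pool CNN activations level by level into one value per channel.

    Assumed (hypothetical) layout: responses[r][s][c] is a flat list of
    activations of channel c for the input at rotation r and scale s.
    """
    num_channels = len(responses[0][0])
    descriptor = []
    for c in range(num_channels):
        per_rotation = []
        for rotation in responses:
            # Innermost level: pool over spatial positions (translation).
            per_scale = [power_mean(scale[c], p_translation) for scale in rotation]
            # Middle level: pool over scales.
            per_rotation.append(power_mean(per_scale, p_scale))
        # Outermost level: pool over rotations.
        descriptor.append(power_mean(per_rotation, p_rotation))
    # L2-normalize so descriptors can be compared by dot product.
    norm = math.sqrt(sum(v * v for v in descriptor)) or 1.0
    return [v / norm for v in descriptor]


# Two rotations, one scale, two channels, two spatial positions each.
responses = [
    [[[3.0, 3.0], [0.0, 0.0]]],
    [[[0.0, 0.0], [4.0, 4.0]]],
]
descriptor = nested_pool(responses)
```

Because each level reduces over one set of transformed inputs, reordering the rotations (or scales, or spatial positions) in `responses` leaves `descriptor` unchanged, which is the kind of invariance the statement refers to.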