HNIP: Compact Deep Invariant Representations for Video Matching, Localization, and Retrieval

Lin, Jie; Duan, Ling-Yu; Wang, Shiqi; Bai, Yan; Lou, Yihang; Chandrasekhar, Vijay; Huang, Tiejun; Kot, Alex C.; Gao, Wen

doi:10.1109/tmm.2017.2713410

Cited by 40 publications

(48 citation statements)

References 41 publications

“…It has been pointed out in [5], [7], [9] that the CNN model is sensitive to reversal. Meanwhile, Gong et al. [7] have proved that large-amplitude rotation could also lead to significant performance degradation in CNN models.…”

Section: The Orientation-invariant Representation For CNN Models (mentioning)

confidence: 99%

“…Recently, with the availability of both large-scale publicly available image datasets and high-performance processors, deep feature models and in particular Convolutional Neural Network (CNN), are viewed as the state-of-the-art in numerous visual tasks [5], [6]. However, many papers in the literature reveal that both deep and handcrafted representation models are sensitive to orientation (reversal or rotation) deformations [5], [7]-[9]. This in turn leads to an overall limitation of their performance in visual processing tasks.…”

Section: Introduction (mentioning)

confidence: 99%

“…The reason is primarily that existing deep features together with the handcrafted descriptors are generally not orientation-invariant. As a result the descriptor structures are radically altered by reversal [1], [5], [7], [9]. Consequently, it is difficult to address tasks such as feature correspondence once the orientation (rotation/reversal) of objects is modified.…”

Section: Introduction (mentioning)

confidence: 99%

Polar Transformation on Image Features for Orientation-Invariant Representations

Chen, Luo, Huang, et al. 2019. IEEE Trans. Multimedia

The choice of image feature representation plays a crucial role in the analysis of visual information. Although vast numbers of alternative robust feature representation models have been proposed to improve the performance of different visual tasks, most existing feature representations (e.g. handcrafted features or Convolutional Neural Networks (CNN)) have a relatively limited capacity to capture the highly orientation-invariant (rotation/reversal) features. The net consequence is suboptimal visual performance. To address these problems, this study adopts a novel transformational approach, which investigates the potential of using polar feature representations. Our low level consists of a histogram of oriented gradient, which is then binned using annular spatial bin-type cells applied to the polar gradient. This gives gradient binning invariance for feature extraction. In this way, the descriptors have significantly enhanced orientation-invariant capabilities. The proposed feature representation, termed orientation-invariant histograms of oriented gradients (Oi-HOG), is capable of accurately processing visual tasks (e.g., facial expression recognition). In the context of the CNN architecture, we propose two polar convolution operations, referred to as Full Polar Convolution (FPolarConv) and Local Polar Convolution (LPolarConv), and use these to develop polar architectures for the CNN orientation-invariant representation. Experimental results show that the proposed orientation-invariant image representation, based on polar models for both handcrafted features and deep learning features, is both competitive with state-of-the-art methods and maintains a compact representation on a set of challenging benchmark image datasets. Index Terms: Rotation-invariant and reversal-invariant representation, HOG, CNN.

“…Video retrieval aims to recover spatial-temporal locations of topics of interest from a large video corpora. Unlike typical approaches [1]-[3] that require exemplar videos, we focus on retrieval of activity that matches a user's description, or analyst or user described semantic activity (ADSA) query, from surveillance videos. Surveillance videos pose two unique issues: (a) wide query diversity; (b) the presence of many unrelated, co-occurring activities that share common components.…”

Section: Introduction (mentioning)

confidence: 99%

Probabilistic Semantic Retrieval for Surveillance Videos With Activity Graphs

Chen, Wang, Bai, et al. 2019. IEEE Trans. Multimedia

We present a novel framework for finding complex activities matching user-described queries in cluttered surveillance videos. The wide diversity of queries coupled with unavailability of annotated activity data limits our ability to train activity models. To bridge the semantic gap we propose to let users describe an activity as a semantic graph with object attributes and inter-object relationships associated with nodes and edges, respectively. We learn node/edge-level visual predictors during training and, at test-time, propose to retrieve activity by identifying likely locations that match the semantic graph. We formulate a novel CRF based probabilistic activity localization objective that accounts for mis-detections, mis-classifications and track-losses, and outputs a likelihood score for a candidate grounded location of the query in the video. We seek groundings that maximize overall precision and recall. To handle the combinatorial search over all high-probability groundings, we propose a highest precision subgraph matching algorithm. Our method outperforms existing retrieval methods on benchmarked datasets.

“…However, these CNN descriptors with different poolings are less invariant to geometric transformations like rotation and scale changes [4]. To further improve the discriminability of deep descriptors, the authors in work [10] proposed a nested invariance pooling (NIP) method to derive compact deep global descriptors. The authors also showed that the combination of hybrid pooling operations via NIP (HNIP) that incorporated CNN and the global descriptors of CDVS can significantly boost the visual search performance.…”

Section: Introduction (mentioning)

confidence: 99%

Feature Fusion for Image Retrieval With Adaptive Bitrate Allocation and Hard Negative Mining

2019

By combining Convolutional Neural Network (CNN) descriptor and Compact Descriptors for Visual Search (CDVS), the visual search performance can be boosted. However, some redundancies still exist in the CDVS representation and the hard negative mining is not very accurate when training CNN embeddings. In this paper, we propose a high performance image retrieval scheme based on descriptor fusion. In detail, we first propose a more compact CDVS descriptor database building scheme through bitrate allocation, which can reduce information redundancy and boost image retrieval performance. We then propose a highly accurate CDVS-guided hard negative mining scheme when training CNN embeddings. In the hard negative selection, the CDVS descriptor and the CNN embedding are adaptively weighted together to achieve more precise decisions. Finally, the retrieval result is further refined through CDVS local descriptor matching by removing the irrelevant targets from the top positions. Extensive experimental results show that the proposed method outperforms the recent hybrid method and several other anchors remarkably, and produces better visual search performance. Codes and some models are available at https://github.com/WendyDong/ImageRetrieval_DF_CDVS. Index Terms: Image retrieval, CDVS, CNN, hard negative mining, deep learning.
