Fast Visual Retrieval Using Accelerated Sequence Matching

Abstract: The Spatial Pyramid Matching approach has become very popular for modelling images as sets of local bag-of-words. The image comparison is then done region by region with an intersection kernel. Despite its success, this model has some limitations: the grid partitioning is predefined and identical for all images, and the matching is sensitive to intra- and inter-class variations. In this paper, we propose a novel approach based on approximate string matching to overcome these limitations and improve the results. First, we introduce a new image representation as strings of ordered bag-of-words. Second, we present a new edit distance specifically adapted to strings of histograms in the context of image comparison. This distance identifies local alignments between subregions and allows sequences of similar subregions to be removed in order to better match two images. Experiments on 15 Scenes and Caltech 101 show that the proposed approach outperforms the classical spatial pyramid representation and most competing classification methods presented in recent years.
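As a rough illustration of the idea in this abstract, the sketch below computes a standard Levenshtein-style edit distance between two strings whose symbols are L1-normalised histograms, using one minus the histogram intersection as the substitution cost. This is not the authors' distance: the function names, the unit insertion/deletion cost, and the choice of substitution cost are all assumptions made for the example.

```python
from typing import List, Sequence

Histogram = Sequence[float]


def intersection(h1: Histogram, h2: Histogram) -> float:
    """Histogram intersection: sum of element-wise minima."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def substitution_cost(h1: Histogram, h2: Histogram) -> float:
    """Assumed cost: 1 - intersection, which lies in [0, 1] when both
    histograms are L1-normalised (each sums to 1)."""
    return 1.0 - intersection(h1, h2)


def edit_distance(s: List[Histogram], t: List[Histogram],
                  indel_cost: float = 1.0) -> float:
    """Classic dynamic-programming edit distance over histogram strings.

    indel_cost is a hypothetical constant cost for inserting or deleting
    one histogram; the paper's own distance is defined differently.
    """
    n, m = len(s), len(t)
    # d[i][j] = cost of aligning the first i symbols of s with the first j of t
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel_cost
    for j in range(1, m + 1):
        d[0][j] = j * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel_cost,                      # delete s[i-1]
                d[i][j - 1] + indel_cost,                      # insert t[j-1]
                d[i - 1][j - 1] + substitution_cost(s[i - 1], t[j - 1]),
            )
    return d[n][m]
```

With this cost, two identical strings are at distance zero, and dropping one region from a string costs exactly one insertion or deletion.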

“…The work of (Yeh and Cheng, 2011) is the most similar to our approach. However, their representation is questionable.…”

Section: Introduction (mentioning)

confidence: 87%

Approximate Image Matching using Strings of Bag-of-Visual Words Representation

Nguyen

Barat

Ducottet

2014

Proceedings of the 9th International Conference on Computer Vision Theory and Applications

“…In the image domain, several problems have successfully been modelled and solved using strings rather than local feature vectors, e.g. text recognition [9,10,11], shape matching [12,13], image classification [14,15,16,17] and video classification [18]. In this section, we focus on the related work that addresses the question of spatial information and topological relationships within SPR and CNN frameworks for image classification.…”

Section: Related Work (mentioning)

confidence: 99%

“…Some attempts have been made to introduce order and topological information into this type of representations using strings [16,17]. In [16], the authors use a 4 × 4 partitioning (SPR level 2). Then, an image is represented as a string of 16 local SIFT BoVW obtained following the raster-scan ordering.…”

Section: SPR Based Models (mentioning)

confidence: 99%

String representations and distances in deep Convolutional Neural Networks for image classification

Barat

Ducottet

2016

Pattern Recognition

Recent advances in image classification mostly rely on the use of powerful local features combined with an adapted image representation. Although Convolutional Neural Network (CNN) features learned from ImageNet were shown to be generic and very efficient, they still lack the flexibility to take into account variations in the spatial layout of visual elements. In this paper, we investigate the use of structural representations on top of pre-trained CNN features to improve image classification. Images are represented as strings of CNN features. Similarities between such representations are computed using two new edit distance variants adapted to the image classification domain. Our algorithms have been implemented and tested on several challenging datasets: 15 Scenes, Caltech 101, Pascal VOC 2007 and MIT Indoor. The results show that our idea of using structural string representations and distances clearly improves the classification performance over standard approaches based on CNN and SVM with linear kernel, as well as over other recognized methods in the literature.

“…While the feature representation stage can be further classified into two categories: global feature and local feature, each has different design of video content representations and similarity metrics between feature sequences. Yeh et al proposed a global frame-level descriptor [6], which is a compact 16-dimensional feature vector based on computing the spectral properties of a graph built from partitioned blocks of a frame, and a fast sequence matching scheme: dot plot [5]. Chiu et al [13] combines both global and local feature descriptors and integrates min-hashing and spatiotemporal matching to detect video copies.…”

Section: Introduction (mentioning)

confidence: 99%

A Computationally Efficient Algorithm for Large Scale Near-Duplicate Video Detection

Liu

2015

MultiMedia Modeling

Abstract. Large-scale near-duplicate video detection is very desirable for web video processing, and computational efficiency is essential for practical applications. In this paper, we present a computationally efficient algorithm based on multi-layer video content analysis. Local features are extracted from key frames of videos and indexed by a novel adaptive locality-sensitive hashing scheme. By learning several parameters, fast retrieval in the new hashing structure is performed without high-dimensional distance computations and achieves better real-time retrieval performance than other state-of-the-art approaches. Then a descriptor filtering method and a two-level matching scheme are applied to generate a relevance score for detection. Experiments on near-duplicate video detection tasks, including various transformed videos, demonstrate the efficiency gains of the proposed algorithm.

Cited by 15 publications

References 36 publications
