Spatial extensions to bag of visual words

Viitaniemi, Ville; Laaksonen, Jorma

doi:10.1145/1646396.1646441

Cited by 29 publications

(22 citation statements)

References 16 publications

“…For example, Viitaniemi et al observed that manually designed tilings achieve reasonable improvement over the SPM on the Pascal VOC dataset [24]. Similar observations have been confirmed on other datasets.…”

Section: Related Work (mentioning)

confidence: 57%

“…However, the problem is still tractable given the reasonable masks including the commonly used masks in the literature [12,24,30,21]. The numbers of all possible set partitions, tilings and equal tilings on different masks are listed in Table 1, where the Parameter column lists the parameters used in generating the masks, e.g.…”

Section: Tiling Function Domain (mentioning)

confidence: 99%

“…The SPM assumes that the spatial BoW representation is independent of data. However, evidence has shown that manually defined representations [24,21,7,30,13] considering salient spatial layouts outperform the predefined BoW representation on many problems.…”

Section: Introduction (mentioning)

confidence: 99%

Towards Efficient Learning of Optimal Spatial Bag-of-Words Representations

Jiang, Tong, Meng, et al. 2014

Proceedings of International Conference on Multimedia Retrieval

Spatial Pyramid Matching (SPM) assumes that the spatial Bag-of-Words (BoW) representation is independent of data. However, evidence has shown that the assumption usually leads to a suboptimal representation. In this paper, we propose a novel method called Jensen-Shannon (JS) Tiling to learn the BoW representation from data directly at the BoW level. The proposed JS Tiling is especially appropriate for large-scale datasets as it is orders of magnitude faster than existing methods, but with comparable or even better classification precision. Experimental results on four benchmarks including two TRECVID12 datasets validate that JS Tiling outperforms the SPM and the state-of-the-art methods. The runtime comparison demonstrates that selecting BoW representations by JS Tiling is more than 1,000 times faster than running classifiers. Besides, JS Tiling is an important component contributing to CMU Teams' final submission in TRECVID 2012 Multimedia Event Detection.

“…The final spatial pyramid kernel is implemented as concatenating weighted histograms of all features at all sub-regions. The traditional bag-of-visual words scheme discards any spatial information; hence many methods utilizing this concept also introduce different spatial extensions [7,24].…”

Section: Related Work (mentioning)

confidence: 99%
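
The concatenation of weighted sub-region histograms described in the statement above can be sketched roughly as follows. This is a minimal illustration, not the cited authors' implementation: the function name `spatial_pyramid_histogram`, the normalized keypoint coordinates, and the per-level weights (taken here from the common pyramid match weighting) are all assumptions.

```python
import numpy as np

def spatial_pyramid_histogram(xy, words, vocab_size, levels=2):
    """Concatenate weighted per-cell bag-of-words histograms over a grid pyramid.

    xy:     (n, 2) keypoint coordinates, assumed normalized to [0, 1].
    words:  (n,) visual-word index of each keypoint, in [0, vocab_size).
    levels: finest pyramid level L; level l uses a 2**l x 2**l grid.
    """
    xy = np.asarray(xy, dtype=float)
    words = np.asarray(words, dtype=int)
    parts = []
    for level in range(levels + 1):
        cells = 2 ** level
        # Assumed weighting: coarser levels count less than finer ones.
        if level == 0:
            weight = 1.0 / 2 ** levels
        else:
            weight = 1.0 / 2 ** (levels - level + 1)
        # Map each keypoint to its grid cell (clamped so 1.0 stays in range).
        col = np.minimum((xy[:, 0] * cells).astype(int), cells - 1)
        row = np.minimum((xy[:, 1] * cells).astype(int), cells - 1)
        cell_id = row * cells + col
        # One histogram per cell, stored back to back in a flat vector.
        flat = cell_id * vocab_size + words
        hist = np.bincount(flat, minlength=cells * cells * vocab_size)
        parts.append(weight * hist)
    return np.concatenate(parts)

# Two keypoints in opposite corners, a two-word vocabulary, levels 0 and 1.
h = spatial_pyramid_histogram([[0.1, 0.1], [0.9, 0.9]], [0, 1],
                              vocab_size=2, levels=1)
print(h.shape)
```

With `levels=0` the result reduces to a plain bag-of-visual-words histogram with no spatial information, which is the baseline the spatial extensions build on.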

Nearest-Neighbor based Metric Functions for indoor scene recognition

Çakir, Güdükbay, Ulusoy 2011

Computer Vision and Image Understanding

Abstract: Indoor scene recognition is a challenging problem in the classical scene recognition domain due to the severe intra-class variations and inter-class similarities of man-made indoor structures. State-of-the-art scene recognition techniques such as capturing holistic representations of an image demonstrate low performance on indoor scenes. Other methods that introduce intermediate steps such as identifying objects and associating them with scenes have the handicap of successfully localizing and recognizing the objects in a highly cluttered and sophisticated environment. We propose a classification method that can handle such difficulties of the problem domain by employing a metric function based on the Nearest-Neighbor classification procedure using the bag-of-visual words scheme, the so-called codebooks. Considering the codebook construction as a Voronoi tessellation of the feature space, we have observed that, given an image, a learned weighted distance of the extracted feature vectors to the center of the Voronoi cells gives a strong indication of the image's category. Our method outperforms state-of-the-art approaches on an indoor scene recognition benchmark and achieves competitive results on a general scene dataset, using a single type of descriptor.

“…In (Sharma and Jurie, 2011), Sharma et al propose to learn the best discriminative grid splitting optimizing a given classification task. In (Viitaniemi and Laaksonen, 2009), Viitaniemi et al compare techniques of soft tiling and hard tiling. Furthermore, some works propose to learn or adapt weights rather than using fixed ones, as in (Harada et al, 2011).…”

Section: Introduction (mentioning)

confidence: 99%

Approximate Image Matching using Strings of Bag-of-Visual Words Representation

Nguyen, Barat, Ducottet 2014

Proceedings of the 9th International Conference on Computer Vision Theory and Applications

Abstract: The Spatial Pyramid Matching approach has become very popular to model images as sets of local bag-of-words. The image comparison is then done region-by-region with an intersection kernel. Despite its success, this model presents some limitations: the grid partitioning is predefined and identical for all images and the matching is sensitive to intra- and inter-class variations. In this paper, we propose a novel approach based on approximate string matching to overcome these limitations and improve the results. First, we introduce a new image representation as strings of ordered bag-of-words. Second, we present a new edit distance specifically adapted to strings of histograms in the context of image comparison. This distance identifies local alignments between subregions and allows to remove sequences of similar subregions to better match two images. Experiments on 15 Scenes and Caltech 101 show that the proposed approach outperforms the classical spatial pyramid representation and most existing concurrent methods for classification presented in recent years.
