Modeling scenes with local descriptors and latent aspects

Quelhas, Pedro; Monay, Florent; Odobez, Jean-Marc; Gática-Pérez, Daniel; Tuytelaars, Tinne; Van Gool, Luc

doi:10.1109/iccv.2005.152

Cited by 301 publications

(242 citation statements)

References 17 publications

Supporting

Mentioning

234

Contrasting

Unclassified

“…For image representation, there is still no such approach that would be adequate for a wide variety of image processing problems. However, among the proposed representations, a consensus is emerging on using local descriptors for various tasks, for example (Lowe, 2004; Quelhas et al., 2005). This type of representation segments the image into regions of interest, and extracts visual features from each region.…”

Section: Image Representation (mentioning)

confidence: 99%

Large Scale Online Learning of Image Similarity through Ranking

Chechik, Sharma, Shalit, et al. 2009

Lecture Notes in Computer Science

418

624

Learning a measure of similarity between pairs of objects is an important generic problem in machine learning. It is particularly useful in large scale applications like searching for an image that is similar to a given image or finding videos that are relevant to a given video. In these tasks, users look for objects that are not only visually similar but also semantically related to a given object. Unfortunately, the approaches that exist today for learning such semantic similarity do not scale to large datasets. This is both because typically their CPU and storage requirements grow quadratically with the sample size, and because many methods impose complex positivity constraints on the space of learned similarity functions. The current paper presents OASIS, an Online Algorithm for Scalable Image Similarity learning that learns a bilinear similarity measure over sparse representations. OASIS is an online dual approach using the passive-aggressive family of learning algorithms with a large margin criterion and an efficient hinge loss cost. Our experiments show that OASIS is both fast and accurate at a wide range of scales: for a dataset with thousands of images, it achieves better results than existing state-of-the-art methods, while being an order of magnitude faster. For large, web scale, datasets, OASIS can be trained on more than two million images from 150K text queries within 3 days on a single CPU. On this large scale dataset, human evaluations showed that 35% of the ten nearest neighbors of a given test image, as found by OASIS, were semantically relevant to that image. This suggests that query independent similarity could be accurately learned even for large scale datasets that could not be handled before.
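
The abstract above names three ingredients: a bilinear similarity, a hinge loss with a margin, and a passive-aggressive update. The following is a minimal sketch of how such a step could look, not the authors' implementation. It assumes training triplets (a query, a more relevant image, a less relevant image), a margin of 1, and an aggressiveness cap `C`; the function names and the dense list-of-lists matrix are hypothetical choices made for illustration.

```python
def bilinear_similarity(W, p, q):
    """Bilinear score S_W(p, q) = p^T W q for a matrix W given as a list of rows."""
    return sum(p[i] * W[i][j] * q[j]
               for i in range(len(p)) for j in range(len(q)))


def triplet_hinge_loss(W, p, p_pos, p_neg):
    """Hinge loss with margin 1: zero once p_pos outscores p_neg by at least 1."""
    return max(0.0, 1.0 - bilinear_similarity(W, p, p_pos)
                        + bilinear_similarity(W, p, p_neg))


def passive_aggressive_step(W, p, p_pos, p_neg, C=0.1):
    """One passive-aggressive update on a triplet (assumed PA-I style step size).

    Passive: if the margin is already satisfied, W is returned unchanged.
    Aggressive: otherwise W moves along V = p (p_pos - p_neg)^T by
    tau = min(C, loss / ||V||^2), where ||V||^2 is the squared Frobenius norm.
    """
    loss = triplet_hinge_loss(W, p, p_pos, p_neg)
    if loss == 0.0:
        return W
    diff = [a - b for a, b in zip(p_pos, p_neg)]
    # For a rank-one matrix p diff^T, the squared Frobenius norm factorizes.
    norm_sq = sum(x * x for x in p) * sum(x * x for x in diff)
    if norm_sq == 0.0:
        return W
    tau = min(C, loss / norm_sq)
    return [[W[i][j] + tau * p[i] * diff[j] for j in range(len(diff))]
            for i in range(len(p))]


# Usage: start from the identity and apply one update on a violated triplet.
W0 = [[1.0, 0.0], [0.0, 1.0]]
query, relevant, irrelevant = [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]
W1 = passive_aggressive_step(W0, query, relevant, irrelevant)
```

Note that nothing in this step forces `W` to be symmetric or positive semidefinite, which matches the abstract's remark that positivity constraints are a scaling obstacle in other methods.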

“…The mapping of the descriptors to discrete indexes is performed according to a codebook C, which is typically learned from the local descriptors of the training images through k-means clustering (Duygulu et al., 2002; Jeon and Manmatha, 2004; Quelhas et al., 2005). The assignment of the weight p_i of visterm i in image p is as follows:…”

Section: Image Representation (mentioning)

confidence: 99%
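
The codebook step described in the statement above can be sketched as follows. This is a minimal illustration under stated assumptions, not the citing paper's code: descriptors are plain tuples of numbers, k-means is a basic Lloyd iteration with a fixed seed, and, because the weighting formula is elided in the quoted statement, the histogram simply uses raw visterm counts as a placeholder. All function names are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(centers, descriptor):
    """Index of the codebook entry (visterm) closest to the descriptor."""
    return min(range(len(centers)),
               key=lambda k: squared_distance(centers[k], descriptor))


def learn_codebook(descriptors, k, iterations=20, seed=0):
    """Basic Lloyd k-means over training descriptors; returns k cluster centers."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(descriptors, k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for d in descriptors:
            groups[nearest_center(centers, d)].append(d)
        for j, members in enumerate(groups):
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def visterm_histogram(centers, image_descriptors):
    """Map each descriptor to its visterm index and count occurrences.

    Raw counts are an assumption; the quoted statement elides its weighting.
    """
    counts = [0] * len(centers)
    for d in image_descriptors:
        counts[nearest_center(centers, d)] += 1
    return counts


# Usage: learn a 2-entry codebook from toy 2-D descriptors, then describe an image.
training = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
codebook = learn_codebook(training, k=2)
histogram = visterm_histogram(codebook, [(0.2, 0.1), (10.5, 10.2), (9.8, 10.1)])
```

In practice the descriptors would be high-dimensional local features and the codebook would have many more entries; the toy data here only keeps the example easy to follow.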

Large Scale Online Learning of Image Similarity through Ranking

Chechik, Sharma, Shalit, et al. 2009

Lecture Notes in Computer Science

“…Other similar part-based image representations proposed recently are visterms [15,23,24], SIFT-bags [39], blobs [7], and VLAD [14], a vector representation of an image that aggregates descriptors based on a locality criterion in the feature space. A different approach is the one proposed by Morand et al. [21].…”

Section: Analogy Between Information Retrieval and CBIR (mentioning)

confidence: 99%

Toward a higher-level visual representation for content-based image retrieval

Sayad, Martinet, Urruty, et al. 2010

Multimed Tools Appl

Having effective methods to access the desired images is essential nowadays with the availability of a huge amount of digital images. The proposed approach is based on an analogy between content-based image retrieval and text retrieval. The aim of the approach is to build a meaningful mid-level representation of images to be used later on for matching between a query image and other images in the desired database. The approach is based firstly on constructing different visual words using local patch extraction and fusion of descriptors. Secondly, we introduce a new method using multilayer pLSA to eliminate the noisiest words generated by the vocabulary building process. Thirdly, a new spatial weighting scheme is introduced that consists of weighting visual words according to the probability of each visual word to belong to each of the n Gaussians. Finally, we construct visual phrases from groups of visual words that are involved in strong association rules. Experimental results show that our approach outperforms the results of traditional image retrieval techniques.

“…While most segmentation approaches segment image pixels or blocks based on their luminance, color or texture, in this work we consider local image regions characterized by viewpoint invariant descriptors [10]. This region representation, robust with respect to partial occlusion, clutter, and changes in viewpoint and illumination, has shown its applicability in a number of vision tasks [2,16,8,20,3,14,15]. Although local invariant regions do not provide a full segmentation of an image, they often occupy a considerable part of the scene and thus can define a "sparse" segmentation (Fig.…”

Section: Introduction (mentioning)

confidence: 99%