2002
DOI: 10.1007/3-540-47979-1_7
Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary

Abstract: We describe a model of object recognition as machine translation. In this model, recognition is a process of annotating image regions with words. First, images are segmented into regions, which are classified into region types using a variety of features. A mapping between region types and keywords supplied with the images is then learned using a method based around EM. This process is analogous to learning a lexicon from an aligned bitext. For the implementation we describe, these words are nouns taken …
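The EM-based lexicon learning the abstract describes is analogous to IBM Model 1 word alignment: each training image pairs a set of region types ("blobs") with a set of keywords, and EM estimates p(word | blob). Below is a minimal sketch of that idea, assuming toy blob/keyword data; the names (`blob_sky`, etc.) are hypothetical and not from the paper's actual vocabulary.

```python
# Minimal EM sketch of lexicon learning between region types ("blobs") and
# keywords, in the style of IBM Model 1 alignment. Toy data; illustrative only.
from collections import defaultdict

def learn_lexicon(corpus, iterations=20):
    """corpus: list of (blobs, words) pairs, one per training image.
    Returns t[word][blob], an estimate of p(word | blob) learned by EM."""
    blobs = {b for bs, _ in corpus for b in bs}
    words = {w for _, ws in corpus for w in ws}
    # Uniform initialisation of translation probabilities.
    t = {w: {b: 1.0 / len(words) for b in blobs} for w in words}
    for _ in range(iterations):
        count = defaultdict(lambda: defaultdict(float))
        total = defaultdict(float)
        # E-step: softly align each keyword to the regions of its image.
        for bs, ws in corpus:
            for w in ws:
                z = sum(t[w][b] for b in bs)
                for b in bs:
                    c = t[w][b] / z
                    count[w][b] += c
                    total[b] += c
        # M-step: renormalise expected counts into probabilities.
        for w in count:
            for b in count[w]:
                t[w][b] = count[w][b] / total[b]
    return t

corpus = [
    (["blob_sky", "blob_grass"], ["sky", "grass"]),
    (["blob_sky", "blob_plane"], ["sky", "plane"]),
    (["blob_grass", "blob_tiger"], ["grass", "tiger"]),
]
t = learn_lexicon(corpus)
# Co-occurrence disambiguates: "plane" ends up most strongly tied to blob_plane.
best = max(t["plane"], key=t["plane"].get)
```

As in bitext lexicon learning, words that co-occur with many region types (like "sky") get their mass spread out, while words seen with a distinctive region concentrate on it.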

Cited by 1,158 publications (1,372 citation statements) · References 2 publications
“…The mapping of the descriptors to discrete indexes is performed according to a codebook C, which is typically learned from the local descriptors of the training images through k-means clustering (Duygulu et al., 2002; Jeon and Manmatha, 2004; Quelhas et al., 2005). The assignment of the weight p i of visterm i in image p is as follows:…”
Section: Image Representation
confidence: 99%
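The codebook step in the quoted passage can be sketched as follows: each local descriptor is quantised to its nearest codeword in a k-means-learned codebook C, and the image is represented by normalised visterm counts. This is a hedged sketch assuming the common term-frequency weighting p_i = n_i / Σ_j n_j; the 2-D descriptors and tiny codebook are toy values, not the cited systems' actual features.

```python
# Sketch of visterm assignment against a codebook C (assumed pre-learned,
# e.g. by k-means on training descriptors). Toy 2-D data for illustration.
import numpy as np

def assign_visterms(descriptors, codebook):
    """Map each local descriptor to the index of its nearest codeword."""
    d = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    return d.argmin(axis=1)

def bag_of_visterms(descriptors, codebook):
    """Normalised visterm histogram: weight p_i = n_i / sum_j n_j."""
    idx = assign_visterms(descriptors, codebook)
    counts = np.bincount(idx, minlength=len(codebook)).astype(float)
    return counts / counts.sum()

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])  # toy 3-word codebook
descs = np.array([[0.1, 0.0], [0.9, 1.1], [1.0, 0.9], [0.1, 1.0]])
p = bag_of_visterms(descs, codebook)
# p sums to 1; codeword 1 receives weight 0.5 (two of the four descriptors).
```

In practice the codebook has hundreds to thousands of entries and the descriptors are high-dimensional local features; the quantise-and-count structure is the same.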
“…These are Corel-5k [4], ESP Game [20] and IAPRTC-12 [7]. While Corel-5k has become the de facto dataset in this domain, the other two datasets are very challenging, with significant diversity among their samples.…”
Section: Datasets and Features
confidence: 99%
“…Figure 1: Example images from the Corel-5k dataset [4] and corresponding ground-truth labels ("cars, tracks, prototype"; "grass, flowers, petals"; "sky, grass, plane, lion"; "tree, grass, tiger, park"). The first image is an example of incomplete labeling (tagged with "car" but not with "vehicle"); the second is an example of label ambiguity (tagged with "flowers", though "blooms" would also have been equally correct); and the third and fourth are examples of structural overlap ("lion" and "tiger" are two different but structurally related labels).…”
Section: Introduction
confidence: 99%