A Hybrid Model for Automatic Image Annotation

Murthy, Venkatesh; Can, Ethem F.; Manmatha, R.

doi:10.1145/2578726.2578774

Cited by 40 publications

(17 citation statements)

References 20 publications

Supporting

Mentioning

Contrasting

Order By: Relevance

“…Most papers use one type of metric, but there is a small number of papers which use other metrics (For instance, In [8] the precision and recall are computed per word but they are computed only for non-zero recall words and their average over all non-zero recall words are reported). We use the standard (most widely reported type of evaluation where the recall and precision are computed per word and their average over all the words are reported [13,1,12,15,10,2]. We strictly adhere to computing the recall and precision per word (for all the words) and reporting their means over all the words, thus making it a fair comparison to the majority of the works in this area.…”

Section: Methodsmentioning

confidence: 99%

“…Request permissions from permissions@acm.org. visual appearance [10,2,4,13]. In these papers the goal is to predict a fixed number of tags for a given test image that accurately describe the visual content.…”

Section: Introductionmentioning

confidence: 99%

“…One variant, CCA-KNN significantly outperforms all the previously published results. We are able to achieve this without requiring any computationally expensive metric learning approaches as used by almost all successful models [13,1,12,15]. Some papers using CCA for combing image and text have been previously proposed [3,1], but the key differences are, we use CNN features as opposed to multiple handcrafted features (representing images) and we use word embedding vectors instead of binary vectors (representing tags).…”

Section: Introductionmentioning

confidence: 99%

See 2 more Smart Citations

Automatic Image Annotation using Deep Learning Representations

Murthy

Maji

Manmatha

2015

Proceedings of the 5th ACM on International Conference on Multimedia Retrieval

Self Cite

111

View full text Add to dashboard Cite

We propose simple and effective models for the image annotation that make use of Convolutional Neural Network (CNN) features extracted from an image and word embedding vectors to represent their associated tags. Our first set of models is based on the Canonical Correlation Analysis (CCA) framework that helps in modeling both views -visual features (CNN feature) and textual features (word embedding vectors) of the data. Results on all three variants of the CCA models, namely linear CCA, kernel CCA and CCA with k-nearest neighbor (CCA-KNN) clustering, are reported. The best results are obtained using CCA-KNN which outperforms previous results on the Corel-5k and the ESP-Game datasets and achieves comparable results on the IAPRTC-12 dataset. In our experiments we evaluate CNN features in the existing models which bring out the advantages of it over dozens of handcrafted features. We also demonstrate that word embedding vectors perform better than binary vectors as a representation of the tags associated with an image. In addition we compare the CCA model to a simple CNN based linear regression model, which allows the CNN layers to be trained using back-propagation.

show abstract

Section: Methodsmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

Section: Introductionmentioning

confidence: 99%

See 1 more Smart Citation

Automatic Image Annotation using Deep Learning Representations

Murthy

Maji

Manmatha

2015

Proceedings of the 5th ACM on International Conference on Multimedia Retrieval

Self Cite

111

View full text Add to dashboard Cite

show abstract

“…MCS is able to solve different classification tasks, as medical data classification [25], medical image retrieval [21], image annotation [19], and image retrieval [33]. In Multiple Queries for Image Retrieval (MQIR) systems [2] the end user would like to retrieve an interesting image by multiple queries (e.g., by more query images), which allows for a more expressive formulation of the query object, including different viewpoints and/or viewing conditions.…”

Section: Introductionmentioning

confidence: 99%

Content-Based Image Retrieval for Multiple Objects Search

Szücs

Papp

2017

Cybernetics and Information Technologies

View full text Add to dashboard Cite

“…In this paper, we address the issue of large scale image annotation, namely a large number of images with tags engaged. Many existing methods [2,19,14,10,3,15,1] for image annotation are established on small datasets, such as Corel5K [6], IAPRTC-12 [9] and ESP-game [18]. These datasets have only around 5000 to 20000 images, and these methods are difficult to be applied to large datasets.…”

Section: Introductionmentioning

confidence: 99%

Large Scale Image Annotation via Deep Representation Learning and Tag Embedding Learning

Wang

Kang

et al. 2015

Proceedings of the 5th ACM on International Conference on Multimedia Retrieval

View full text Add to dashboard Cite

In this paper, we focus on the issue of large scale image annotation, whereas most existing methods are devised for small datasets. A novel model based on deep representation learning and tag embedding learning is proposed. Specifically, the proposed model learns an unified latent space for image visual features and tag embeddings simultaneously. Furthermore, a metric matrix is introduced to estimate the relevance scores between images and tags. Finally, an objective function modeling triplet relationships (irrelevant tag, image, relevant tag) is proposed with maximum margin pursuit. The proposed model is easy to tackle new images and tags via online learning and has a relatively low test computation complexity. Experimental results on NUS-WIDE dataset demonstrate the effectiveness of the proposed model.

show abstract

A Hybrid Model for Automatic Image Annotation

Cited by 40 publications

References 20 publications

Automatic Image Annotation using Deep Learning Representations

Automatic Image Annotation using Deep Learning Representations

Content-Based Image Retrieval for Multiple Objects Search

Large Scale Image Annotation via Deep Representation Learning and Tag Embedding Learning

Contact Info

Product

Resources

About