2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2017.656

Diverse Image Annotation

Abstract: In this work, we study a new image annotation task called diverse image annotation (DIA).

Cited by 29 publications (18 citation statements) · References 30 publications
“…The reason is that D²IA-GAN may include more irrelevant tags, as the random noise combined with the image feature brings in not only diversity but also uncertainty. Note that, due to the randomness of sampling, the results of a single subset by DIA presented here are slightly different from those reported in [23].…”
Section: Quantitative Results (contrasting)
confidence: 86%
“…They are also different in the training process, which will be reviewed in Section 4. Besides, in DIA [23], 'diverse/diversity' refers to the semantic difference between tags in the same tag subset, for which we use the word 'distinct/distinctiveness' in this work. We use 'diverse/diversity' to indicate the semantic difference between multiple tag subsets for the same image.…”
Section: Related Work (mentioning)
confidence: 99%
“…Dense captioning [9] aims to identify all the salient regions in an image and describe each with a caption. Diverse image annotation [44] focuses on describing as much of the image as possible with a limited number of tags. Entity-aware captioning [25] employs hashtags as additional input.…”
Section: Descriptive Captioning (mentioning)
confidence: 99%
“…We evaluate accuracy at k = 1 and k = 10, which measure, respectively, how often the first-ranked hashtag is in the groundtruth and how often at least one of the 10 highest-ranked hashtags is in the groundtruth. A desirable feature of a tagging system is the ability to infer diverse and distinct tags [42,43]. To measure the variety of tags predicted by the models, we report the percentage of all test tags predicted at least once over the whole test set (%pred) and the percentage of all test tags correctly predicted at least once (%cpred), considering the top 10 tags predicted for each image.…”
Section: Image Tagging (mentioning)
confidence: 99%
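
For clarity, here is a minimal sketch of how the four metrics named in the statement above (accuracy@1, accuracy@10, %pred, %cpred) could be computed. The function names and the ranked-tag-list input format are illustrative assumptions, not taken from the cited paper.

```python
# Illustrative sketch (not the cited paper's code): accuracy@k and the
# %pred / %cpred tag-variety metrics described in the citation statement.

def accuracy_at_k(ranked_tags, groundtruth, k):
    """Fraction of images whose top-k ranked tags contain at least one groundtruth tag."""
    hits = sum(1 for preds, gt in zip(ranked_tags, groundtruth)
               if set(preds[:k]) & set(gt))
    return hits / len(ranked_tags)

def tag_variety(ranked_tags, groundtruth, vocab, k=10):
    """%pred: share of vocabulary tags predicted at least once in any image's top-k.
    %cpred: share of vocabulary tags predicted at least once *correctly*,
    i.e., appearing in the top-k of an image whose groundtruth contains them."""
    predicted, correct = set(), set()
    for preds, gt in zip(ranked_tags, groundtruth):
        top = set(preds[:k])
        predicted |= top
        correct |= top & set(gt)
    return len(predicted) / len(vocab), len(correct) / len(vocab)

# Toy usage with two images and a four-tag vocabulary (k=2 for brevity;
# the cited paper uses the top 10 tags per image):
ranked = [["beach", "sea", "dog", "sun"], ["dog", "sun", "sea", "beach"]]
gt = [["sea", "sun"], ["dog"]]
vocab = {"beach", "sea", "dog", "sun"}
print(accuracy_at_k(ranked, gt, 1))         # 0.5: only the second image's top-1 hits
print(accuracy_at_k(ranked, gt, 2))         # 1.0: both images hit within the top 2
print(tag_variety(ranked, gt, vocab, k=2))  # (1.0, 0.5) as (%pred, %cpred) fractions
```

Note that %pred and %cpred are corpus-level set measures: a tag counts once no matter how many images it appears in, which is what makes them sensitive to the variety, rather than the frequency, of predicted tags.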