Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2015
DOI: 10.3115/v1/n15-1053
Déjà Image-Captions: A Corpus of Expressive Descriptions in Repetition

Abstract: We present a new approach to harvesting a large-scale, high-quality image-caption corpus that makes better use of already existing web data with no additional human effort. The key idea is to focus on Déjà Image-Captions: naturally existing image descriptions that are repeated almost verbatim, by more than one individual for different images. The resulting corpus provides an association structure between 4 million images and 180K unique captions, capturing a rich spectrum of everyday narratives including figu…

Cited by 26 publications (26 citation statements)
References 27 publications
“…An extensive overview of the datasets available for image captioning is provided by [3]. The three biggest datasets are MS COCO [17], SBU1M Captions [20], and Déjà Image-Captions [4]. Work done by [14] and [29] has achieved state-of-the-art results in image captioning.…”
Section: Description of Images-in-Isolation (mentioning)
confidence: 99%
“…• Déjà Images Dataset (Chen et al., 2015) consists of 180K unique user-generated captions associated with 4M Flickr images, where one caption is aligned with multiple images. This dataset was collected by querying Flickr for 693 high-frequency nouns, then further filtered to captions that contain at least one verb and were judged as "good" captions by workers on Amazon's Mechanical Turk (Turkers).…”
Section: User-Generated Captions (mentioning)
confidence: 99%
“…Moreover, there are studies collecting paraphrases from captions to videos (Chen and Dolan, 2011) and images (Chen et al., 2015). One advantage of leveraging crowdsourcing is that annotation is done inexpensively, but it requires careful task design to gather valid data from non-expert annotators.…”
Section: Related Work (mentioning)
confidence: 99%