2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
DOI: 10.1109/cvprw.2016.61

Rich Image Captioning in the Wild

Abstract: We present an image caption system that addresses new challenges of automatically describing images in the wild. The challenges include high caption quality with respect to human judgments, out-of-domain data handling, and the low latency required in many applications. Built on top of a state-of-the-art framework, we developed a deep vision model that detects a broad range of visual concepts, an entity recognition model that identifies celebrities and landmarks, and a confidence model for the caption output…

Cited by 100 publications (64 citation statements)
References 26 publications
“…Recent progress in image captioning, the task of generating natural language descriptions of visual content [11,12,18,19,43,46], can be largely attributed to the publicly available large-scale datasets of image-caption pairs [6,16,50] as well as steady modeling improvements [4,26,37,48]. However, these models generalize poorly to images in the wild [39] despite impressive benchmark performance, because they are trained on datasets which cover a tiny fraction of the long-tailed distribution of visual concepts in the real world. For example, models trained on COCO Captions [6] can typically describe images containing dogs, people and umbrellas, but not accordions or dolphins.…”
Section: Introduction
confidence: 99%
“…also Section 2). Recently, the captionbot system [10] was proposed to generate captions for a given image. However, all these approaches focus on describing what is explicitly found, i.e., depictable, within pictures.…”
Section: Introduction
confidence: 99%
“…It takes the previously generated words and then finds the sentence/caption with the highest likelihood for the image, one that contains every word it has detected. Tran et al. [17] have produced a system that can richly caption images. Their research claims to be able to detect and classify a large range of visual concepts.…”
Section: Recent Research On Image Captioning
confidence: 99%
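The word-by-word generation paraphrased in the statement above is the standard maximum-likelihood decoding loop used by most captioning systems: condition on the image and the words generated so far, then pick the most likely next word. Below is a minimal, self-contained sketch of greedy decoding; the scoring function is a hard-coded stand-in for a neural model conditioned on image features, and every name here is an illustrative assumption, not the actual system of Tran et al. [17].

```python
# Minimal sketch of greedy autoregressive caption decoding.
# toy_next_word_scores is a hypothetical stand-in: a real captioner would
# run a neural language model conditioned on image features.

def toy_next_word_scores(image_feature, prefix):
    """Return a likelihood score per candidate next word (hard-coded toy)."""
    if prefix[-1] == "<start>":
        return {"a": 0.9, "dog": 0.05}
    if prefix[-1] == "a":
        return {"dog": 0.8, "grass": 0.1}
    if prefix[-1] == "dog":
        return {"on": 0.6, "<end>": 0.3}
    if prefix[-1] == "on":
        return {"grass": 0.9}
    return {"<end>": 1.0}

def greedy_decode(image_feature, max_len=10):
    """Append the highest-scoring next word until <end> or max_len."""
    caption = ["<start>"]
    for _ in range(max_len):
        scores = toy_next_word_scores(image_feature, caption)
        word = max(scores, key=scores.get)
        if word == "<end>":
            break
        caption.append(word)
    return " ".join(caption[1:])

print(greedy_decode(image_feature=None))  # -> "a dog on grass"
```

Beam search generalizes this loop by keeping the k highest-scoring partial captions at each step instead of a single one, trading latency for caption quality.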
“…Tran et al. [17]: Rich description, adding specifics to the image, such as person and location.
• Presenting a caption model for open-domain images, which utilizes a composite approach.
• Enriching existing frameworks with visual concepts such as landmark and celebrity identification.…”
Section: Related Work
confidence: 99%
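The composite approach summarized in the statement above pairs a generic caption model with separate recognizers for entities such as celebrities and landmarks. The sketch below shows one plausible way the pieces could fit together; the function names, the toy outputs, and the simple string substitution are hypothetical illustrations, not the method actually used in [17].

```python
# Rough sketch of caption enrichment: a generic caption is refined by
# swapping in entities returned by a separate recognition model.
# Both models below are hypothetical stand-ins with canned outputs.

def recognize_entities(image):
    """Hypothetical entity recognizer: detected names by category."""
    return {"person": "Albert Einstein", "landmark": None}

def base_caption(image):
    """Hypothetical base captioner output for the same image."""
    return "a man standing in front of a blackboard"

def enrich_caption(image):
    caption = base_caption(image)
    entities = recognize_entities(image)
    # Replace the generic noun phrase with the recognized entity when available.
    if entities["person"]:
        caption = caption.replace("a man", entities["person"], 1)
    if entities["landmark"]:
        caption = caption.replace("a building", entities["landmark"], 1)
    return caption

print(enrich_caption(image=None))
# -> "Albert Einstein standing in front of a blackboard"
```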