CVPR 2011
DOI: 10.1109/cvpr.2011.5995466

Baby talk: Understanding and generating simple image descriptions

Abstract: We posit that visually descriptive language offers computer vision researchers both information about the world, and information about how people describe the world. The potential benefit from this source is made more significant due to the enormous amount of language data easily available today. We present a system to automatically generate natural language descriptions from images that exploits both statistics gleaned from parsing large quantities of text data and recognition algorithms from computer vision.…
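The abstract outlines a two-stage design: visual recognition proposes image content (objects, attributes, spatial relations), and language statistics plus templates turn that content into a sentence. The sketch below is a minimal, hypothetical illustration of such a detect-then-template ("content selection and surface realisation") captioner; the function, labels, scores, and sentence template are assumptions for illustration, not the authors' actual system.

```python
# Illustrative sketch only: a template-based captioner in the spirit of the
# abstract. Content selection picks confident detections; surface realisation
# fills a fixed sentence template. All names and values are hypothetical.

def describe(detections, attributes, relations, max_objects=2):
    """
    detections: list of (object_label, score) from an object detector
    attributes: dict object_label -> attribute word (e.g. "brown")
    relations:  dict (label_a, label_b) -> spatial preposition (e.g. "near")
    """
    # Content selection: keep the highest-scoring objects.
    chosen = [lbl for lbl, _ in sorted(detections, key=lambda d: -d[1])[:max_objects]]

    # Noun phrases of the form "the <attribute> <object>".
    nps = [f"the {attributes.get(lbl, '')} {lbl}".replace("  ", " ") for lbl in chosen]

    if len(chosen) == 1:
        return f"There is {nps[0]}."

    # Surface realisation: join the two phrases with a spatial preposition,
    # falling back to plain conjunction when no relation was predicted.
    prep = relations.get((chosen[0], chosen[1]), "and")
    return f"There is {nps[0]} {prep} {nps[1]}."


if __name__ == "__main__":
    dets = [("dog", 0.92), ("sofa", 0.81), ("lamp", 0.40)]
    attrs = {"dog": "brown", "sofa": "red"}
    rels = {("dog", "sofa"): "on"}
    print(describe(dets, attrs, rels))  # There is the brown dog on the red sofa.
```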

Cited by 491 publications (445 citation statements); references 25 publications. Selected citation statements, ordered by relevance:
“…A good caption for such an image is often only loosely related to the content of the image. The setting of this work is therefore different from that in [5][6][7][8][9][10][11][12], where the objective is to generate a caption that describes what is depicted in the image.…”
Section: Related Work (mentioning)
Confidence: 99%
“…However, even with 1 million images, it is unrealistic to expect that every possible query image with various objects and actions can be represented and found in such dataset. In contrast to this caption transfer approach, the work in [6][7][8][9][10][11][12] adopts the conventional content selection and surface realisation approach. Starting from the output of visual processing engines e.g.…”
Section: Related Work (mentioning)
Confidence: 99%
“…These can again be used both for image retrieval and generating short descriptive sentences given an image. Kulkarni et al [6] push this work a step further by generating more complex, natural language descriptions which are able to describe multiple objects, their attributes, and their spatial relations. Our work can be considered as a reverse process of this line of work.…”
Section: Introduction (mentioning)
Confidence: 99%