2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr.2019.01280
Engaging Image Captioning via Personality

Abstract: Standard image captioning tasks such as COCO and Flickr30k are factual, neutral in tone, and (to a human) state the obvious (e.g., "a man playing a guitar"). While such tasks are useful to verify that a machine understands the content of an image, they are not engaging to humans as captions. With this in mind, we define a new task, PERSONALITY-CAPTIONS, where the goal is to be as engaging to humans as possible by incorporating controllable style and personality traits. We collect and release a large dataset of 2…

Cited by 131 publications (110 citation statements) | References 49 publications
“…Again, the results reveal that the ranking of the three systems is identical across all evaluation scores, even though BISON measures different aspects of the system than the captioning scores. In line with prior work [50], we find that the UpDown captioning system outperforms its competitors in terms of all evaluation measures, including BISON.…”
Section: Results (supporting)
confidence: 89%
“…As a result, the evaluations may be sensitive to changes in the reference caption set and incorrectly assess the semantics of the generated caption. We perform an analysis designed to study these effects on the COCO captions validation set by asking human annotators to assess image captions generated by the state-of-the-art UpDown [4,50] captioning system 1 . Specifically, we followed the COCO guidelines for human evaluation [1] and asked annotators to evaluate the "correctness" of image-caption pairs on a Likert scale from 1 (low) to 5 (high).…”
Section: Figure (mentioning)
confidence: 99%
“…Thus, methods developed on such datasets might not be easily adopted in the wild. Nevertheless, great efforts have been made to extend captioning to out-of-domain data [3,9,69] or different styles beyond mere factual descriptions [22,55]. In this work we explore unsupervised captioning, where image and language sources are independent.…”
Section: Language Domain (mentioning)
confidence: 99%
“…jealous girlfriend) versus high-level personality models such as the Big Five. We believe that TV Tropes is better for our purpose of fictional character modeling than data sources used in works such as Shuster et al. (2019) because TV Tropes' content providers are rewarded for correctly providing content through community acknowledgement.…”
Section: Human-Level Attributes (HLA) (mentioning)
confidence: 99%