2020
DOI: 10.1007/s10994-020-05919-y
Boost image captioning with knowledge reasoning

Abstract: Automatically generating a human-like description for a given image is a promising research direction in artificial intelligence, which has attracted a great deal of attention recently. Most existing attention methods explore the mapping relationships between words in the sentence and regions in the image; such an unpredictable matching manner sometimes causes inharmonious alignments that may reduce the quality of generated captions. In this paper, we make our efforts to reason about more accurate and meaningful captions. We fir…
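The word-region alignment the abstract refers to is usually realized as soft attention: each word embedding is scored against detected region features, and a softmax over those scores decides which regions ground the word. A minimal sketch of that mechanism, with all names and dimensions illustrative rather than taken from the paper:

```python
import numpy as np

def region_attention(word_vec, region_feats):
    """Soft attention of one word over image regions (illustrative sketch).

    word_vec:     (d,) embedding of the word being generated
    region_feats: (R, d) features of R detected image regions
    Returns the attended context vector and the alignment weights.
    """
    scores = region_feats @ word_vec                  # (R,) similarity per region
    scores = scores - scores.max()                    # numerical stability for softmax
    weights = np.exp(scores) / np.exp(scores).sum()   # softmax over regions
    context = weights @ region_feats                  # convex combination of regions
    return context, weights

rng = np.random.default_rng(0)
word = rng.normal(size=4)            # hypothetical word embedding
regions = rng.normal(size=(5, 4))    # five hypothetical region features
context, alpha = region_attention(word, regions)
```

When the highest-weighted region does not actually depict the word, the alignment is "inharmonious" in the abstract's sense; the paper's contribution is to constrain such matches with knowledge reasoning.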

Cited by 26 publications (7 citation statements)
References 39 publications
“…The caption will read as though she might be looking for a roadside direction sign board to take down an address or waiting for a bus with her bags by combining external knowledge (Knowledge Graph). 193 To get the caption of an image by Knowledge Graph, we must go with the following steps:…”
Section: Knowledge Graph-based Methods for Image Captioning (mentioning, confidence: 99%)
“…Importantly, the anchor is separate from the image itself and can be non-visual. In previous research, the connection to external knowledge was often established through object detection or image classification (Mogadala et al, 2018;Zhou et al, 2019;Huang et al, 2020;Bai et al, 2021), leaving unexplored the potential benefits of utilizing the associated nonvisual data. For example, certain elements of image metadata, such as the coordinates of its location or the date and time of its creation, can be used as an anchor, since they provide information about the circumstances in which the image originated and thus can help identify relevant entities and events.…”
Section: Identification of Relevant Knowledge (mentioning, confidence: 99%)
“…Integrating external encyclopedic data into image captioning has not been the focus of much prior research, although the few existing works (Mogadala et al, 2018;Zhou et al, 2019;Huang et al, 2020;Bai et al, 2021) show its potential for improving informativeness and overall quality of the generated captions.…”
Section: Enhancing Caption Generation with Encyclopedic Data (mentioning, confidence: 99%)