2020
DOI: 10.1109/tmm.2019.2931815

Recall What You See Continually Using GridLSTM in Image Captioning

Cited by 35 publications (6 citation statements)
References 37 publications
“…Human beings have the potential to extract visual information from images [1][2][3]. The main objective is to utilize this human ability to generate meaningful textual information from digital images in order to design automatic medical image captioning [4,5].…”
Section: Introduction (mentioning)
confidence: 99%
“…In the past few years, fully-supervised image captioning has been studied extensively [6]. The majority of the proposed models adopt the encoder-decoder paradigm, in which a Convolutional Neural Network (CNN) first encodes an input image and a Recurrent Neural Network (RNN) then generates a description of the image [1], [31], [33]. These models are trained to maximize the probability of generating the ground-truth captions, relying on enormous numbers of image-caption pairs.…”
Section: A. Image Captioning (mentioning)
confidence: 99%
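The CNN-encoder/RNN-decoder recipe described in this citation statement can be made concrete with a short sketch. The PyTorch code below is a minimal, illustrative assumption of such a pipeline (pooled CNN features fed to an LSTM decoder trained with cross-entropy on ground-truth captions); it is not the GridLSTM model of the cited paper, and the class name `CaptionModel`, layer sizes, and padding convention are assumptions made only for this example.

```python
import torch
import torch.nn as nn

class CaptionModel(nn.Module):
    """Generic CNN-feature encoder + LSTM decoder captioner (illustrative only)."""
    def __init__(self, vocab_size, feat_dim=2048, embed_dim=512, hidden_dim=512):
        super().__init__()
        self.img_proj = nn.Linear(feat_dim, embed_dim)   # project pooled CNN features
        self.embed = nn.Embedding(vocab_size, embed_dim) # word embeddings
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)     # per-step vocabulary logits

    def forward(self, img_feats, captions):
        # img_feats: (B, feat_dim) features from any CNN backbone
        # captions:  (B, T) ground-truth word indices (teacher forcing)
        img_emb = self.img_proj(img_feats).unsqueeze(1)  # image fed as the first "token"
        word_emb = self.embed(captions[:, :-1])          # shift right by one step
        states, _ = self.lstm(torch.cat([img_emb, word_emb], dim=1))
        return self.out(states)                          # (B, T, vocab_size)

# Training maximizes the likelihood of the ground-truth captions via cross-entropy.
model = CaptionModel(vocab_size=10000)
criterion = nn.CrossEntropyLoss(ignore_index=0)          # index 0 assumed to be <pad>
img_feats = torch.randn(4, 2048)                         # stand-in for CNN features
captions = torch.randint(1, 10000, (4, 20))              # stand-in ground-truth captions
logits = model(img_feats, captions)
loss = criterion(logits.reshape(-1, logits.size(-1)), captions.reshape(-1))
loss.backward()
```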
“…Similar to standard image captioning models, the UIC model adopts the encoder-decoder framework [31] to encode an image into features and decode these features into captions. Any CNN backbone network can be used for the feature extraction.…”
Section: E. Unpaired Image Captioning (mentioning)
confidence: 99%
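As a follow-up to the point that any CNN backbone can supply the image features, here is a minimal sketch assuming a torchvision ResNet-50 with its classifier head replaced by an identity layer; the backbone choice, input size, and variable names are illustrative assumptions, not the configuration of the cited UIC model.

```python
import torch
import torchvision.models as models

# Any CNN backbone can serve as the feature extractor; ResNet-50 is just one choice here.
backbone = models.resnet50()             # pretrained weights could be loaded instead
backbone.fc = torch.nn.Identity()        # drop the classifier head, keep pooled features
backbone.eval()

with torch.no_grad():
    images = torch.randn(4, 3, 224, 224) # a batch of preprocessed images
    feats = backbone(images)             # pooled features for the caption decoder
print(feats.shape)                       # torch.Size([4, 2048])
```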
“…Image Captioning. Classical image captioning implements the encoder-decoder architecture, which first encodes images into features and later decodes these image features into sentences [46,51,53]. The goal of these models is to maximize the probability of generating the correct captions, relying on tremendous numbers of image-caption pairs [24].…”
Section: Related Work (mentioning)
confidence: 99%