2021
DOI: 10.48550/arxiv.2107.13114
Preprint

A Thorough Review on Recent Deep Learning Methodologies for Image Captioning

Ahmed Elhagry, Karima Kadaoui

Abstract: Image Captioning is a task that combines computer vision and natural language processing, where it aims to generate descriptive legends for images. It is a two-fold process relying on accurate image understanding and correct language understanding both syntactically and semantically. It is becoming increasingly difficult to keep up with the latest research and findings in the field of image captioning due to the growing amount of knowledge available on the topic. There is not, however, enough coverage of those…

Citations: cited by 5 publications (7 citation statements)
References: 22 publications

“…In particular, Hossain et al [4] demonstrate how well the model can suddenly adjust its focus to the essential item when creating the corresponding words. They develop two mechanisms-a "hard" deterministic attention mechanism and a "soft" deterministic attention mechanism-and train them using conventional back-propagation techniques while maximizing an approximation of the variational lower bound or something analogous [5]. This approach also has the benefit of roughly depicting what it "sees" to glean insights.…”
Section: Methods (mentioning)
confidence: 99%
“…This, however, is insufficient since the brain quickly converts a vast amount of visual data into descriptive language. A model is proposed concerning a propose a predictive model utilizing a deep reoccurring architecture after being impressed by the most recent advancement in translation software that recursive human brains (RNN) can complete the translation that typically requires a series of subtasks, and even in a more accurate and much simpler way [5]. Instead of using the decoder RNN, which is initially learned for a classification job, the deep neural network (CNN) is employed.…”
Section: Methods (mentioning)
confidence: 99%
“…With the enormous surge of social media usage and the continued rapid increase in the visual data generated by users from all around the world, the task of image caption generation is gaining more and more attention and attracting a considerable amount of research efforts from Natural Language Processing (NLP) community and computer vision community [1]. Image captioning; known to be the generation of meaningful captions for images, is a challenging task for machine learning-based models, as they are required to combine both visual and linguistic understanding that includes steps of reasoning to generate high quality captions given an image [2]. However, despite the challenging nature of the image captioning task, recent research endeavors were capable of achieving considerable advances and achievements on this task.…”
Section: Introduction (mentioning)
confidence: 99%
“…It is noticed that they prefer to focus on specific aspects of this emerging vision to language tasks, such as the technical framework, evaluation indicators, training strategies, or publicly available datasets. However, the existing studies on the review of image captioning have been considered slightly out of vogue or fail to provide a comprehensive overview of the current research, including technologies, benchmark datasets, and evaluation metrics [3,4,[120][121][122]. There is still a lack of literature that comprehensively reviews the research status, innovative technologies, and development prospects.…”
Section: Introduction (mentioning)
confidence: 99%