2023
DOI: 10.3390/e25040553

Supervised Deep Learning Techniques for Image Description: A Systematic Review

Abstract: Automatic image description, also known as image captioning, aims to describe the elements included in an image and their relationships. This task involves two research fields: computer vision and natural language processing; thus, it has received much attention in computer science. In this review paper, we follow the Kitchenham review methodology to present the most relevant approaches to image description methodologies based on deep learning. We focused on works using convolutional neural networks (CNN) to e…

Cited by 6 publications (4 citation statements)
References 78 publications
“…Nevertheless, it should be acknowledged that other supervised machine learning methods not only exist but may also surpass decision tree-based approaches in certain situations. As mentioned earlier, probably a prime example would be the versatile family of artificial neural networks encompassing multilayer perceptrons and convolutional neural networks, which are especially adept in computer vision applications [36][37][38]. The second limitation of our study concerns the relatively limited size of the sample used for training and testing the random forest and gradient boosting models.…”
Section: Discussion (mentioning)
confidence: 99%
“…Inspired by the U-Net [64], AutoST-Net combines encoder-decoder [65] architecture with an attention mechanism. The detailed architecture is shown in Figure 3.…”
Section: AutoST-Net (mentioning)
confidence: 99%
“…Image captioning [12] is a computer-vision task that generates natural language descriptions for images. Deep-learning techniques, including encoder-decoder architectures and attention mechanisms, have been employed for this purpose.…”
Section: Overview of Computer Vision and Image-Processing Techniques ... (mentioning)
confidence: 99%