2022
DOI: 10.1155/2022/9638438
Medical Image Captioning Using Optimized Deep Learning Model

Abstract: Medical image captioning provides the visual information of medical images in the form of natural language. It requires an efficient approach to understand and evaluate the similarity between visual and textual elements and to generate a sequence of output words. A novel show, attend, and tell model (ATM) is implemented, which considers a visual attention approach using an encoder-decoder model. But the show, attend, and tell model is sensitive to its initial parameters. Therefore, a Strength Pareto Evolutionary Algorithm II (SPEA-II) is used to tune the initial parameters of the model. …
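The abstract states that the model's initial parameters are tuned with SPEA-II, a multi-objective evolutionary algorithm. As a rough illustrative sketch (not the authors' implementation), SPEA-II's fitness assignment first computes each candidate's strength — how many candidates it Pareto-dominates — and then a raw fitness equal to the summed strengths of its dominators; the objective vectors here are hypothetical placeholders (e.g., caption loss and model size, both minimized).

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization):
    a is no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def spea2_raw_fitness(population):
    """SPEA-II fitness assignment.
    strength S(i) = number of solutions that i dominates;
    raw fitness R(i) = sum of strengths of all solutions dominating i.
    Lower R is better; Pareto-optimal candidates get R = 0."""
    strength = [sum(dominates(p, q) for q in population) for p in population]
    raw = [
        sum(strength[j] for j, q in enumerate(population) if dominates(q, p))
        for p in population
    ]
    return raw
```

For example, with objective vectors `[(1, 2), (2, 1), (3, 3)]`, the first two candidates are mutually non-dominated (raw fitness 0), while `(3, 3)` is dominated by both and receives their summed strengths.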

Cited by 15 publications (5 citation statements) · References 49 publications (47 reference statements)
“…A novel show, attend, and tell model (ATM) was designed and implemented by Singh et al. (2022), using a visual attention mechanism built on an encoder-decoder structure.…”
Section: Related Work
confidence: 99%
“…In this paper [8], the authors introduce a novel show, attend, and tell model based on a visual attention mechanism and an encoder-decoder structure. SPEA-II is used to tune the initial attributes of the model.…”
Section: Literature Survey
confidence: 99%
“…Fig. 1 demonstrates a typical deep-learning-based approach for image-to-text description [5]. Concerning the encoder-decoder image captioning approaches, Convolutional Neural Networks (CNNs) have been exploited as encoders for visual feature extraction from the images, and Recurrent Neural Networks (RNNs), "especially LSTM (Long Short-Term Memory) networks", have been exploited as decoders for transforming the obtained features into natural language [6,7]. However, encoder-decoder-based approaches cannot analyze the images over time or consider the spatial aspects of images that are pertinent to the image description (i.e., creating descriptions for the entire scene).…”
Section: Introduction
confidence: 99%
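The passage above describes the standard CNN-encoder / LSTM-decoder captioning pipeline with visual attention. A minimal numpy sketch of the attention step (illustrative only; the function name, projection matrices, and dimensions are assumptions, not the paper's code): at each decoding step the decoder's hidden state scores every CNN feature location, a softmax turns the scores into weights, and the context vector fed to the decoder is the weighted sum of region features.

```python
import numpy as np

def soft_attention(features, hidden, W_f, W_h, v):
    """One step of additive (Bahdanau-style) soft attention.
    features: (L, D) CNN feature vectors for L image regions
    hidden:   (H,)  decoder LSTM hidden state
    W_f: (D, A), W_h: (H, A), v: (A,) learned projections
    Returns (context, weights): the attended context vector (D,) and
    the softmax attention weights over the L regions (nonnegative, sum to 1)."""
    scores = np.tanh(features @ W_f + hidden @ W_h) @ v  # (L,) alignment scores
    weights = np.exp(scores - scores.max())              # numerically stable softmax
    weights /= weights.sum()
    context = weights @ features                         # weighted sum of regions
    return context, weights
```

In a full model the context vector and the previous word embedding would be concatenated as the LSTM input for the next decoding step; here the projections are left as plain arrays rather than trained parameters.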