2023
DOI: 10.3390/jimaging9070140

Conv-ViT: A Convolution and Vision Transformer-Based Hybrid Feature Extraction Method for Retinal Disease Detection

Abstract: The current advancement towards retinal disease detection has mainly focused on distinct feature extraction using either a convolutional neural network (CNN) or a transformer-based end-to-end deep learning (DL) model. The individual end-to-end DL models are capable of processing only texture- or shape-based information for performing detection tasks. However, extraction of only texture- or shape-based features does not provide the model robustness needed to classify different types of retinal diseases. Therefore, c…
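The abstract's core idea (texture-oriented CNN features combined with shape-oriented transformer features ahead of a shared classifier) can be sketched as below. This is only a hedged illustration of the general hybrid pattern, not the paper's exact Conv-ViT architecture: the ViT-B/16 branch, the 299×299 and 224×224 input sizes, and the 4-class head are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class HybridFeatureExtractor(nn.Module):
    """Illustrative hybrid: concatenate CNN (texture-oriented) and ViT
    (shape-oriented) feature vectors before one classifier. A sketch of the
    general idea only, not the paper's exact Conv-ViT design."""

    def __init__(self, num_classes: int = 4):  # 4 retinal classes is an assumption
        super().__init__()
        # CNN branch; InceptionV3 matches the backbone discussed further down this page
        self.cnn = models.inception_v3(
            weights=models.Inception_V3_Weights.DEFAULT, aux_logits=False
        )
        self.cnn.fc = nn.Identity()        # expose the 2048-d pooled CNN features
        # Transformer branch; ViT-B/16 is an assumed stand-in for the ViT part
        self.vit = models.vit_b_16(weights=models.ViT_B_16_Weights.DEFAULT)
        self.vit.heads = nn.Identity()     # expose the 768-d class-token features
        self.classifier = nn.Linear(2048 + 768, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, 3, H, W) retinal images; each branch is fed its expected input size
        f_cnn = self.cnn(F.interpolate(x, size=(299, 299), mode="bilinear", align_corners=False))
        f_vit = self.vit(F.interpolate(x, size=(224, 224), mode="bilinear", align_corners=False))
        return self.classifier(torch.cat([f_cnn, f_vit], dim=1))
```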

Cited by 14 publications (8 citation statements)
References 32 publications (53 reference statements)
“…However, it leads to an increase in the dimensionality of the feature vectors. Furthermore, it does not address the issue of high data requirements, a characteristic commonly associated with purely attention-based models such as ViTs [ 65 ]. Table 2 briefly reviews the studies that utilized hybrid and ensemble methods in 2023.…”
Section: Related Work
confidence: 99%
“…In numerous studies in the realm of medical image classification, different CNNs have served as classifiers or feature extractors because of their advantage in automatically extracting generalizable features from OCT images, surpassing traditional image-processing-based feature extractors. Some of the reviewed studies have incorporated attention blocks into their proposed hybrid architectures as global feature selectors to improve the performance of traditional CNNs [56, 65]. Global attention blocks [56] generate attention maps using multiple low-cost max- and average-pooling layers, and do not include the standard, higher-performance self-attention.…”
Section: Related Work
confidence: 99%
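A minimal sketch of the kind of pooling-based attention block described in the statement above, in the spirit of CBAM-style channel attention: attention weights are derived from global max- and average-pooling descriptors passed through a small shared MLP, with no self-attention involved. The reduction ratio and layer sizes are assumptions, not taken from the cited work.

```python
import torch
import torch.nn as nn

class PoolingChannelAttention(nn.Module):
    """Pooling-based channel attention (CBAM-style sketch); layer sizes and
    the reduction ratio are illustrative assumptions."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(            # small MLP shared by both pooled descriptors
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W) CNN feature map
        avg = x.mean(dim=(2, 3))             # global average pooling -> (N, C)
        mx = x.amax(dim=(2, 3))              # global max pooling     -> (N, C)
        weights = torch.sigmoid(self.mlp(avg) + self.mlp(mx))
        return x * weights[:, :, None, None] # channel reweighting, no self-attention
```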
“…Primarily, the InceptionV3 model is used for the generation of feature vectors. InceptionV3 is selected as the initial feature extraction model due to its capability to extract higher-level features with different variations of the 277 filters [23]. Furthermore, without compromising model efficacy, the InceptionV3 architecture reduces dimensionality by applying two (3×3) convolutional layers rather than a single (5×5) convolutional layer: with the same number of filters, a (5×5) convolution is 25/9 ≈ 2.78 times as computationally expensive as a (3×3) convolution, so the stacked pair of (3×3) layers covers the same receptive field at roughly 28% lower cost.…”
Section: A. Feature Extraction: InceptionV3 Model
confidence: 99%
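The cost argument quoted above can be checked directly. The sketch below compares the parameter count of one 5×5 convolution with that of two stacked 3×3 convolutions of the same width; the 64-channel width is an arbitrary assumption, and both arrangements cover a 5×5 receptive field.

```python
import torch.nn as nn

c = 64  # assumed channel width for illustration
conv5x5 = nn.Conv2d(c, c, kernel_size=5, padding=2, bias=False)
two_3x3 = nn.Sequential(
    nn.Conv2d(c, c, kernel_size=3, padding=1, bias=False),
    nn.Conv2d(c, c, kernel_size=3, padding=1, bias=False),
)

def n_params(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())

# 25*c*c = 102400 vs 2*9*c*c = 73728: the 5x5 layer costs 25/9 ≈ 2.78x a single
# 3x3 layer and 25/18 ≈ 1.39x the two-layer stack, i.e. the stack saves ~28%.
print(n_params(conv5x5), n_params(two_3x3))
```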
“…Artificial neural networks (ANNs), which were initially developed in the 1950s, have had a checkered history, at times appreciated for their unique computational capabilities and at other times disparaged for being no better than statistical methods. Opinions shifted about a decade ago with deep neural networks, whose performance swiftly overshadowed that of other learners across various scientific (e.g., [1,2]), medical (e.g., [3,4]), and engineering domains (e.g., [5,6]). The prowess of deep learners is especially exemplified by the remarkable achievements of Convolutional Neural Networks (CNNs), one of the most renowned and robust deep-learning architectures.…”
Section: Introduction
confidence: 99%