2023
DOI: 10.3390/jimaging9070140

Conv-ViT: A Convolution and Vision Transformer-Based Hybrid Feature Extraction Method for Retinal Disease Detection

Abstract: The current advancement towards retinal disease detection has mainly focused on distinct feature extraction using either a convolutional neural network (CNN) or a transformer-based end-to-end deep learning (DL) model. The individual end-to-end DL models are capable of processing only texture- or shape-based information for performing detection tasks. However, extraction of only texture- or shape-based features does not provide the model robustness needed to classify different types of retinal diseases. Therefore, c…
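The abstract's core idea (texture-oriented CNN features combined with shape-oriented transformer features ahead of a shared classifier) can be sketched as below. This is only a hedged illustration of the general hybrid pattern, not the paper's exact Conv-ViT architecture: the ViT-B/16 branch, the 299×299 and 224×224 input sizes, and the 4-class head are assumptions made for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

class HybridFeatureExtractor(nn.Module):
    """Illustrative hybrid: concatenate CNN (texture-oriented) and ViT
    (shape-oriented) feature vectors before one classifier. A sketch of the
    general idea only, not the paper's exact Conv-ViT design."""

    def __init__(self, num_classes: int = 4):  # 4 retinal classes is an assumption
        super().__init__()
        # CNN branch; InceptionV3 matches the backbone discussed further down this page
        self.cnn = models.inception_v3(
            weights=models.Inception_V3_Weights.DEFAULT, aux_logits=False
        )
        self.cnn.fc = nn.Identity()        # expose the 2048-d pooled CNN features
        # Transformer branch; ViT-B/16 is an assumed stand-in for the ViT part
        self.vit = models.vit_b_16(weights=models.ViT_B_16_Weights.DEFAULT)
        self.vit.heads = nn.Identity()     # expose the 768-d class-token features
        self.classifier = nn.Linear(2048 + 768, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, 3, H, W) retinal images; each branch is fed its expected input size
        f_cnn = self.cnn(F.interpolate(x, size=(299, 299), mode="bilinear", align_corners=False))
        f_vit = self.vit(F.interpolate(x, size=(224, 224), mode="bilinear", align_corners=False))
        return self.classifier(torch.cat([f_cnn, f_vit], dim=1))
```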

Cited by 14 publications (8 citation statements)
References 32 publications (53 reference statements)
“…However, it leads to an increase in the dimensionality of the feature vectors. Furthermore, it does not address the issue of high data requirements, a characteristic commonly associated with purely attention-based models such as ViTs [ 65 ]. Table 2 briefly reviews the studies that utilized hybrid and ensemble methods in 2023.…”
Section: Related Work
confidence: 99%
“…In numerous studies in the realm of medical image classification, different CNNs have served as classifiers or feature extractors because of their advantage in automatically extracting generalizable features from OCT images, surpassing traditional image-processing-based feature extractors. Some of the reviewed studies have incorporated attention blocks into their proposed hybrid architectures as global feature selectors to improve the performance of traditional CNNs [56, 65]. Global attention blocks [56] generate attention maps using multiple low-cost max- and average-pooling layers, and do not include the standard, higher-performance self-attention.…”
Section: Related Work
confidence: 99%
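A minimal sketch of the kind of pooling-based attention block described in the statement above, in the spirit of CBAM-style channel attention: attention weights are derived from global max- and average-pooling descriptors passed through a small shared MLP, with no self-attention involved. The reduction ratio and layer sizes are assumptions, not taken from the cited work.

```python
import torch
import torch.nn as nn

class PoolingChannelAttention(nn.Module):
    """Pooling-based channel attention (CBAM-style sketch); layer sizes and
    the reduction ratio are illustrative assumptions."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(            # small MLP shared by both pooled descriptors
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W) CNN feature map
        avg = x.mean(dim=(2, 3))             # global average pooling -> (N, C)
        mx = x.amax(dim=(2, 3))              # global max pooling     -> (N, C)
        weights = torch.sigmoid(self.mlp(avg) + self.mlp(mx))
        return x * weights[:, :, None, None] # channel reweighting, no self-attention
```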
“…Primarily, the InceptionV3 model is used for the generation of feature vectors. InceptionV3 is selected as the initial feature extraction model due to its capability to extract higher-level features with different variations of the 277 filters [23]. Furthermore, without compromising model efficacy, the InceptionV3 architecture reduces dimensionality by applying two (3×3) convolutional layers rather than a single (5×5) convolutional layer: with the same number of filters, a (5×5) convolution is 25/9 ≈ 2.78 times as computationally expensive as a (3×3) convolution, so the stacked pair of (3×3) layers covers the same receptive field at roughly 28% lower cost.…”
Section: A. Feature Extraction: InceptionV3 Model
confidence: 99%
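The cost argument quoted above can be checked directly. The sketch below compares the parameter count of one 5×5 convolution with that of two stacked 3×3 convolutions of the same width; the 64-channel width is an arbitrary assumption, and both arrangements cover a 5×5 receptive field.

```python
import torch.nn as nn

c = 64  # assumed channel width for illustration
conv5x5 = nn.Conv2d(c, c, kernel_size=5, padding=2, bias=False)
two_3x3 = nn.Sequential(
    nn.Conv2d(c, c, kernel_size=3, padding=1, bias=False),
    nn.Conv2d(c, c, kernel_size=3, padding=1, bias=False),
)

def n_params(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())

# 25*c*c = 102400 vs 2*9*c*c = 73728: the 5x5 layer costs 25/9 ≈ 2.78x a single
# 3x3 layer and 25/18 ≈ 1.39x the two-layer stack, i.e. the stack saves ~28%.
print(n_params(conv5x5), n_params(two_3x3))
```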
“…Artificial neural networks (ANNs), which were initially developed in the 1950s, have had a checkered history, at times appreciated for their unique computational capabilities and at other times disparaged for being no better than statistical methods. Opinions shifted about a decade ago with deep neural networks, whose performance swiftly overshadowed that of other learners across various scientific (e.g., [1,2]), medical (e.g., [3,4]), and engineering domains (e.g., [5,6]). The prowess of deep learners is especially exemplified by the remarkable achievements of Convolutional Neural Networks (CNNs), one of the most renowned and robust deep-learning architectures.…”
Section: Introduction
confidence: 99%