2021
DOI: 10.1109/tim.2021.3122121
ViT-P: Classification of Genitourinary Syndrome of Menopause From OCT Images Based on Vision Transformer Models

Cited by 22 publications (14 citation statements)
References 29 publications
“…Without compromising accuracy, half of the layers of the model are pruned to reduce parameters and complexity. Moreover, Wang et al [32] proposed the vision transformer-plus (ViT-P) architecture, which addresses category imbalance by applying a deep convolutional generative adversarial network (DCGAN). Channel attention then correlates the different channels and extracts the important features of each channel for the classification task.…”
Section: Related Work (mentioning)
confidence: 99%
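To make the "channel attention" step concrete, here is a minimal squeeze-and-excitation-style channel attention block in PyTorch. The module name, reduction ratio, and layer sizes are illustrative assumptions, not the exact ViT-P design.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel attention (illustrative sketch,
    not the exact ViT-P block). Each channel is summarized by global average
    pooling, a small MLP scores the channels, and the input feature maps are
    reweighted by those per-channel scores."""

    def __init__(self, channels: int, reduction: int = 16):  # reduction ratio is an assumption
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),  # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = x.mean(dim=(2, 3))           # squeeze: (B, C) channel descriptors
        w = self.fc(w).view(b, c, 1, 1)  # excite: per-channel importance
        return x * w                     # reweight feature maps channel-wise

if __name__ == "__main__":
    feats = torch.randn(2, 64, 32, 32)   # e.g. intermediate OCT feature maps
    attn = ChannelAttention(channels=64)
    print(attn(feats).shape)             # torch.Size([2, 64, 32, 32])
```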
“…Channel attention then correlates the different channels and extracts the important features of each channel for the classification task. The performance of the architectures used in [31] and [32] is limited by the two core limitations of the ViT model. In summary, existing transformer-based classification models suffer from self-attention whose computational cost is quadratic in the number of pixels, and from the requirement of an enormous dataset to achieve superior classification results.…”
Section: Related Work (mentioning)
confidence: 99%
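The quadratic cost noted above comes from the N×N attention-score matrix over N tokens. A minimal scaled dot-product attention in PyTorch makes this explicit; the tensor shapes and token counts below are generic assumptions for illustration:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    """Vanilla attention over N tokens; the (N, N) score matrix is the
    source of the quadratic time and memory cost discussed above."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # (B, N, N)
    return torch.softmax(scores, dim=-1) @ v                  # (B, N, D)

# Doubling the token count N quadruples the number of attention entries:
for n in (196, 392):  # e.g. 14x14 vs. 14x28 patch grids (illustrative)
    q = k = v = torch.randn(1, n, 64)
    out = scaled_dot_product_attention(q, k, v)
    print(n, "tokens ->", n * n, "attention entries")
```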
“…The ViT model has been demonstrated to achieve comparable or better image classification results than traditional CNNs [23][24][25]. Specifically, ViT leverages embeddings from the transformer encoder for image classification.…”
Section: Vision Transformer (mentioning)
confidence: 99%
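As a concrete illustration of "leveraging embeddings from the transformer encoder," a common ViT pattern prepends a learned class token, runs the encoder, and classifies from that token's output embedding. This sketch uses torch.nn.TransformerEncoder with assumed dimensions; it is a generic pattern, not the specific model from the cited work.

```python
import torch
import torch.nn as nn

class TinyViTClassifier(nn.Module):
    """Minimal ViT-style classifier: prepend a [CLS] token, run the
    transformer encoder, classify from the [CLS] embedding.
    All sizes here are illustrative assumptions."""

    def __init__(self, dim=64, depth=2, heads=4, num_classes=3, num_patches=196):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, patch_tokens):                # (B, num_patches, dim)
        b = patch_tokens.size(0)
        cls = self.cls_token.expand(b, -1, -1)      # one [CLS] token per image
        x = torch.cat([cls, patch_tokens], dim=1) + self.pos_embed
        x = self.encoder(x)
        return self.head(x[:, 0])                   # classify from the [CLS] embedding

tokens = torch.randn(2, 196, 64)            # e.g. 14x14 grid of patch embeddings
print(TinyViTClassifier()(tokens).shape)    # torch.Size([2, 3])
```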
“…OCT. Wang et al [95] developed an architecture named ViT-P to classify OCT images using the GSM dataset and the UCSD dataset [40]. The method is composed of a proposed slim model and several transformer encoders.…”
Section: Classification (mentioning)
confidence: 99%
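The citation gives only this high-level composition, so the following is a hedged sketch of the pattern it describes, assuming a small convolutional "slim" front-end that produces patch tokens for a stack of transformer encoder layers. The stem, layer counts, and sizes are invented for illustration; the real ViT-P front-end may differ substantially.

```python
import torch
import torch.nn as nn

class SlimFrontEnd(nn.Module):
    """Hypothetical lightweight conv stem standing in for the paper's
    'slim model'; turns an image into a sequence of tokens."""

    def __init__(self, dim=64):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(1, dim, kernel_size=7, stride=4, padding=3),  # OCT is grayscale
            nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, kernel_size=3, stride=2, padding=1),
        )

    def forward(self, x):                    # (B, 1, H, W)
        f = self.stem(x)                     # (B, dim, H/8, W/8)
        return f.flatten(2).transpose(1, 2)  # (B, N, dim) tokens

class SlimPlusTransformer(nn.Module):
    """Conv stem followed by several transformer encoders and a linear head."""

    def __init__(self, dim=64, depth=4, num_classes=3):
        super().__init__()
        self.front = SlimFrontEnd(dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        tokens = self.encoder(self.front(x))
        return self.head(tokens.mean(dim=1))  # mean-pool tokens, then classify

print(SlimPlusTransformer()(torch.randn(2, 1, 224, 224)).shape)  # torch.Size([2, 3])
```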