2021
DOI: 10.1109/tim.2021.3122121
ViT-P: Classification of Genitourinary Syndrome of Menopause From OCT Images Based on Vision Transformer Models

Cited by 22 publications (14 citation statements)
References 29 publications
“…Without compromising accuracy, half of the layers of the model are pruned to reduce parameters and complexity. Moreover, Wang et al [32] proposed the vision transformer-plus (ViT-P) architecture, which addresses category imbalance by applying a deep convolutional generative adversarial network (DCGAN). Channel attention then correlates the different channels and extracts the important features of each channel for the classification task.…”
Section: Related Work (mentioning)
confidence: 99%
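To make the "channel attention" step concrete, here is a minimal squeeze-and-excitation-style channel attention block in PyTorch. The module name, reduction ratio, and layer sizes are illustrative assumptions, not the exact ViT-P design.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel attention (illustrative sketch,
    not the exact ViT-P block). Each channel is summarized by global average
    pooling, a small MLP scores the channels, and the input feature maps are
    reweighted by those per-channel scores."""

    def __init__(self, channels: int, reduction: int = 16):  # reduction ratio is an assumption
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),  # per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = x.mean(dim=(2, 3))           # squeeze: (B, C) channel descriptors
        w = self.fc(w).view(b, c, 1, 1)  # excite: per-channel importance
        return x * w                     # reweight feature maps channel-wise

if __name__ == "__main__":
    feats = torch.randn(2, 64, 32, 32)   # e.g. intermediate OCT feature maps
    attn = ChannelAttention(channels=64)
    print(attn(feats).shape)             # torch.Size([2, 64, 32, 32])
```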
“…Channel attention then correlates the different channels and extracts the important features of each channel for the classification task. The performance of the architectures used in [31] and [32] is limited by the two core limitations of the ViT model. In summary, existing transformer-based classification models suffer from self-attention whose computational cost is quadratic in the number of pixels, and from the requirement of an enormous dataset to achieve superior classification results.…”
Section: Related Work (mentioning)
confidence: 99%
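The quadratic cost noted above comes from the N×N attention-score matrix over N tokens. A minimal scaled dot-product attention in PyTorch makes this explicit; the tensor shapes and token counts below are generic assumptions for illustration:

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    """Vanilla attention over N tokens; the (N, N) score matrix is the
    source of the quadratic time and memory cost discussed above."""
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # (B, N, N)
    return torch.softmax(scores, dim=-1) @ v                  # (B, N, D)

# Doubling the token count N quadruples the number of attention entries:
for n in (196, 392):  # e.g. 14x14 vs. 14x28 patch grids (illustrative)
    q = k = v = torch.randn(1, n, 64)
    out = scaled_dot_product_attention(q, k, v)
    print(n, "tokens ->", n * n, "attention entries")
```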
“…The ViT model has been demonstrated to achieve comparable or better image classification results than traditional CNNs [23][24][25]. Specifically, ViT leverages embeddings from the transformer encoder for image classification.…”
Section: Vision Transformer (mentioning)
confidence: 99%
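As a concrete illustration of "leveraging embeddings from the transformer encoder," a common ViT pattern prepends a learned class token, runs the encoder, and classifies from that token's output embedding. This sketch uses torch.nn.TransformerEncoder with assumed dimensions; it is a generic pattern, not the specific model from the cited work.

```python
import torch
import torch.nn as nn

class TinyViTClassifier(nn.Module):
    """Minimal ViT-style classifier: prepend a [CLS] token, run the
    transformer encoder, classify from the [CLS] embedding.
    All sizes here are illustrative assumptions."""

    def __init__(self, dim=64, depth=2, heads=4, num_classes=3, num_patches=196):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, patch_tokens):                # (B, num_patches, dim)
        b = patch_tokens.size(0)
        cls = self.cls_token.expand(b, -1, -1)      # one [CLS] token per image
        x = torch.cat([cls, patch_tokens], dim=1) + self.pos_embed
        x = self.encoder(x)
        return self.head(x[:, 0])                   # classify from the [CLS] embedding

tokens = torch.randn(2, 196, 64)            # e.g. 14x14 grid of patch embeddings
print(TinyViTClassifier()(tokens).shape)    # torch.Size([2, 3])
```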
“…OCT. Wang et al [95] developed an architecture named ViT-P to classify OCT images using the GSM dataset and the UCSD dataset [40]. The method is composed of a proposed slim model and several transformer encoders.…”
Section: Classification (mentioning)
confidence: 99%
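The citation gives only this high-level composition, so the following is a hedged sketch of the pattern it describes, assuming a small convolutional "slim" front-end that produces patch tokens for a stack of transformer encoder layers. The stem, layer counts, and sizes are invented for illustration; the real ViT-P front-end may differ substantially.

```python
import torch
import torch.nn as nn

class SlimFrontEnd(nn.Module):
    """Hypothetical lightweight conv stem standing in for the paper's
    'slim model'; turns an image into a sequence of tokens."""

    def __init__(self, dim=64):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(1, dim, kernel_size=7, stride=4, padding=3),  # OCT is grayscale
            nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, kernel_size=3, stride=2, padding=1),
        )

    def forward(self, x):                    # (B, 1, H, W)
        f = self.stem(x)                     # (B, dim, H/8, W/8)
        return f.flatten(2).transpose(1, 2)  # (B, N, dim) tokens

class SlimPlusTransformer(nn.Module):
    """Conv stem followed by several transformer encoders and a linear head."""

    def __init__(self, dim=64, depth=4, num_classes=3):
        super().__init__()
        self.front = SlimFrontEnd(dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        tokens = self.encoder(self.front(x))
        return self.head(tokens.mean(dim=1))  # mean-pool tokens, then classify

print(SlimPlusTransformer()(torch.randn(2, 1, 224, 224)).shape)  # torch.Size([2, 3])
```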