2021
DOI: 10.48550/arxiv.2111.10017
Preprint

Rethinking Query, Key, and Value Embedding in Vision Transformer under Tiny Model Constraints

Abstract: A vision transformer (ViT) is the dominant model in the computer vision field. Despite numerous studies that mainly focus on dealing with inductive bias and complexity, there remains the problem of finding better transformer networks. For example, conventional transformer-based models usually use a projection layer for each query (Q), key (K), and value (V) embedding before multi-head self-attention. Insufficient consideration of semantic Q, K, and V embedding may lead to a performance drop. In this paper, we …
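For context, the projection scheme the abstract refers to is the standard one in ViT-style attention: each token embedding is passed through a separate linear layer to produce Q, K, and V before multi-head self-attention. Below is a minimal PyTorch sketch of that conventional baseline (not the paper's proposed embedding; the class and parameter names are illustrative):

```python
import torch
import torch.nn as nn

class MultiHeadSelfAttention(nn.Module):
    """Conventional ViT-style multi-head self-attention with a separate
    linear projection for each of Q, K, and V (the design the paper revisits)."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        # One projection layer per role: query, key, value.
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape  # (batch, tokens, embedding dim)
        # Embed tokens into Q, K, V and split across heads.
        q = self.q_proj(x).view(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(B, N, self.num_heads, self.head_dim).transpose(1, 2)
        # Scaled dot-product attention over the token axis.
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.out_proj(out)
```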

Cited by 2 publications (2 citation statements). References 27 publications.
“…This technique is very effective in terms of computational complexity and latency in the real world [21]. Figure 2 illustrates the architecture in its smallest configuration.…”
Section: Architecture (mentioning)
confidence: 99%
“…This dataset was collected using the same methods as CIFAR-10. CIFAR-100 classes are mutually exclusive of CIFAR-10 classes; CIFAR-10 and CIFAR-100 are subsets of the 80 million annotated tiny images dataset [21]. The CIFAR-10 and CIFAR-100 datasets consist of 50,000 training and 10,000 test images of 32×32 resolution, with 10 and 100 classes in total, respectively [22], [21].…”
Section: Dataset Application of Swin Transformer (mentioning)
confidence: 99%
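As a quick sanity check on the dataset statistics quoted above, here is a small torchvision sketch; it assumes torchvision is installed, and the "data" root path is illustrative:

```python
from torchvision import datasets

# Download CIFAR-10 and CIFAR-100 and confirm the quoted split sizes.
for cls, name in [(datasets.CIFAR10, "CIFAR-10"), (datasets.CIFAR100, "CIFAR-100")]:
    train = cls(root="data", train=True, download=True)
    test = cls(root="data", train=False, download=True)
    # Each sample is a 32x32 RGB PIL image paired with an integer class label.
    print(f"{name}: {len(train)} train / {len(test)} test, "
          f"{len(train.classes)} classes, image size {train[0][0].size}")
# Expected output: 50000 train / 10000 test; 10 and 100 classes; size (32, 32)
```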