2023
DOI: 10.3390/electronics12112442
Shifted Window Vision Transformer for Blood Cell Classification

Abstract: Blood cells play an important role in the metabolism of the human body, and their status, such as the ratio of different blood cell types, can be used for clinical diagnoses. Blood cell classification is therefore a primary task, but manual analysis requires much time. Recent advances in computer vision can help free doctors from such tedious work. In this paper, a novel automated blood cell classification model based on the shifted window vision transformer (SW-ViT) is proposed. The …

Cited by 3 publications (2 citation statements)
References 28 publications
“…However, when classifying and recognizing images, the initial parameters of a CNN can have a great impact on network training: a poor choice can prevent the network from working at all, or cause it to fall into local minima, underfit, or overfit. In 2019, researchers began applying transformers to the computer vision domain, and by 2021 it had been shown that transformers scale better than CNNs, can handle sequential inputs, and significantly outperform CNNs when larger models are trained on larger datasets [21]. Dosovitskiy et al. proposed the vision transformer (ViT) model by applying the transformer architecture directly to image classification, representing the input image as a sequence of feature vectors usable for subsequent tasks; ViT significantly improves performance on traditional image classification tasks [22].…”
Section: Introduction (mentioning)
confidence: 99%
“…Transformers have become the foundation for many advanced language models, such as BERT, ChatGPT [23], and T5, and have significantly advanced the capabilities of language understanding and generation systems. Vision transformers (ViTs) [24] adapt the classical transformer architecture by applying self-attention mechanisms to image data [25], making them a powerful model family for computer vision tasks and showcasing the effectiveness of transformers beyond NLP. Figure 1 shows the relationship between AI, ML, DL, and transformers.…”
mentioning
confidence: 99%
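The statements above describe the ViT front end: the image is cut into fixed-size patches, each patch is flattened and linearly projected into a token embedding (the "feature vectors"), and self-attention then mixes the resulting sequence. A minimal numpy sketch of that pipeline follows; all shapes, dimensions, and weight initializations here are illustrative assumptions, not values from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(0)

img = rng.standard_normal((64, 64, 3))   # toy image, H x W x C (stand-in for a blood cell image)
patch = 16                               # patch size -> (64/16)^2 = 16 patches
d_model = 32                             # illustrative embedding dimension

# 1. Split into non-overlapping 16x16 patches and flatten each one.
patches = img.reshape(64 // patch, patch, 64 // patch, patch, 3)
patches = patches.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * 3)  # (16, 768)

# 2. Linear projection to token embeddings (the per-patch feature vectors).
W_embed = rng.standard_normal((patch * patch * 3, d_model)) * 0.02
tokens = patches @ W_embed               # (16, 32): one embedding per patch

# 3. Single-head self-attention over the patch sequence.
W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(3))
q, k, v = tokens @ W_q, tokens @ W_k, tokens @ W_v
scores = q @ k.T / np.sqrt(d_model)      # scaled dot-product attention logits
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True) # softmax over the key axis
out = attn @ v                           # (16, 32): attended patch features

print(tokens.shape, out.shape)
```

A full ViT stacks many such attention layers with MLPs, positional embeddings, and a classification token; the shifted-window variant used by the paper additionally restricts attention to local windows that shift between layers, but the patch-to-token step sketched here is common to both.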