2022
DOI: 10.3390/electronics11172747

Transformer-Based Disease Identification for Small-Scale Imbalanced Capsule Endoscopy Dataset

Abstract: Vision Transformer (ViT) is emerging as a new leader in computer vision with its outstanding performance in many tasks (e.g., ImageNet-22k, JFT-300M). However, the success of ViT relies on pretraining on large datasets. It is difficult for us to use ViT to train from scratch on a small-scale imbalanced capsule endoscopic image dataset. This paper adopts a Transformer neural network with a spatial pooling configuration. Transformer’s self-attention mechanism enables it to capture long-range information effective…

Cited by 21 publications (6 citation statements)
References 56 publications
“…The transformer is currently a state-of-the-art model for computer vision and NLP tasks. As a result, recent works [13,15] have introduced the application of transformers for processing and analyzing WCE images. These studies have demonstrated the effectiveness of utilizing the transformer architecture in achieving high performance when applied to WCE images.…”
Section: Discussion (mentioning)
confidence: 99%
“…Furthermore, the vision Transformer (ViT), a model that modified the original transformer for computer vision, has also performed well in image classification [12]. Because ViT performs well in computer vision tasks, some studies have employed the transformer architecture to analyze WCE images [13][14][15].…”
Section: Introduction (mentioning)
confidence: 99%
“…The HiFuse Tiny, HiFuse Small, and HiFuse Base models attained accuracy rates of 84.85%, 85.00%, and 84.35%, respectively. Bai et al. [35] improved a ViT-based architecture for the classification of wireless capsule endoscopy images. They obtained 79.15% accuracy with the Kvasir-Capsule dataset utilized to evaluate the performance of the ViT-based architecture.…”
Section: Related Work (mentioning)
confidence: 99%
“…In this circumstance, other compensating sensors like GSR should activate to detect the stress and pain condition for pain report. Machine learning techniques (Bai et al., 2021; Bai et al., 2022) may also be employed to help the system learn the pain feature of particular patients, enabling more real-time feedback when the pain features appear in patients’ daily activities.…”
Section: Limitation and Future Work (mentioning)
confidence: 99%