2023
DOI: 10.1049/ipr2.12929

SE‐Swin: An improved Swin‐Transformer network of self‐ensemble feature extraction framework for image retrieval

Yixuan Xu,
Xianbing Wang,
Hua Zhang
et al.

Abstract: The Swin‐Transformer is a variant of the Vision Transformer, which constructs a hierarchical Transformer that computes representations with shifted windows and window multi‐head self‐attention. This method can handle the scale invariance problem and performs well in many computer vision tasks. In image retrieval, high‐quality feature descriptors are necessary to improve retrieval accuracy. This paper proposes a self‐ensemble Swin‐Transformer network structure to fuse the features of different layers of the Swi…
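The abstract describes fusing descriptors from different layers of a Swin‐Transformer into one retrieval descriptor. The paper's exact fusion scheme is not visible in this truncated snippet, so the following is only a hypothetical sketch of the general idea: pool each stage's feature map to a vector, L2-normalize per stage so no single (possibly noisy) stage dominates, then concatenate and normalize again. All function names and the Swin‐T-like stage shapes here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def global_pool(feat):
    """Average-pool an (H, W, C) stage output to a C-dim vector."""
    return feat.mean(axis=(0, 1))

def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit Euclidean norm."""
    return v / (np.linalg.norm(v) + eps)

def self_ensemble_descriptor(stage_outputs):
    """Hypothetical multi-stage fusion: pool and normalize each stage,
    then concatenate the per-stage vectors and normalize the result,
    yielding one global descriptor for nearest-neighbour retrieval.

    stage_outputs: list of (H_i, W_i, C_i) arrays, one per network stage.
    """
    parts = [l2_normalize(global_pool(f)) for f in stage_outputs]
    return l2_normalize(np.concatenate(parts))

# Example with four stages using Swin-T-like channel widths (96, 192, 384, 768)
rng = np.random.default_rng(0)
stages = [rng.standard_normal((56 // 2**i, 56 // 2**i, 96 * 2**i)) for i in range(4)]
desc = self_ensemble_descriptor(stages)
print(desc.shape)  # (1440,) = 96 + 192 + 384 + 768
```

Per-stage normalization before concatenation is a common design choice in retrieval pipelines: it keeps the wide, high-level stages from swamping the narrow, low-level ones in the final distance computation.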

Cited by 4 publications (2 citation statements)
References 29 publications (32 reference statements)
“…Swin Transformer [19] has gained popularity in various fields due to its exceptional image processing capabilities. For example, Üzen et al. utilized Swin Transformer to detect surface defects at the pixel level [20]. Xu et al. proposed a self-integrated Swin Transformer network structure, which combines the features of different layers of the Swin Transformer network and removes noisy points present in a single layer, thereby enhancing the retrieval performance [21].…”
Section: Introduction
confidence: 99%
“…Dosovitskiy et al. applied the Transformer neural network to image classification, proposing the ViT (Vision Transformer) neural network, which broke the monopoly of convolutional neural networks in classification tasks [27]. Swin Transformer, based on ViT, optimized attention mechanisms and sampling effects, and learned the hierarchical structure of convolutional neural networks, significantly improving training speed and accuracy [28]. Researchers have applied it to the identification of forest fires, achieving rapid and real-time monitoring of forest fires [29].…”
Section: Introduction
confidence: 99%