2023
DOI: 10.3390/app13063707

CNN Attention Enhanced ViT Network for Occluded Person Re-Identification

Abstract: Person re-identification (ReID) is often affected by occlusion, which causes most of the features extracted by ReID models to contain substantial identity-independent noise. Recently, the Vision Transformer (ViT) has enabled significant progress in various visual artificial intelligence tasks. However, ViT has limited capacity for extracting local information, which should concern researchers working on occluded ReID. This paper conducts a study to exploit the potential of attention m…

Citations: cited by 7 publications (5 citation statements)
References: 40 publications
“…It balances high classification accuracy with minimal parameters and latency. Wang et al [44] developed RepViT, a pure CNN architecture incorporating lightweight ViT [45] principles to refine traditional lightweight CNNs, achieving low latency and high accuracy. EfficientViT by Liu et al [46] improves memory and computational efficiency with a sandwich layout and cascaded group attention while preserving accuracy.…”
Section: B Backbone Network For Object Detection
Citation type: mentioning
confidence: 99%
“…Meanwhile, MobileFormer devised an architecture that parallels MobileNet and converters via a two-way bridge [43]. Integrating these efficient architecture choices for lightweight ViTs, Wang et al [59] progressively enhanced the mobility of standard lightweight CNNs, notably MobileNetV3, culminating in the development of RepViT. Despite adopting a MetaFormer structure [60], RepViT is solely composed of convolutional layers, showcasing superior performance and efficiency compared to most advanced lightweight ViTs across various computer vision tasks.…”
Section: Network Lightweight
Citation type: mentioning
confidence: 99%
“…In the automated defect detection method based on artificial intelligence technology, the ability of the computer to automatically learn and recognize various types of flaws is enabled by the use of deep learning technology and an enormous quantity of data, thus realizing efficient and accurate automated detection. In supervised learning methods [3], to train the model using supervised learning techniques, a lot of labeled data is needed, but the cost of such data acquisition and labeling is high, which is not conducive to practical application. Whereas, in self-supervised learning methods [4], only a large amount of unlabeled data needs to be utilized for learning, which reduces the labeling cost.…”
Section: Introduction
Citation type: mentioning
confidence: 99%