2021
DOI: 10.48550/arxiv.2111.12084
Preprint

Self-Supervised Pre-Training for Transformer-Based Person Re-Identification

Abstract: Transformer-based supervised pre-training achieves great performance in person re-identification (ReID). However, due to the domain gap between ImageNet and ReID datasets, it usually needs a larger pre-training dataset (e.g., ImageNet-21K) to boost the performance because of the transformer's strong data-fitting ability. To address this challenge, this work aims to mitigate the gap between the pre-training and ReID datasets from the perspectives of data and model structure, respectively. We first investi…

Cited by 4 publications (4 citation statements). References 39 publications.

“…TransReID (He et al. 2021) is the first pure transformer-based method for re-ID; it proposes a jigsaw patches module (JPM) that shuffles patch embeddings and re-groups them for further feature learning, extracting several local features and aggregating them into a robust feature with global context. TransReID-SSL (Luo et al. 2021) uses a massive person re-ID dataset, LUPerson (Fu et al. 2021), to train a stronger pre-trained model with DINO (Caron et al. 2021).…”
Section: Representation Learning in Re-ID
Confidence: 99%
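The jigsaw regrouping described in this statement can be pictured with a small PyTorch sketch. This is an illustrative approximation, not the TransReID implementation: the function name jigsaw_regroup, the group count, and the shift value are assumptions made for the example.

```python
# Minimal sketch of a jigsaw-style patch regrouping step, assuming PyTorch
# tensors of shape (batch, num_patches, dim). Names and hyperparameters are
# illustrative, not the exact TransReID configuration.
import torch

def jigsaw_regroup(patch_tokens: torch.Tensor, num_groups: int = 4,
                   shift: int = 5) -> list[torch.Tensor]:
    """Shuffle patch tokens and split them into groups for local features."""
    b, n, d = patch_tokens.shape
    # Circularly shift the patch sequence, then apply a random permutation
    # so each group mixes patches from different spatial regions.
    shifted = torch.roll(patch_tokens, shifts=shift, dims=1)
    perm = torch.randperm(n)
    shuffled = shifted[:, perm, :]
    # Split the shuffled sequence into equal groups; each group would later be
    # fed through a shared transformer block to yield one local feature.
    return list(shuffled.chunk(num_groups, dim=1))

# Usage: 4 local groups from 196 patch tokens of a ViT.
tokens = torch.randn(2, 196, 768)
groups = jigsaw_regroup(tokens)
print([g.shape for g in groups])  # four tensors of shape (2, 49, 768)
```
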
“…Based on ViT [41], [39] applies a pure Transformer to supervised ReID for the first time, introducing side information to improve the robustness of features. [47] further proposes self-supervised pre-training for Transformer-based person ReID, which mitigates the gap between the pre-training and ReID datasets from the perspective of data and model structure.…”
Section: Transformer-Related Person ReID
Confidence: 99%
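The "side information" mentioned here refers to embeddings derived from metadata such as camera IDs. Below is a minimal, hypothetical PyTorch sketch of adding a learnable camera embedding to the patch tokens; the module name SideInfoEmbedding and the scale parameter are assumptions for illustration, not taken from the paper.

```python
# Sketch of injecting side information (camera IDs) into ViT tokens, assuming
# PyTorch. The scaling coefficient and module name are illustrative.
import torch
import torch.nn as nn

class SideInfoEmbedding(nn.Module):
    def __init__(self, num_cameras: int, dim: int, scale: float = 1.0):
        super().__init__()
        # One learnable embedding per camera ID, broadcast over all tokens.
        self.embed = nn.Embedding(num_cameras, dim)
        self.scale = scale

    def forward(self, tokens: torch.Tensor, cam_ids: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_tokens, dim); cam_ids: (batch,)
        side = self.embed(cam_ids).unsqueeze(1)  # (batch, 1, dim)
        return tokens + self.scale * side

# Usage: add camera information to ViT tokens before the encoder.
sie = SideInfoEmbedding(num_cameras=6, dim=768)
tokens = torch.randn(2, 197, 768)
out = sie(tokens, torch.tensor([0, 3]))
print(out.shape)  # torch.Size([2, 197, 768])
```
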
“…Vision Transformers usually yield better generalization ability than common CNN networks under distribution shift [49]. However, existing pure transformer-based ReID models are only used in supervised and pre-trained ReID [47, 39]. The generalization of Transformers in DG ReID is still unknown.…”
Section: Introduction
Confidence: 99%
“…We utilize a non-parametric InfoNCE loss within batches. We use the ViT with ICS from TransReID-SSL [22] as our backbone. Based on self-attention in the Transformer, a feature fusion method is proposed to generate a refined feature for the original query feature.…”
Section: R5 - Mig
Confidence: 99%
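A non-parametric, in-batch InfoNCE loss of the kind cited here can be sketched as follows. This is a generic formulation under the assumption of one positive pair per batch index and a temperature of 0.07; it is not claimed to match the cited paper's exact loss.

```python
# Generic in-batch InfoNCE loss sketch, assuming PyTorch and two feature views
# per identity matched by batch index. The temperature value is illustrative.
import torch
import torch.nn.functional as F

def info_nce(query: torch.Tensor, key: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """query, key: (batch, dim) feature pairs; positives are matched by index."""
    q = F.normalize(query, dim=1)
    k = F.normalize(key, dim=1)
    # Similarity of every query against every key in the batch; the diagonal
    # entries are the positive pairs, all other entries act as negatives.
    logits = q @ k.t() / temperature
    targets = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, targets)

# Usage with random features standing in for two augmented views.
q = torch.randn(8, 256)
k = torch.randn(8, 256)
print(info_nce(q, k).item())
```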