An Efficient Axial-Attention Network for Video-Based Person Re-Identification

Zhang, Fuping; Zhang, Tianzhao; Sun, Ruoxi; Huang, Chao; Wei, Jianming

doi:10.1109/lsp.2022.3178673

Cited by 3 publications

(5 citation statements)

References 27 publications


“…We implement DCCAL based on our previous EAAN work [49], and choose the ResNet50-3D-EAAN model trained on DukeV or MARS as the backbone network. T=6 frames are selected with the RRS strategy to represent a video sequence, and the images are resized to 256×128 and augmented by random erasing.…”

Section: B Implementation Details (mentioning)

confidence: 99%
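
The frame selection mentioned in this statement can be illustrated with a minimal sketch. It assumes that RRS stands for restricted random sampling in its usual form: the sequence is split into T roughly equal chunks and one frame is drawn at random from each. The function name `rrs_sample` and the handling of sequences shorter than T are illustrative assumptions, not details taken from the cited paper.

```python
import random


def rrs_sample(num_frames, t=6, rng=None):
    """Return t frame indices, one drawn at random from each of t chunks.

    Assumption: chunks are the integer ranges [k*n//t, (k+1)*n//t).
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    rng = rng or random.Random()
    indices = []
    for k in range(t):
        start = (k * num_frames) // t
        end = ((k + 1) * num_frames) // t
        if end <= start:
            # Empty chunk (sequence shorter than t): reuse the nearest frame.
            indices.append(min(start, num_frames - 1))
        else:
            indices.append(rng.randrange(start, end))
    return indices


# Example: a 60-frame tracklet yields one index from each block of 10 frames.
print(rrs_sample(60, t=6, rng=random.Random(0)))
```

Sampling one frame per chunk keeps the selected frames spread over the whole tracklet while still varying between epochs.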

“…The experimental results are shown in Table I. We choose the ResNet50-3D pre-trained on the MARS dataset and the DukeV dataset in EAAN [49], respectively, as the backbone networks. eps is valued in four ways, respectively taking the mean value of the distance matrix D ($eps_{\bar{D}}$), about half of $eps_{\bar{D}}$ ($eps_{\bar{D}}/2$), the mean value of the best eps of each round of dynamic clustering in the whole training process ($\overline{eps_{best}}$), and the best eps of each round of dynamic clustering ($eps_{best}$).…”

Section: Analysis of Dynamic Clustering (mentioning)

confidence: 99%
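
The first two eps settings in this statement (the mean of the distance matrix D and about half of that mean) can be sketched as follows. This is an illustration under stated assumptions: D is taken to be a Euclidean pairwise distance matrix, the mean is taken over all entries including the zero diagonal, and eps is presumed to be a neighbourhood radius for a density-based clustering step. None of these choices is confirmed by the excerpt, and the helper names are hypothetical.

```python
import math


def pairwise_distances(features):
    """Build the full Euclidean distance matrix D for a list of feature vectors."""
    n = len(features)
    return [[math.dist(features[i], features[j]) for j in range(n)]
            for i in range(n)]


def mean_distance_eps(dist, scale=1.0):
    """Mean of all entries of D, times `scale` (scale=0.5 gives the half-mean)."""
    n = len(dist)
    if n == 0:
        raise ValueError("distance matrix is empty")
    total = sum(sum(row) for row in dist)
    return scale * total / (n * n)


# Toy example with two 2-D features that are 5 units apart.
D = pairwise_distances([(0.0, 0.0), (3.0, 4.0)])
eps_mean = mean_distance_eps(D)        # mean over all four entries of D
eps_half = mean_distance_eps(D, 0.5)   # about half of the mean
```

The two remaining settings depend on the per-round best eps found during dynamic clustering, which the excerpt does not describe, so they are not sketched here.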

“…KL divergence is often used to measure the distance between two distributions, and KL divergence loss can be utilized to gradually narrow the gap during continuous training and model optimization. We take advantage of this characteristic. As shown in Table II, we select ResNet50-3D and ResNet50-3D-EAAN [49] as the backbone network, respectively. When MARS→DukeV, the KL loss improves the Rank-1 and mAP of the model with the ResNet50-3D backbone network by 3.4% and 2.7%, in sequence.…”

Section: Analysis of KL Divergence Loss (mentioning)

confidence: 99%
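
For reference, the quantity this statement refers to is the standard discrete KL divergence, KL(p || q) = sum_i p_i log(p_i / q_i). The sketch below computes it for two probability vectors. How the citing paper builds p and q from source-domain and target-domain features is not given in the excerpt, so the function name and the inputs here are purely illustrative.

```python
import math


def kl_divergence(p, q, tiny=1e-12):
    """KL(p || q) for two discrete distributions of equal length.

    Terms with p_i == 0 contribute nothing; q_i is floored at `tiny`
    to avoid division by zero (an implementation assumption).
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    return sum(pi * math.log(pi / max(qi, tiny))
               for pi, qi in zip(p, q) if pi > 0)


# Identical distributions give zero; the value grows as q drifts from p.
same = kl_divergence([0.5, 0.5], [0.5, 0.5])
apart = kl_divergence([0.5, 0.5], [0.9, 0.1])
```

Note that KL divergence is not symmetric, so a loss built on it has to fix which distribution plays the role of p.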

“…We choose ResNet50-3D-EAAN [49] as the backbone network of our DCCAL model and study its performance as shown in Table IV. The combination of the KL loss and the CAL module can improve the DCCAL model by 7.2%/10.0% and 3.9%/6.3% in Rank-1/mAP when MARS→DukeV and in reverse, respectively.…”

Section: Analysis of DCCAL Model (mentioning)

confidence: 99%
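
The Rank-1 and mAP figures quoted in these statements are the usual retrieval metrics. The sketch below shows how they are commonly computed from ranked gallery matches. It is a simplified illustration: the input format and the function name are assumptions, and the camera-based gallery filtering used by standard Re-ID evaluation protocols is omitted.

```python
def rank1_and_map(ranked_matches):
    """Compute (Rank-1, mAP) from per-query ranked match flags.

    ranked_matches: one list per query, ordered from the closest gallery
    item to the farthest; each entry is True when the gallery item shares
    the query's identity. Queries with no true match are skipped.
    """
    rank1_hits = 0
    average_precisions = []
    for matches in ranked_matches:
        if not any(matches):
            continue
        rank1_hits += 1 if matches[0] else 0
        found = 0
        precisions = []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                found += 1
                precisions.append(found / rank)
        average_precisions.append(sum(precisions) / len(precisions))
    if not average_precisions:
        raise ValueError("no query has a true match in the gallery")
    n = len(average_precisions)
    return rank1_hits / n, sum(average_precisions) / n


# Two toy queries: the first is matched at rank 1, the second at rank 2.
r1, mean_ap = rank1_and_map([[True, False, True], [False, True, False]])
```

Rank-1 only checks the top result, while mAP rewards placing every true match early in the ranking, which is why papers usually report both.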


Unsupervised Domain Adaptation Via Dynamic Clustering and Co-Segment Attentive Learning for Video-Based Person Re-Identification

Zhang, Chen, et al. 2024. IEEE Access

Self Cite


Currently, supervised person re-identification (Re-ID) models trained on labeled datasets can achieve high recognition performance within the same data domain. However, accuracy drops dramatically when these models are applied directly to other unlabeled datasets or natural environments, because of a significant gap in sample distribution between the two domains. Unsupervised Domain Adaptation (UDA) methods can address this problem by fine-tuning the model on the target dataset with pseudo-labels generated by a clustering method. Yet these methods are aimed primarily at image-based person Re-ID. This is because background noise and interference are complex and changeable in video scenarios, resulting in large intra-class distances and small inter-class distances, which easily lead to noisy labels. The large domain gap and noisy labels heavily hinder clustering and training in video-based person Re-ID. To address the problem, we propose a novel UDA method for video-based person Re-ID based on Dynamic Clustering and Co-segment Attentive Learning (DCCAL). DCCAL includes a Dynamic Clustering (DC) module and a Co-segment Attentive Learning (CAL) module, which respectively alleviate noisy labels by clustering pedestrians adaptively within different generation processes and reduce the domain gap with a co-segmentation-based attention mechanism. Additionally, we introduce a Kullback-Leibler (KL) divergence loss to reduce the gap between the feature distributions of the two domains for better performance. Experimental results on two large-scale video-based person Re-ID datasets, MARS and DukeMTMC-VideoReID (DukeV), demonstrate exceptional precision performance. Our method outperforms state-of-the-art semi-supervised and unsupervised approaches by 1.1% in Rank-1 and 1.5% in mAP on DukeV, and by 3.1% in Rank-1 and 2.1% in mAP on MARS.

INDEX TERMS: Person re-identification, Unsupervised domain adaptation, Dynamic clustering, Co-segment attentive learning.

I. INTRODUCTION

Person Re-ID aims at retrieving a pedestrian across different non-overlapping cameras or from the same camera at different times. It primarily consists of image- and video-based methods that exploit spatial and spatiotemporal clues to represent a person in image or video sequences, respectively. Existing approaches predominantly rely on supervised learning with labeled datasets. However, labeling work is costly and impractical in realistic environments, and trained models often struggle to adapt to the target domain. The main reason is that pedestrians are easily affected by many factors such as illumination, viewpoint, background noise, occlusion, resolution, appearance, and posture. This results in large intra-class differences within the same dataset and a large gap in sample distribution between two different domains, especially in video-based person Re-ID. To meet this common and realistic difficulty, many works study semi-supervised, unsupervised, and UDA methods for person Re-ID.

Semi-supervised methods use limited labeled samples in a d...


“…However, cropping images using a fixed interval brings misalignments of local features, since some person images are acquired with inaccurate detection boxes, such as boxes with the person not centered or boxes with partial bodies. Therefore, the attention scheme [10][11][12] has been introduced to enforce the model to capture cardinal discriminative local features, which boosts the performance of person ReID models greatly. These methods usually focus on the existence of discriminative patterns without regard for positions and orientations.…”

Section: Introduction (mentioning)

confidence: 99%

An Orientation-Aware Attention Network for Person Re-Identification

Xu, Chen, Chai. 2024. Electronics


Humans identify persons by their characteristics, their salient attributes, and the locations of those attributes on the body. Most person re-identification methods focus on global and local features, which correspond to the first two cues, and crop person images into horizontal strips to obtain coarse locations of body parts. However, discriminative clues arising from location differences cannot be discovered this way, so persons with similar appearances are often confused because their components look alike. To address the above problem, we introduce pixel-wise relative positions for the invariance of their orientations in viewpoint changes. To cope with the scale change of relative position, we combine relative positions with self-attention modules that operate on multi-level features. Moreover, in the data augmentation stage, mirrored images are given new labels because mirroring reverses the relative position along the horizontal orientation and changes visual chirality. Extensive experiments on four challenging benchmarks demonstrate the superiority and effectiveness of the proposed approach in discovering discriminative features.
