Exploit the Unknown Gradually: One-Shot Video-Based Person Re-identification by Stepwise Learning

Wu, Yu; Lin, Yutian; Dong, Xuanyi; Yan, Yan; Ouyang, Wanli; Yang, Yi

doi:10.1109/cvpr.2018.00543

Cited by 360 publications

(294 citation statements)

References 33 publications

Supporting

Mentioning

285

Contrasting


“…Suh et al [39] propose a two-stream architecture to jointly learn the appearance feature and part feature, and fuse the image level features through a pooling strategy. Average pooling is also used in recent works [21,47], which apply unsupervised learning for video person ReID. Temporal pooling exhibits promising efficiency, but extracts frame features independently and ignores the temporal orders among adjacent frames.…”

Section: Related Work (mentioning)

confidence: 99%
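The statement above notes that temporal pooling treats frames independently and ignores their order. A minimal sketch of that property in plain Python, assuming per-frame features are already extracted (the function name `temporal_average_pool` and the toy values are illustrative, not taken from any cited paper):

```python
from typing import List, Sequence


def temporal_average_pool(frame_features: Sequence[Sequence[float]]) -> List[float]:
    """Average per-frame feature vectors into one clip-level vector."""
    if not frame_features:
        raise ValueError("need at least one frame")
    num_frames = len(frame_features)
    dim = len(frame_features[0])
    return [sum(frame[d] for frame in frame_features) / num_frames for d in range(dim)]


# Toy 3-frame tracklet with 2-D features (made-up values).
clip = [[1.0, 0.0], [2.0, 4.0], [3.0, 2.0]]
pooled = temporal_average_pool(clip)
# Reversing the frame order changes nothing: the pooled vector carries
# no information about which frame came first.
pooled_reversed = temporal_average_pool(clip[::-1])
```

Because the mean is a symmetric function of its inputs, any permutation of the frames yields the same clip descriptor, which is the limitation the citing paper points to.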

“…As shown in our experiments and visualizations, GLTR presents strong discriminative power and robustness. We test our approach on a newly proposed Large-Scale Video dataset for person ReID (LS-VID) and four widely used video ReID datasets, including PRID [14], iLIDS-VID [43], MARS [56], and DukeMTMC-VideoReID [47,34], respectively. Experimental results show that GLTR achieves consistent performance superiority on those datasets.…”

Section: Introduction (mentioning)

confidence: 99%


Global-Local Temporal Representations for Video Person Re-Identification

Zhang, Wang, et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)



This paper proposes the Global-Local Temporal Representation (GLTR) to exploit the multi-scale temporal cues in video sequences for video person Re-Identification (ReID). GLTR is constructed by first modeling the short-term temporal cues among adjacent frames, then capturing the long-term relations among inconsecutive frames. Specifically, the short-term temporal cues are modeled by parallel dilated convolutions with different temporal dilation rates to represent the motion and appearance of pedestrians. The long-term relations are captured by a temporal self-attention model to alleviate the occlusions and noise in video sequences. The short- and long-term temporal cues are aggregated as the final GLTR by a simple single-stream CNN. GLTR shows substantial superiority to existing features learned with body part cues or metric learning on four widely used video ReID datasets. For instance, it achieves a Rank-1 accuracy of 87.02% on the MARS dataset without re-ranking, better than the current state of the art.
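The two-stage idea in this abstract (parallel dilated temporal convolutions for short-term cues, then temporal self-attention for long-term relations) can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the 3-tap kernel, the fixed weights, the dilation rates (1, 2, 4), the concatenation of branches, and the final average over time are all choices made here for the example, and every function name is hypothetical.

```python
import math
from typing import List

Frames = List[List[float]]  # T frames, each a D-dimensional feature vector


def dilated_temporal_conv(frames: Frames, kernel: List[float], dilation: int) -> Frames:
    """3-tap convolution along time with zero padding; one kernel shared by all dims."""
    t_len, dim = len(frames), len(frames[0])
    out = []
    for t in range(t_len):
        row = [0.0] * dim
        for k, w in enumerate(kernel):
            src = t + (k - 1) * dilation  # taps at t - d, t, t + d
            if 0 <= src < t_len:
                for d in range(dim):
                    row[d] += w * frames[src][d]
        out.append(row)
    return out


def temporal_self_attention(frames: Frames) -> Frames:
    """Scaled dot-product attention across frames (queries = keys = values)."""
    dim = len(frames[0])
    out = []
    for q in frames:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in frames]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[d] for w, v in zip(weights, frames)) for d in range(dim)])
    return out


def gltr_like(frames: Frames, dilations=(1, 2, 4)) -> List[float]:
    kernel = [0.25, 0.5, 0.25]  # fixed toy weights; a real model would learn them
    branches = [dilated_temporal_conv(frames, kernel, r) for r in dilations]
    # Concatenate the parallel branches per frame (short-term cues).
    short_term = [sum((b[t] for b in branches), []) for t in range(len(frames))]
    # Relate every frame to every other frame (long-term cues).
    long_term = temporal_self_attention(short_term)
    # Average over time to obtain one clip-level vector.
    return [sum(f[d] for f in long_term) / len(long_term) for d in range(len(long_term[0]))]


# Toy 6-frame clip with 2-D frame features (made-up values).
clip = [[float(t), 1.0] for t in range(6)]
feature = gltr_like(clip)
```

With three dilation rates and 2-D inputs, `feature` has 2 × 3 = 6 entries. Unlike plain average pooling, the convolution stage depends on frame order, so shuffling the clip generally changes the result.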


“…Similar to our work, cross-camera tracklet association (labeling) [21], [28], [39] is more scalable for unsupervised Re-ID with no extra data or assumption on the similarity between source and target domains. Due to the limitation of existing datasets, most of them do not perform Re-ID learning in a pure unsupervised way.…”

Section: Related Work (mentioning)

confidence: 63%

Progressive Unsupervised Person Re-Identification by Tracklet Association With Spatio-Temporal Regularization

Xie, Zhou, et al. 2021

IEEE Trans. Multimedia


Existing methods for person re-identification (Re-ID) are mostly based on supervised learning, which requires numerous manually labeled samples across all camera views for training. Such a paradigm suffers from a scalability issue, since in real-world Re-ID applications it is difficult to exhaustively label abundant identities over multiple disjoint camera views. To this end, we propose a progressive deep learning method for unsupervised person Re-ID in the wild by Tracklet Association with Spatio-Temporal Regularization (TASTR). In our approach, we first collect tracklet data within each camera by automatic person detection and tracking. Then, an initial Re-ID model is trained based on within-camera triplet construction for person representation learning. After that, based on the person visual features and a spatio-temporal constraint, we associate cross-camera tracklets to generate cross-camera triplets and update the Re-ID model. Lastly, with the refined Re-ID model, better visual features of persons can be extracted, which further promotes the association of cross-camera tracklets. The last two steps are iterated multiple times to progressively upgrade the Re-ID model. To facilitate the study, we have collected a new 4K UHD video dataset named Campus-4K with full frames and full spatio-temporal information. Experimental results show that with the spatio-temporal constraint in the training phase, the proposed approach outperforms the state-of-the-art unsupervised methods by notable margins on DukeMTMC-reID, and achieves competitive performance to fully supervised methods on both DukeMTMC-reID and Campus-4K datasets.


“…DukeMTMC-VideoReID dataset is another large scale benchmark dataset for video-based person Re-ID, which is derived from the DukeMTMC dataset [56] and re-organized by Wu et al [57]. The DukeMTMC-VideoReID dataset contains totally 4,832 tracklets and 1,812 identities, it is separated into 702, 702 and 408 identities for training, testing and distraction.…”

Section: A Datasets (mentioning)

confidence: 99%
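As a quick consistency check on the identity split quoted above, the three subsets should add up to the stated total of 1,812 identities. The dictionary keys below are labels chosen for this example:

```python
# Identity counts as quoted in the citation statement.
split = {"train": 702, "test": 702, "distractor": 408}
total_identities = sum(split.values())
stated_total = 1812
split_is_consistent = total_identities == stated_total
```

The sum matches the stated figure, so the quoted split accounts for every identity in the dataset.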

Few-Shot Deep Adversarial Learning for Video-Based Person Re-Identification

Wang, Yin, et al. 2020

IEEE Trans. on Image Process.


Recent years have witnessed great development in deep learning based video person re-identification (Re-ID). A key factor for video person Re-ID is how to effectively construct discriminative video feature representations that are robust to complicated situations such as occlusions. Recent part-based approaches employ spatial and temporal attention to extract representative local features, but the correlations between the parts are ignored in these methods. To leverage the relations between different parts, we propose an innovative adaptive graph representation learning scheme for video person Re-ID, which enables contextual interactions between the relevant regional features. Specifically, we exploit pose alignment connection and feature affinity connection to construct an adaptive structure-aware adjacency graph, which models the intrinsic relations between graph nodes. We perform feature propagation on the adjacency graph to refine the original regional features iteratively, so that the information of neighbor nodes is taken into account for part feature representation. To learn compact and discriminative representations, we further propose a novel temporal resolution-aware regularization, which enforces consistency among different temporal resolutions for the same identities. We conduct extensive evaluations on four benchmarks, i.e., iLIDS-VID, PRID2011, MARS, and DukeMTMC-VideoReID. The experimental results show competitive performance, which demonstrates the effectiveness of our proposed method.
