Video-based person re-identification (Re-ID) aims to retrieve video sequences of the same person across non-overlapping cameras. Previous methods usually focus on limited views, such as the spatial, temporal, or spatial-temporal view, and therefore lack observations from different feature domains. To capture richer perceptions and extract more comprehensive video representations, in this paper we propose a novel framework named Trigeminal Transformers (TMT) for video-based person Re-ID. More specifically, we design a trigeminal feature extractor that jointly transforms raw video data into the spatial, temporal, and spatial-temporal domains. In addition, inspired by the great success of vision transformers, we introduce the transformer structure to video-based person Re-ID. In our work, three self-view transformers are proposed to exploit the relationships among local features for information enhancement in the spatial, temporal, and spatial-temporal domains. Moreover, a cross-view transformer is proposed to aggregate the multi-view features into comprehensive video representations. Experimental results indicate that our approach achieves better performance than other state-of-the-art approaches on public Re-ID benchmarks. We will release the code for model reproduction.
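To make the three-view design concrete, the following is a minimal PyTorch sketch of the overall structure described above: three self-view transformers operating on spatial, temporal, and spatial-temporal token sets, followed by a cross-view transformer that fuses the three views. All names (`TrigeminalTransformer`, `make_encoder`, `d_model`), the choice of mean pooling to form each view, and the layer sizes are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of the TMT idea; module names, pooling choices, and
# hyperparameters are assumptions, not the paper's implementation.
import torch
import torch.nn as nn


def make_encoder(d_model: int, nhead: int = 4, num_layers: int = 2) -> nn.TransformerEncoder:
    """Small transformer encoder used for each branch (assumed depth/width)."""
    layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=num_layers)


class TrigeminalTransformer(nn.Module):
    """Three self-view transformers plus one cross-view transformer (sketch)."""

    def __init__(self, d_model: int = 256):
        super().__init__()
        # One self-view transformer per feature domain.
        self.spatial_enc = make_encoder(d_model)
        self.temporal_enc = make_encoder(d_model)
        self.st_enc = make_encoder(d_model)
        # Cross-view transformer aggregating the three pooled view features.
        self.cross_enc = make_encoder(d_model)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: per-frame feature maps from a backbone, shape (B, T, H, W, C).
        B, T, H, W, C = feats.shape

        # Spatial view: average over time -> H*W spatial tokens.
        spatial = self.spatial_enc(feats.mean(dim=1).reshape(B, H * W, C))
        # Temporal view: average over space -> T temporal tokens.
        temporal = self.temporal_enc(feats.mean(dim=(2, 3)))
        # Spatial-temporal view: all T*H*W tokens processed jointly.
        st = self.st_enc(feats.reshape(B, T * H * W, C))

        # Pool each view to a single token, then let the cross-view
        # transformer exchange information among the three views.
        views = torch.stack(
            [spatial.mean(dim=1), temporal.mean(dim=1), st.mean(dim=1)], dim=1
        )  # (B, 3, C)
        fused = self.cross_enc(views)  # (B, 3, C)
        return fused.mean(dim=1)  # final video representation, (B, C)


if __name__ == "__main__":
    model = TrigeminalTransformer(d_model=256)
    clip_feats = torch.randn(2, 8, 8, 4, 256)  # 2 clips, 8 frames, 8x4 feature maps
    print(model(clip_feats).shape)  # torch.Size([2, 256])
```

In this sketch the cross-view stage is realized as ordinary self-attention over one pooled token per view; the paper's cross-view transformer may attend across full token sets instead, which this simplification does not capture.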