Adaptation and Re-identification Network: An Unsupervised Deep Transfer Learning Approach to Person Re-identification

Li, Yu-Jhe; Yang, Fu-En; Liu, Yen-Cheng; Yeh, Yu-Ying; Du, Xiaofei; Wang, Yu-Chiang Frank

doi:10.1109/cvprw.2018.00054

Cited by 103 publications

(57 citation statements)

References 18 publications


“…The idea of using a deep learning architecture for person re-identification stems from Siamese CNN with either two or three branches for pairwise verification loss [25] or triplet loss [26,27] respectively, by proposing new layers [1] or by fusing features from different body parts with a multi-scale CNN structure [2,3]. Another trend of using deep learning architecture is transfer learning [4,25,29], for when the distribution of the training data from the source domain is different from that of the target domain. The most common deep transfer learning strategy for re-identification [4] is to pre-train a base network on a large scale or combination of different datasets as source dataset, and transfer learned representation to the target dataset.…”

Section: Related Work (mentioning)

confidence: 99%

RGB-Depth Cross-Modal Person Re-identification

Hafner, Bhuiyan, Kooij, et al. 2019

2019 16th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)


Person re-identification is a key challenge for surveillance across multiple sensors. Prompted by the advent of powerful deep learning models for visual recognition, and inexpensive RGBD cameras and sensor-rich mobile robotic platforms, e.g. self-driving vehicles, we investigate the relatively unexplored problem of cross-modal re-identification of persons between RGB (color) and depth images. The considerable divergence in data distributions across different sensor modalities introduces additional challenges to the typical difficulties like distinct viewpoints, occlusions, and pose and illumination variation. While some work has investigated re-identification across RGB and infrared, we take inspiration from successes in transfer learning from RGB to depth in object detection tasks. Our main contribution is a novel cross-modal distillation network for robust person re-identification, which learns a shared feature representation space of person's appearance in both RGB and depth images. The proposed network was compared to conventional and deep learning approaches proposed for other cross-domain re-identification tasks. Results obtained on the public BIWI and RobotPKU datasets indicate that the proposed method can significantly outperform the state-of-the-art approaches by up to 10.5% mAP, demonstrating the benefit of the proposed distillation paradigm. This paper focuses on deep neural networks for cross-modal person re-identification that allow sensing between RGB and depth modalities. Although some methods have been proposed for cross-modal re-identification between RGB and infrared images [10, 11, 12, 13], almost no research addressing RGB and depth images exists [16, 17]. However, sensing across RGB and depth modalities is important in many real-world scenarios. This is the case, for example, with video surveillance systems that must recognize individuals in poorly illuminated environments [14]. 
Another use case is autonomous self-driving vehicles, which must track pedestrians in their vicinity, where some regions are covered by lidar sensors and others by RGB cameras. Besides these practical applications, research in cross-modal re-identification can also help the legal interpretation of depth-based images with respect to privacy and data protection (e.g. within GDPR). While it is clear that person data from an RGB camera is highly sensitive with respect to data privacy, it is still unclear how much private information can be extracted from depth images. In this paper, a new cross-modal distillation network is proposed for robust person re-identification across RGB and depth sensors. The task is addressed by creating a common embedding of images from both the depth and RGB modalities, as visualized in Figure 1. The proposed method exploits a two-step optimization process. In the first step a



“…We compare our method with the state-of-the-art unsupervised Re-ID methods on Market-1501, DukeMTMC-reID and MSMT17 datasets. The compared methods include two handcrafted feature based methods: LOMO [10] and BoW [42], seven unsupervised domain adaptation methods without considering latent label information: TJ-AIDL [8], PTGAN [13], SPGAN [12], MMFA [48], HHL [33], ARN [49] and ECN [34] and five pseudo label estimation methods: CAMEL [14], PUL [15], UDAR [16], MAR [17] and SSG [18]. The results are shown in Table 1, Table 2 and Table 3, respectively.…”

Section: Comparison With State-of-the-art Methods (mentioning)

confidence: 99%

Exploring Latent Information for Unsupervised Person Re-Identification by Discriminative Learning Networks

Zhang, Sun, et al. 2020

IEEE Access


For unsupervised domain adaptation in person re-identification (Re-ID) tasks, commonly used label estimation approaches rely only on global features or uniform part features. They often neglect the variations among samples of the same identity caused by occlusion, misalignment and uncontrollable camera settings. In this paper, we propose a discriminative learning network with target domain latent information (LatentDLN) to enhance the generalization ability of the Re-ID model. Specifically, to generate a discriminative and robust representation, two types of latent information in the samples from the target domain are explored by the multi-branch deep structure. First, keypoint-based valid region information is used to leverage the local and global cues in the human body, and a heuristic distance metric learning method based on the global and local features is proposed to effectively evaluate the similarity between different images. Second, camera-style-transferred images are used as augmentation data to bridge the gap between different cameras in the target domain. Moreover, a re-ranking mechanism based on reciprocal neighbors is designed to improve the quality of the label estimation. Experimental results on the Market-1501, DukeMTMC-ReID and MSMT17 datasets validate the effectiveness of the proposed LatentDLN for unsupervised Re-ID.

Index Terms: Person re-identification, unsupervised domain adaptation, unsupervised learning.


“…Market-1501. In Table 1, we compare our proposed model with the use of Bag-of-Words (BoW) [58] for matching (i.e., no transfer), four unsupervised re-ID approaches, including UMDL [42], PUL [15], CAMEL [54] and TAUDL [29], and seven cross-dataset re-ID methods, including PTGAN [51], SPGAN [12], TJ-AIDL [49], MMFA [35], HHL [61], CFSM [3] and ARN [32]. From this table, we see that our model achieved very promising [12] and HHL [61], we note that our model is able to generate cross-domain images conditioned on various poses rather than few camera styles.…”

Section: Quantitative Comparisons (mentioning)

confidence: 99%

Cross-Dataset Person Re-Identification via Unsupervised Pose Disentanglement and Adaptation

Lin et al. 2019

2019 IEEE/CVF International Conference on Computer Vision (ICCV)

Self Cite


Person re-identification (re-ID) aims at recognizing the same person from images taken across different cameras. Cross-dataset/domain re-ID, in turn, focuses on leveraging labeled image data from a source domain for a target domain whose training data carry no label information. In order to introduce discriminative ability and to generalize the re-ID model to the unsupervised target domain, our proposed Pose Disentanglement and Adaptation Network (PDA-Net) learns a deep image representation with pose and domain information properly disentangled. Our model allows pose-guided image recovery and translation by observing images from either domain, without predefined pose categories or identity supervision. Our qualitative and quantitative results on two benchmark datasets confirm the effectiveness of our approach and its superiority over state-of-the-art cross-dataset re-ID approaches.
