A Mask Based Deep Ranking Neural Network for Person Retrieval

Qi, Lei; Huo, Jing; Wang, Lei; Shi, Yinghuan

doi:10.1109/icme.2019.00092

Cited by 47 publications

(32 citation statements)

References 20 publications

“…In [42], body poses/parts are first detected and deep neural networks are designed for representation learning on both the local parts and global region. Some works rely on constrained attention selection mechanisms from human mask/part/pose to implicitly calibrate misaligned images [32,25,45,14,34].…”

Section: Related Work (mentioning)

confidence: 99%

Densely Semantically Aligned Person Re-Identification

Zhang, Lan, Zeng, et al. 2019

2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

We propose a densely semantically aligned person re-identification framework. It fundamentally addresses the body misalignment problem caused by pose/viewpoint variations, imperfect person detection, occlusion, etc. By leveraging the estimation of the dense semantics of a person image, we construct a set of densely semantically aligned part images (DSAP-images), where the same spatial positions have the same semantics across different images. We design a two-stream network that consists of a main full image stream (MF-Stream) and a densely semantically-aligned guiding stream (DSAG-Stream). The DSAG-Stream, with the DSAP-images as input, acts as a regulator to guide the MF-Stream to learn densely semantically aligned features from the original image. In the inference, the DSAG-Stream is discarded and only the MF-Stream is needed, which makes the inference system computationally efficient and robust. To the best of our knowledge, we are the first to make use of fine grained semantics to address the misalignment problems for re-ID. Our method achieves rank-1 accuracy of 78.9% (new protocol) on the CUHK03 dataset, 90.4% on the CUHK01 dataset, and 95.7% on the Market1501 dataset, outperforming state-of-the-art methods.

“…Song et al [32] use the source image and the corresponding binary segmentation mask as inputs to extract discriminative features that are invariant to background clutter. Qi et al [24] adopt both the source image and the masked image as the network inputs, where a multi-layer fusion scheme and a ranking loss are developed to fuse the different levels of features and optimize the network, respectively. The mask-guided methods can extract aligned local features and focus on foreground areas by exploiting the results from semantic segmentation.…”

Section: A General Person Re-id Methods (mentioning)

confidence: 99%

Semantic-Aware Occlusion-Robust Network for Occluded Person Re-Identification

Zhang, Yan, Xue, et al. 2021

IEEE Trans. Circuits Syst. Video Technol.

In recent years, deep learning-based person re-identification (Re-ID) methods have made significant progress. However, the performance of these methods substantially decreases when dealing with occlusion, which is ubiquitous in realistic scenarios. In this paper, we propose a novel semantic-aware occlusion-robust network (SORN) that effectively exploits the intrinsic relationship between the tasks of person Re-ID and semantic segmentation for occluded person Re-ID. Specifically, the SORN is composed of three branches, including a local branch, a global branch, and a semantic branch. In particular, the local branch extracts part-based local features, and the global branch leverages a novel spatial-patch contrastive loss (SPC) to extract occlusion-robust global features. Meanwhile, the semantic branch generates a foreground-background mask for a pedestrian image, which indicates the non-occluded areas of the human body. The three branches are jointly trained in a unified multi-task learning network. Finally, pedestrian matching is performed based on the local features extracted from the non-occluded areas and the global features extracted from the whole pedestrian image. Extensive experimental results on a large-scale occluded person Re-ID dataset (i.e., Occluded-DukeMTMC) and two partial person Re-ID datasets (i.e., Partial-REID and Partial-iLIDS) show the superiority of the proposed method compared with several state-of-the-art methods for occluded and partial person Re-ID. We also demonstrate the effectiveness of the proposed method on two general person Re-ID datasets (i.e., Market-1501 and DukeMTMC-reID).
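The citation statement above describes a mask-guided scheme in which a masked image serves as a second network input and a ranking loss drives optimization. The following is a minimal sketch of those two ideas, not the authors' implementation. It assumes a binary foreground mask and a triplet-style hinge ranking loss. The statement says only "a ranking loss", so the triplet form, the margin value, and the names `apply_mask`, `euclidean`, and `triplet_ranking_loss` are assumptions made for illustration.

```python
import math


def apply_mask(image, mask):
    """Zero out background pixels; keep image[r][c] where mask[r][c] == 1."""
    return [[px * m for px, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_ranking_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss (assumed form): zero once the negative is at least
    `margin` farther from the anchor than the positive is."""
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)


# Toy 2x2 grayscale "image" and a binary foreground mask (made-up values).
image = [[5, 7], [3, 9]]
mask = [[1, 0], [0, 1]]
masked = apply_mask(image, mask)

# Correctly ranked triplet: positive at distance 1, negative at distance 5.
loss_easy = triplet_ranking_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
# Wrongly ranked triplet: positive at distance 5, negative at distance 1.
loss_hard = triplet_ranking_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
```

In a two-branch setup, the raw image and the masked image would each pass through a feature extractor before the loss is computed on the resulting embeddings; the toy vectors here stand in for those embeddings.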

“…For example, Song et al [20] proposed a mask-guided background features and pulling body features closer to the full image. Qi et al [21] used the mask image together with the raw image as inputs and generated fusing features from different levels. Although these methods used mask information, they did not pay enough attention to masks.…”

Section: Related Work (mentioning)

confidence: 99%

“…Thus, combining segmentation and person ReID becomes a new way to obtain body regions explicitly. Qi et al [21] designed two branches, which use both raw and masked images as inputs, while Song et al [20] concatenated them to become a single image. However, due to huge difference between segmentation and ReID datasets in resolution, image size, and object classes, body mask generating faces many challenges.…”

Section: Related Work (mentioning)

confidence: 99%

A Dynamic Part-Attention Model for Person Re-Identification

Yao, Xiong, et al. 2019

Sensors

Person re-identification (ReID) is gaining more attention due to its important applications in pedestrian tracking and security prevention. Recently developed part-based methods have proven beneficial for stronger and explicit feature descriptions, but how to find real significant parts and reduce miscorrelation between images to improve accuracy of ReID still leaves much room to improve. In this paper, we propose a dynamic part-attention (DPA) method based on masks, which aims to improve the use of variable attention parts. Particularly, a two-branch network with a dynamic loss function is designed to extract features of the global image and the parts of the body separately. With the comprehensive but targeting learning strategy, the proposed method can capture discriminative features based, but not depending on, masks, which guides the whole network to focus on body features more consciously and achieves more robust performance. Our method achieves rank-1 accuracy of 91.68% on public dataset Market1501, and experimental results on three public datasets indicate that the proposed method is effective and achieves favorable accuracy when compared with the state-of-the-art methods.
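The abstracts on this page report results as rank-1 accuracy. As a rough illustration of what that metric measures, the sketch below counts a query as a hit when its nearest gallery feature carries the same identity. It is a simplified version under stated assumptions: the function name `rank1_accuracy`, the squared Euclidean distance, and the toy features are all hypothetical, and benchmark protocols typically add filtering steps (such as excluding gallery images from the same camera as the query) that are omitted here.

```python
def rank1_accuracy(query_feats, query_ids, gallery_feats, gallery_ids):
    """Fraction of queries whose closest gallery item has the same identity."""
    hits = 0
    for feat, pid in zip(query_feats, query_ids):
        # Squared Euclidean distance from this query to every gallery feature.
        dists = [sum((a - b) ** 2 for a, b in zip(feat, g))
                 for g in gallery_feats]
        # Index of the nearest gallery item (the rank-1 match).
        best = min(range(len(dists)), key=dists.__getitem__)
        hits += gallery_ids[best] == pid
    return hits / len(query_feats)


# Toy data: two gallery identities, three queries (made-up values).
gallery_feats = [[0.0, 0.0], [10.0, 10.0]]
gallery_ids = [1, 2]
query_feats = [[1.0, 0.0], [9.0, 9.0], [1.0, 1.0]]
query_ids = [1, 2, 2]

acc = rank1_accuracy(query_feats, query_ids, gallery_feats, gallery_ids)
```

Here the first two queries match their identities and the third is closest to identity 1 while labeled 2, so two of the three queries count as hits.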
