Given a query image from one camera, person re-identification (Re-ID) retrieves images of the same identity from a gallery captured by other cameras. Person Re-ID has therefore been widely used in video surveillance, but it still faces a series of challenges, such as illumination changes, pose variations, and occlusions. Although attention-based person Re-ID methods offer an effective and feasible solution to these challenges, an attention mechanism may cause a network to focus too heavily on the most salient discriminative features while ignoring other potentially discriminative ones. To solve this problem, we propose a two-level salient feature complementary network (TSFC-Net) that extracts both the most salient and the secondary salient discriminative features of pedestrian images for person Re-ID. Specifically, TSFC-Net first extracts the most salient discriminative features by embedding spatial and channel attention modules in the backbone network, and then extracts the secondary salient discriminative features with a secondary salient feature mining module (SSFM). Because the final pedestrian representation fuses the most salient and the secondary salient discriminative features, TSFC-Net significantly improves the richness and discriminative capability of pedestrian representations. In addition, we conduct extensive experiments on the Market-1501, DukeMTMC-reID, and CUHK03 data sets, and the results indicate that TSFC-Net outperforms most state-of-the-art person Re-ID methods.
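The two-level idea above can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the SE-style channel gate, the simplified spatial gate, and the suppress-then-repool mining step (standing in for SSFM) are all assumptions made for illustration, with toy random weights in place of learned parameters.

```python
import numpy as np

def channel_attention(x, reduction=4):
    # x: feature map of shape (C, H, W); SE-style squeeze-and-excitation gate
    C = x.shape[0]
    squeeze = x.mean(axis=(1, 2))                       # global average pool -> (C,)
    rng = np.random.default_rng(0)                      # toy weights, not learned
    w1 = rng.standard_normal((C // reduction, C)) * 0.1
    w2 = rng.standard_normal((C, C // reduction)) * 0.1
    excite = 1.0 / (1.0 + np.exp(-(w2 @ np.maximum(w1 @ squeeze, 0.0))))  # sigmoid
    return x * excite[:, None, None]                    # rescale each channel

def spatial_attention(x):
    # simplified spatial gate: sigmoid over the channel-wise mean activation map
    gate = 1.0 / (1.0 + np.exp(-x.mean(axis=0)))        # (H, W)
    return x * gate[None, :, :]

def secondary_salient_descriptor(x, ratio=0.25):
    # hypothetical stand-in for SSFM: zero out the top-`ratio` most activated
    # spatial cells, then re-pool, so the descriptor must draw on secondary
    # (less salient but still discriminative) regions
    saliency = x.mean(axis=0)                           # (H, W)
    k = max(1, int(ratio * saliency.size))
    thresh = np.sort(saliency.ravel())[-k]
    mask = (saliency < thresh).astype(x.dtype)          # suppress most salient cells
    return (x * mask[None]).mean(axis=(1, 2))           # (C,) secondary descriptor
```

The final representation would then concatenate the attended (most salient) pooled feature with the secondary descriptor, which is what gives the fused feature its complementary coverage.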
Powerful local features can be extracted from multiple body regions of a pedestrian. Early person re-identification research focused on extracting local features by locating regions with specific pre-defined semantics, which is ineffective and increases network complexity. In this paper, we propose a multiple granularity person re-identification network based on representation learning and metric learning for learning discriminative representations of pedestrian images. The network consists of a multiple granularity feature extraction part and a combined loss part. In particular, the feature extraction part extracts global features and local features of different granularities from the feature maps of Conv4 and Conv5 of the ResNet50 backbone, respectively; the extracted features are thus more comprehensive and discriminative. The combined loss part applies joint representation learning and metric learning for supervised training, enabling the model to learn better parameters. The experimental results show that the Rank-1 accuracy of the network reaches 95.2% and 88.2% on the Market1501 and DukeMTMC-reID datasets, respectively, which demonstrates the effectiveness of the model.
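The combined loss part can be sketched as a weighted sum of an identity-classification (representation learning) term and a triplet (metric learning) term. This is a minimal NumPy sketch under common Re-ID conventions; the margin value, the weighting, and the function names are assumptions, not the paper's exact formulation.

```python
import numpy as np

def cross_entropy(logits, label):
    # representation-learning branch: softmax cross-entropy on identity logits
    z = logits - logits.max()                     # stabilized log-softmax
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[label]

def triplet_loss(anchor, positive, negative, margin=0.3):
    # metric-learning branch: hinge on the positive/negative distance gap,
    # pulling same-ID embeddings together and pushing different IDs apart
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

def combined_loss(logits, label, anchor, positive, negative, w=1.0):
    # joint supervision: classification term + weighted metric term
    return cross_entropy(logits, label) + w * triplet_loss(anchor, positive, negative)
```

In practice each granularity branch (global and local stripes) would contribute its own pair of terms, and the per-branch losses are summed during training.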