Person re-identification (re-id) is an important public-security task that has attracted growing research interest because of its practical value. Most person re-id models address image-based or video-based re-id. Image-to-video person re-id, however, is also important for locating missing persons, tracking criminals, and retrieving pedestrian video. The key challenge in image-to-video person re-id is to build an accurate connection between the appearance features of images and the spatio-temporal features of videos despite the large cross-media gap between the two modalities. Although existing image-to-video person re-id models perform well, they remain far from practical application. These methods consider only the similarity measurement of cross-media features, which are extracted from the whole original image/video without weighting regions by importance. However, the most useful and discriminative information is contained in human body parts (torso, elbow, wrist, knee, and ankle), while pedestrian image/video backgrounds carry a great deal of useless information. In this paper, we present a Cross-media Body-part Attention Network (CBAN) for image-to-video person re-id, which extracts cross-media body-part attention features from images/videos (by CNN/LSTM) while ignoring useless background information through a part attention mechanism. In addition, our network alleviates the inherent cross-media gap with a novel media-pulling constraint term. Extensive experiments are conducted on three large-scale datasets (Market1501, Mars, and CUHK03) and two small datasets (PRID-2011 and iLIDS-VID), and the results show that our CBAN approach, with its body-part attention mechanism, solves the image-to-video person re-id problem effectively.

INDEX TERMS Image-to-video person re-identification, joint-attention, cross-media gap, media-pulling constraint.