Matching people across non-overlapping camera views, known as person re-identification, is challenging due to the lack of spatial and temporal constraints and large visual appearance changes caused by variations in view angle, lighting, background clutter and occlusion. To address these challenges, most previous approaches aim to extract visual features that are both distinctive and stable under appearance changes. However, most visual features and their combinations under realistic conditions are neither stable nor distinctive thus should not be used indiscriminately. In this paper, we propose to formulate person re-identification as a distance learning problem, which aims to learn the optimal distance that can maximises matching accuracy regardless the choice of representation. To that end, we introduce a novel Probabilistic Relative Distance Comparison (PRDC) model, which differs from most existing distance learning methods in that, rather than minimising intra-class variation whilst maximising intra-class variation, it aims to maximise the probability of a pair of true match having a smaller distance than that of a wrong match pair. This makes our model more tolerant to appearance changes and less susceptible to model over-fitting. Extensive experiments are carried out to demonstrate that 1) by formulating the person re-identification problem as a distance learning problem, notable improvement on matching accuracy can be obtained against conventional person re-identification techniques, which is particularly significant when the training sample size is small; and 2) our PRDC outperforms not only existing distance learning methods but also alternative learning methods based on boosting and learning to rank.
Solving the person re-identification problem involves matching observations of individuals across disjoint camera views. The problem becomes particularly hard in a busy public scene as the number of possible matches is very high. This is further compounded by significant appearance changes due to varying lighting conditions, viewing angles and body poses across camera views. To address this problem, existing approaches focus on extracting or learning discriminative features followed by template matching using a distance measure. The novelty of this work is that we reformulate the person reidentification problem as a ranking problem and learn a subspace where the potential true match is given highest ranking rather than any direct distance measure. By doing so, we convert the person re-identification problem from an absolute scoring problem to a relative ranking problem. We further develop an novel Ensemble RankSVM to overcome the scalability limitation problem suffered by existing SVM-based ranking methods. This new model reduces significantly memory usage therefore is much more scalable, whilst maintaining high-level performance. We present extensive experiments to demonstrate the performance gain of the proposed ranking approach over existing template matching and classification models.
Person re-identification (Re-ID) is an important problem in video surveillance, aiming to match pedestrian images across camera views. Currently, most works focus on RGB-based Re-ID. However, in some applications, RGB images are not suitable, e.g. in a dark environment or at night. Infrared (IR) imaging becomes necessary in many visual systems. To that end, matching RGB images with infrared images is required, which are heterogeneous with very different visual characteristics. For person Re-ID, this is a very challenging cross-modality problem that has not been studied so far. In this work, we address the RGB-IR cross-modality Re-ID problem and contribute a new multiple modality Re-ID dataset named SYSU-MM01, including RGB and IR images of 491 identities from 6 cameras, giving in total 287,628 RGB images and 15,792 IR images. To explore the RGB-IR Re-ID problem, we evaluate existing popular cross-domain models, including three commonly used neural network structures (one-stream, twostream and asymmetric FC layer) and analyse the relation between them. We further propose deep zero-padding for training one-stream network towards automatically evolving domain-specific nodes in the network for cross-modality matching. Our experiments show that RGB-IR crossmodality matching is very challenging but still feasible using the proposed model with deep zero-padding, giving the best performance. Our dataset is available at http:// isee.sysu.edu.cn/project/RGBIRReID.htm.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.