Embedding Deep Metric for Person Re-identification: A Study Against Large Variations

Shi, Hailin; Yang, Yang; Zhu, Xiangyu; Liao, Shengcai; Lei, Zhen; Zheng, Wei‐Shi; Li, Stan Z.

doi:10.1007/978-3-319-46448-0_44

Cited by 248 publications

(196 citation statements)

References 41 publications


“…These global metrics [16][17][18][19] project features into a low-dimensional subspace in which they tend to maximize the discrimination among different persons; however, these metrics still face a great challenge from impostor samples [20,21] (an impostor is a sample that belongs to another person yet has higher similarity to the given query than the correct Gallery sample). Although some attempts have been made in the past to eliminate impostors [14][20][21][22], none of them has given due consideration to the different transform modals on which the re-identification images lie [23].…”

Section: Introduction (mentioning)

confidence: 99%
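
To make the impostor definition in this excerpt concrete, here is a minimal sketch. It assumes squared Euclidean distance on precomputed feature vectors (a learned metric would simply replace the distance line), and the function name `find_impostors` is hypothetical. An impostor is any gallery sample of a different identity that lies closer to the query than the nearest correct gallery match.

```python
import numpy as np

def find_impostors(query, gallery, gallery_ids, query_id):
    """Indices of gallery samples that are impostors for `query`.

    An impostor belongs to a different identity but lies closer to the query
    than the nearest correct (same-identity) gallery sample.
    """
    dists = np.sum((gallery - query) ** 2, axis=1)   # squared Euclidean (assumed metric)
    same = gallery_ids == query_id
    if not same.any():
        raise ValueError("query identity has no match in the gallery")
    true_match_dist = dists[same].min()              # the distance impostors must beat
    return np.flatnonzero(~same & (dists < true_match_dist))

gallery = np.array([[0.0, 0.0], [1.0, 1.0], [0.2, 0.1], [3.0, 3.0]])
gallery_ids = np.array([7, 7, 9, 4])
print(find_impostors(np.array([0.5, 0.5]), gallery, gallery_ids, query_id=7))  # [2]
```

Gallery sample 2 (identity 9) is at squared distance 0.25 from the query, while both true matches are at 0.5, so it is the only impostor.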

Impostor Resilient Multimodal Metric Learning for Person Reidentification
Syed, Han, et al. (2018). Advances in Multimedia


In person reidentification, distance metric learning faces a great challenge from impostor persons. Most distance metrics are learned by maximizing the similarity between positive pairs against impostors that lie on different transform modals. In addition, these impostors are obtained from the Gallery view for the query sample only, while the Gallery sample is totally ignored. In the real world, a given query and Gallery pair experiences different changes in pose, viewpoint, and lighting. Thus, impostors drawn only from the Gallery view cannot optimally maximize their similarity. To resolve these issues, we propose an impostor resilient multimodal metric (IRM3). IRM3 is learned for each modal transform in the image space and uses impostors from both the Probe and Gallery views to effectively restrict a large number of impostors. The learned IRM3 is evaluated on three benchmark datasets, VIPeR, CUHK01, and CUHK03, and shows significant improvement in performance compared to many previous approaches.
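
The abstract's distinction between probe-side and gallery-side impostors can be illustrated with a short sketch. This is not the IRM3 objective. It only shows, assuming squared Euclidean distance and using the hypothetical helper `two_view_impostors`, how a positive pair (probe i, gallery j) can collect impostors from the Gallery view (relative to the probe sample) and from the Probe view (relative to the Gallery sample).

```python
import numpy as np

def sq_dists(x, Y):
    """Squared Euclidean distance from vector x to every row of Y (assumed metric)."""
    return np.sum((Y - x) ** 2, axis=1)

def two_view_impostors(probe_feats, probe_ids, gallery_feats, gallery_ids, i, j):
    """Impostors for the positive pair (probe i, gallery j), taken from both views.

    gallery_side: other-identity gallery samples closer to probe i than gallery j is.
    probe_side:   other-identity probe samples closer to gallery j than probe i is.
    """
    if probe_ids[i] != gallery_ids[j]:
        raise ValueError("(i, j) must be a positive pair")
    p, g, pid = probe_feats[i], gallery_feats[j], probe_ids[i]
    pair_dist = np.sum((p - g) ** 2)
    gallery_side = np.flatnonzero((gallery_ids != pid) & (sq_dists(p, gallery_feats) < pair_dist))
    probe_side = np.flatnonzero((probe_ids != pid) & (sq_dists(g, probe_feats) < pair_dist))
    return gallery_side, probe_side

probe_feats = np.array([[0.0, 0.0], [1.5, 0.0], [5.0, 5.0]])
gallery_feats = np.array([[1.0, 0.0], [0.3, 0.0], [5.0, 4.0]])
ids = np.array([1, 2, 3])  # probe k and gallery k show the same person
gal_imp, probe_imp = two_view_impostors(probe_feats, ids, gallery_feats, ids, 0, 0)
print(gal_imp, probe_imp)  # [1] [1]
```

A scheme that mines impostors only for the query would see `gal_imp` alone; `probe_imp` is the additional set that the abstract argues should also be restricted.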



“…Compared to first-order alternatives, our energy function is more robust against misalignment between sketch and photo channels, and can better accommodate the more detailed but noisier fine-grained feature map representation. Mahalanobis distance [41,35] is another example of a higher-order energy function, in that it performs $O(N^2)$ comparisons for N channels. However, it is based on an elementwise difference followed by a bilinear product, so the effect is to learn which dimension pairs are important to match, rather than to compensate for misalignment and noise between the input vectors.…”

Section: Shortcuts and Layer Fusion In Deep Learning (mentioning)

confidence: 99%
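
The point about Mahalanobis distance can be checked numerically. The sketch below (hypothetical function names, NumPy assumed) computes the squared distance once with an explicit double loop, which makes the N × N pairwise terms visible, and once as the vectorized bilinear form. Channel i of x is only ever compared with channel i of y through the difference d[i], so a feature that appears in a different channel of y is not re-aligned.

```python
import numpy as np

def mahalanobis_sq_loop(x, y, M):
    """Squared Mahalanobis distance written out term by term: N*N terms for N channels."""
    d = x - y                                 # elementwise difference: x[i] is only paired with y[i]
    total = 0.0
    for i in range(len(d)):
        for j in range(len(d)):
            total += M[i, j] * d[i] * d[j]    # M[i, j] weights the dimension pair (i, j)
    return total

def mahalanobis_sq(x, y, M):
    """Same quantity as the vectorized bilinear form (x - y)^T M (x - y)."""
    d = x - y
    return float(d @ M @ d)

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
M = A @ A.T                                   # any positive semidefinite matrix is a valid metric
x, y = rng.normal(size=4), rng.normal(size=4)
print(np.isclose(mahalanobis_sq_loop(x, y, M), mahalanobis_sq(x, y, M)))      # True
print(mahalanobis_sq(np.array([1.0, 2.0]), np.array([0.0, 0.0]), np.eye(2)))  # 5.0
```

With M equal to the identity, the distance reduces to squared Euclidean distance; learning M only changes how much each dimension pair (i, j) of the difference vector counts.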

“…They have been used mainly for multi-view fusion, for example, fusing the text and image embeddings in visual question answering [21] and zero-shot recognition [4]. Outer-product-based distance is also used for formulating higher-order losses in Mahalanobis metric learning [41,35]. Given two vectors x and y, a Mahalanobis distance is defined as:…”

Section: Ranking Score (mentioning)

confidence: 99%
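
For reference, the textbook squared Mahalanobis distance between $\mathbf{x}, \mathbf{y} \in \mathbb{R}^N$ with a positive semidefinite matrix $M$ can be rewritten as a Frobenius inner product with the outer product of the difference vector. This is the sense in which it is an outer-product-based, higher-order function. The notation below is an assumption and may differ from the cited paper's own equation.

```latex
d_M^2(\mathbf{x},\mathbf{y})
  = (\mathbf{x}-\mathbf{y})^{\top} M\,(\mathbf{x}-\mathbf{y})
  = \sum_{i=1}^{N}\sum_{j=1}^{N} M_{ij}\,(x_i-y_i)(x_j-y_j)
  = \big\langle M,\ (\mathbf{x}-\mathbf{y})(\mathbf{x}-\mathbf{y})^{\top} \big\rangle_F,
  \qquad M \succeq 0,
\quad\text{where}\quad
\langle A, B\rangle_F = \sum_{i,j} A_{ij} B_{ij} = \operatorname{tr}(A^{\top} B).
```

Learning $M$ therefore amounts to weighting the $N^2$ entries of the outer product of the difference vector.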

Deep Spatial-Semantic Attention for Fine-Grained Sketch-Based Image Retrieval
Song et al. (2017). 2017 IEEE International Conference on Computer Vision (ICCV)


Human sketches are unique in being able to capture both the spatial topology of a visual object and its subtle appearance details. Fine-grained sketch-based image retrieval (FG-SBIR) leverages these fine-grained characteristics of sketches to conduct instance-level retrieval of photos. Nevertheless, human sketches are often highly abstract and iconic, resulting in severe misalignments with candidate photos, which in turn make subtle visual detail matching difficult. Existing FG-SBIR approaches focus only on coarse holistic matching via deep cross-domain representation learning, yet fail to explicitly account for fine-grained details and their spatial context. In this paper, a novel deep FG-SBIR model is proposed which differs significantly from existing models in that: (1) it is spatially aware, achieved by introducing an attention module that is sensitive to the spatial position of visual details; (2) it combines coarse and fine semantic information via a shortcut connection fusion block; and (3) it models feature correlation and is robust to misalignments between the extracted features across the two domains by introducing a novel higher-order learnable energy function (HOLEF) based loss. Extensive experiments show that the proposed deep spatial-semantic attention model significantly outperforms the state of the art.
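
A higher-order energy that compares every channel of one feature vector with every channel of the other can be sketched as $E_W(\mathbf{x},\mathbf{y}) = \sum_{i,j} W_{ij}(x_i - y_j)^2$. This specific form, the function name `higher_order_energy`, and the hand-set weight matrices below are illustrative assumptions, not the HOLEF definition from the paper; in a model, $W$ would be learned. With $W = I$ it reduces to squared Euclidean distance, while off-diagonal weights let channel i of x be matched against channel j of y, which is what makes misaligned features comparable.

```python
import numpy as np

def higher_order_energy(x, y, W):
    """Hypothetical higher-order energy: sum_ij W[i, j] * (x[i] - y[j])**2.

    Unlike a Mahalanobis distance, which only ever uses x[i] - y[i],
    every channel of x is compared with every channel of y.
    """
    outer_diff = x[:, None] - y[None, :]      # (N, N) matrix of x[i] - y[j]
    return float(np.sum(W * outer_diff ** 2))

x = np.array([1.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0])                 # same pattern as x, shifted by one channel

identity = np.eye(3)                          # matched channels only: squared Euclidean
shift = np.roll(np.eye(3), 1, axis=1)         # pairs channel i of x with channel i+1 (mod 3) of y
print(higher_order_energy(x, y, identity))    # 2.0
print(higher_order_energy(x, y, shift))       # 0.0
```

The shifted copy is far apart under the first-order (identity) weighting but at zero energy under a weighting that links neighbouring channels, which is the kind of misalignment tolerance the excerpts contrast with Mahalanobis distance.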


“…Ahmed et al. [15] presented a deep convolutional architecture that captured local relationships between person images based on mid-level features. Generally, deep learning is utilized in person re-identification works to learn feature representations from deep convolutional features [14][15][16][17] or from fully connected features [18][19][20].…”

Section: Related Work (mentioning)

confidence: 99%

“…(i) Feature construction and learning aim at designing or studying discriminative appearance descriptions [8][9][10][11][12][13][14][15][16][17][18][19][20] that are robust for distinguishing different pedestrians across arbitrary cameras. However, handcrafted feature construction is extremely challenging due to miscellaneous and complicated variations.…”

Section: Introduction (mentioning)

confidence: 99%
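
As an illustration of the kind of handcrafted appearance descriptor this excerpt refers to, here is a minimal sketch of a stripe-wise color histogram, a common baseline in person re-identification. It does not reproduce any of the cited descriptors; the stripe count, bin count, and the name `stripe_histogram_descriptor` are hypothetical. Horizontal stripes are used because pedestrian crops from different cameras tend to be roughly aligned vertically (head, torso, legs), while horizontal position varies more with pose and viewpoint.

```python
import numpy as np

def stripe_histogram_descriptor(image, n_stripes=6, bins=8):
    """Hypothetical handcrafted appearance descriptor for an (H, W, 3) uint8 pedestrian crop.

    The image is cut into horizontal stripes, a per-channel color histogram is
    computed for each stripe, and the L1-normalized histograms are concatenated.
    """
    h = image.shape[0]
    edges = np.linspace(0, h, n_stripes + 1).astype(int)   # stripe boundaries (rows)
    feats = []
    for top, bottom in zip(edges[:-1], edges[1:]):
        stripe = image[top:bottom]
        for c in range(3):
            hist, _ = np.histogram(stripe[..., c], bins=bins, range=(0, 256))
            feats.append(hist / max(hist.sum(), 1))          # guard against empty stripes
    return np.concatenate(feats)                             # length n_stripes * 3 * bins

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(128, 48, 3)).astype(np.uint8)  # stand-in for a pedestrian crop
desc = stripe_histogram_descriptor(img)
print(desc.shape)  # (144,)
```

Descriptors like this are exactly what the excerpt calls challenging to hand-design: they are cheap and partly pose-tolerant, but raw color histograms shift with lighting and camera response.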

Co-Metric Learning for Person Re-Identification
Leng (2018). Advances in Multimedia


Person re-identification, which aims to match images of the same pedestrian across disjoint camera views, is a key technique in intelligent video surveillance. Although existing methods have advanced both theory and experimental results, most effective ones rely on fully supervised training, which suffers heavily from the small sample size (SSS) problem, especially in label-insufficient practical applications. To address the SSS problem when learning with few labels, a novel semisupervised co-metric learning framework is proposed to learn a discriminative Mahalanobis-like distance matrix for label-insufficient person re-identification. Unlike a typical co-training task, whose data are originally multiview, single-view person images are first decomposed into pseudo two views, and metric learning models are then produced and jointly updated, iteratively, based on both pseudo-labels and references. Experiments on three representative person re-identification datasets show that the proposed method outperforms the state of the art and has low label sensitivity.
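
The co-training idea in this abstract can be sketched as follows. Every concrete choice here is a hypothetical simplification rather than the paper's algorithm: the pseudo two views are the first and second halves of each feature vector, each view learns a diagonal Mahalanobis-like metric whose weights are the inverse variance of same-identity differences, and each view pseudo-labels its most confident unlabeled samples (nearest labeled neighbor, smallest distance) and hands them to the other view's training set.

```python
import numpy as np

def fit_diag_metric(X, ids, eps=1e-6):
    """Diagonal Mahalanobis-like metric: dimensions that vary a lot between
    images of the same identity get small weights (hypothetical choice)."""
    diffs = [X[a] - X[b]
             for a in range(len(X)) for b in range(a + 1, len(X))
             if ids[a] == ids[b]]
    if not diffs:
        return np.ones(X.shape[1])                   # no positive pairs yet: plain Euclidean
    return 1.0 / (np.mean(np.square(diffs), axis=0) + eps)

def nn_pseudo_labels(X_lab, ids_lab, X_unl, w):
    """Nearest labeled neighbor under sum_k w_k (x_k - y_k)^2; returns labels and distances."""
    d = np.sum(w * (X_unl[:, None, :] - X_lab[None, :, :]) ** 2, axis=2)
    nearest = d.argmin(axis=1)
    return ids_lab[nearest], d[np.arange(len(X_unl)), nearest]

def co_metric_learning(X_lab, ids_lab, X_unl, n_rounds=3, n_add=1):
    half = X_lab.shape[1] // 2
    views = [slice(0, half), slice(half, None)]      # hypothetical pseudo two views
    train_X = [X_lab.copy(), X_lab.copy()]           # one growing training set per view
    train_y = [ids_lab.copy(), ids_lab.copy()]
    pool = list(range(len(X_unl)))                   # indices of still-unlabeled samples
    pseudo = {}
    for _ in range(n_rounds):
        for v in (0, 1):
            if not pool:
                break
            cols = views[v]
            w = fit_diag_metric(train_X[v][:, cols], train_y[v])
            labels, dist = nn_pseudo_labels(train_X[v][:, cols], train_y[v],
                                            X_unl[pool][:, cols], w)
            chosen = np.argsort(dist)[:n_add]        # most confident = smallest distance
            idx = [pool[k] for k in chosen]
            other = 1 - v                            # view v teaches the other view
            train_X[other] = np.vstack([train_X[other], X_unl[idx]])
            train_y[other] = np.concatenate([train_y[other], labels[chosen]])
            for k in chosen:
                pseudo[pool[k]] = int(labels[k])
            pool = [p for p in pool if p not in idx]
    metrics = [fit_diag_metric(train_X[v][:, views[v]], train_y[v]) for v in (0, 1)]
    return metrics, pseudo

rng = np.random.default_rng(0)
centers = np.array([[0.0] * 4, [5.0] * 4])          # two synthetic identities
X_lab = np.vstack([c + 0.1 * rng.normal(size=(2, 4)) for c in centers])
ids_lab = np.array([0, 0, 1, 1])
X_unl = centers[[0, 0, 0, 1, 1, 1]] + 0.1 * rng.normal(size=(6, 4))
metrics, pseudo = co_metric_learning(X_lab, ids_lab, X_unl)
print(pseudo)
```

Handing each view's confident labels to the other view, rather than back to itself, is the co-training design choice: errors one view is blind to can be caught by a view that sees different feature dimensions.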

