Proceedings of the 29th ACM International Conference on Multimedia 2021
DOI: 10.1145/3474085.3475643

MSO: Multi-Feature Space Joint Optimization Network for RGB-Infrared Person Re-Identification

Abstract: The RGB-infrared cross-modality person re-identification (ReID) task aims to recognize images of the same identity across the visible and infrared modalities. Existing methods mainly use a two-stream architecture to eliminate the discrepancy between the two modalities in the final common feature space, which ignores the single space of each modality in the shallow layers. To address this, we present a novel multi-feature space joint optimization (MSO) network, which can learn modality…

Citations: cited by 34 publications (7 citation statements)
References: 48 publications (93 reference statements)
“…These SOTA approaches contain Hi-CMD [48] adopting ID-discriminative factors to robust cross-modality match, JSIA [49], CoAL [54], and DF²AM [56] using two-level alignment approaches for cross-modality person matching task, HC [47] using multiple loss functions (enumerate angular triplet (EAT) loss, hetero-center loss, and cross-modality knowledge distillation (CMKD) loss) to enhance the feature distinctiveness, CMSP [51] and ATTRI [52] adopting extra constraints to increase intra-class cross-modality similarity while mitigating modality-specific information, HAT [53] generating additional modality between both visible and infrared modalities for alleviating the modality differences, NFS [55] utilizing a BN-oriented/feature search space to achieve standard optimization/automatic feature selection for the cross-modality work. MSO [67] proposed a perceptual edge features (PEF) loss to optimize their network. G2DA [68] proposed a Geometry-Guided Dual-Alignment method to reduce cross-modality differences between part semantics and structural relations.…”
Section: Comparison To State-of-the-art (SOTA) Methods; citation type: mentioning
confidence: 99%
“…When color information is not reliable for pedestrian matching, shape information serves as a complement option [47][48][49]. Moreover, with the shape representation as input of the network, the modality discrepancy is also reduced [47]. However, compared with RGB or IR images as input of the network, shape representation greatly loses identity-related information which is also inferior for person re-identification task.…”
Section: Shape-guided Consistency Learning; citation type: mentioning
confidence: 99%
“…For visible-infrared person re-identification problem, due to the difference of infrared images and visible images, color, as a common cue for person re-identification task, is not reliable. When color information is not reliable for pedestrian matching, shape information serves as a complement option [47][48][49]. Moreover, with the shape representation as input of the network, the modality discrepancy is also reduced [47].…”
Section: Shape-guided Consistency Learning; citation type: mentioning
confidence: 99%
“…The representation learning methods [8,11] focus on how to improve the feature extraction process in both modals. Metric learning methods [16,18,19] focus on modifying the loss function during model training to optimize the feature metric space. Both representation learning methods and metric learning methods attempt to align modals during the feature-learning phase, but often have serious artificial design traces and bring complex network structures and calculations, which reduce the transferability of the model.…”
Section: Visible-infrared Person Re-identification; citation type: mentioning
confidence: 99%