2018
DOI: 10.1007/978-3-030-01270-0_4

Improving Deep Visual Representation for Person Re-identification by Global and Local Image-language Association

Abstract: Person re-identification is an important task that requires learning discriminative visual features for distinguishing different person identities. Diverse auxiliary information has been utilized to improve the visual feature learning. In this paper, we propose to exploit natural language description as additional training supervisions for effective visual features. Compared with other auxiliary information, language can describe a specific person from more compact and semantic visual aspects, thus is compleme…

citations
Cited by 129 publications
(77 citation statements)
references
References 64 publications
“…And they only pay attention to one single direction when using the fine-grained matching or attention scheme for representation enhancement, i.e., only using text for weighting different visual components. Chen et al [5] improve visual representations by global and local cross-modal associations. The global image-language association is established according to the identity labels, and the local association focuses on improving the visual representations by phrase reconstruction.…”
Section: Description-based Person Re-identification (mentioning)
confidence: 99%
“…(3) Sentence-aware context object erasing, where we erase a dominant context region, based on the sentence-aware object-level attention weights over context objects. Note that (2) and (3) are two complementary approaches for sentence-aware visual erasing. With training samples generated online by the erasing operation, the model cannot access the most dominant information, and is forced to further discover complementary textual-visual correspondences previously ignored.…”
Section: Introduction (mentioning)
confidence: 99%
“…We compared our approach with following nine state-of-the-art (SOTA) approaches: CNN-RNN [44], NeuralTalk [45], GNA-RNN [6], Latent Co-attention [31], PWM + ATH [32], GLA [46], Dual Path [17], CMPM + CMPC [7], and PMA [8].…”
Section: Experiments and Discussion (mentioning)
confidence: 99%