2021
DOI: 10.48550/arxiv.2109.11159
Preprint
OH-Former: Omni-Relational High-Order Transformer for Person Re-Identification

Abstract: Transformers have shown preferable performance on many vision tasks. However, for the task of person re-identification (ReID), vanilla transformers leave the rich contexts on high-order feature relations under-exploited and deteriorate local feature details, which are insufficient due to the dramatic variations of pedestrians. In this work, we propose an Omni-Relational High-Order Transformer (OH-Former) to model omni-relational features for ReID. First, to strengthen the capacity of visual representation, inst…

Cited by 4 publications (4 citation statements)
References 47 publications (60 reference statements)
“…Many works (Zhu et al. 2021; Chen et al. 2021; Lai, Chai, and Wei 2021) are devoted to extracting part-region representations with transformers. For example, AAformer (Zhu et al. 2021) uses additional learnable 'part token' vectors to learn part representations by clustering the patch embeddings into several groups, and integrates the parts into self-attention for alignment.…”
Section: Representation Learning in Re-ID
Confidence: 99%
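The part-token mechanism described in the citing passage can be illustrated with a minimal sketch. This is a hedged toy version, not AAformer's actual implementation: the array shapes, the hard nearest-token assignment, and the mean-pooling update are all assumptions made for illustration.

```python
import numpy as np

# Toy sketch of the part-token idea: each patch embedding is assigned to
# its nearest "part token", and each part representation is computed as
# the mean of its assigned patches. (Assumed shapes, not from the paper.)
rng = np.random.default_rng(0)
patches = rng.standard_normal((16, 8))      # 16 patch embeddings, dim 8
part_tokens = rng.standard_normal((3, 8))   # 3 learnable part tokens

# Assign each patch to the closest part token (Euclidean distance).
dists = np.linalg.norm(patches[:, None, :] - part_tokens[None, :, :], axis=-1)
assign = dists.argmin(axis=1)               # shape (16,), values in {0, 1, 2}

# Update each part as the mean of its assigned patch embeddings;
# fall back to the original token if a group is empty.
parts = np.stack([
    patches[assign == k].mean(axis=0) if (assign == k).any() else part_tokens[k]
    for k in range(3)
])
print(parts.shape)
```

In the real model the part tokens also participate in self-attention alongside the patch tokens, so the grouping and the part features are learned jointly rather than computed in this single hard-assignment pass.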
“…Following previous works [9,37] in person ReID, we use the common cross-entropy (CE) loss and triplet loss to train the model. The CE loss has no label smoothing operation and can be defined as follows:…”
Section: The Objective of Training
Confidence: 99%
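The citation snippet cuts off before the formulas. As a hedged reconstruction, the standard forms of these two losses (notation assumed here, not taken from the cited paper) are:

$$
\mathcal{L}_{ce} = -\sum_{i=1}^{N} y_i \log \hat{p}_i
$$

where $y_i$ is the one-hot ground-truth identity label and $\hat{p}_i$ the predicted probability for class $i$, and

$$
\mathcal{L}_{tri} = \max\bigl(d(a, p) - d(a, n) + m,\; 0\bigr)
$$

where $a$, $p$, $n$ are anchor, positive, and negative features, $d(\cdot,\cdot)$ a distance (typically Euclidean), and $m$ a margin. "No label smoothing" means the targets $y_i$ stay strictly one-hot rather than being softened toward a uniform distribution.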
“…Since transformers lack inductive bias, they instead learn inductive biases implicitly from large amounts of data and lag behind CNNs in the low-data regime [15]. Recently, some works have tried to introduce CNNs into vision transformers explicitly [9,11,18,30,51-53]. However, their forcefully modified structures destroy the intrinsic properties of transformers.…”
Section: Related Work
Confidence: 99%