2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.00465
Dual Cross-Attention Learning for Fine-Grained Visual Categorization and Object Re-Identification

Cited by 86 publications (40 citation statements)
References 36 publications
“…CNN-based methods in the 1st group leverage cross-image information, mining the relations among Instance-Level features. Transformer-based DCAL (Zhu et al. 2022) proposes cross-attention to learn local features better, and is classified as a Part-Level method. In contrast, our proposed Identity-Level method is introduced to extract more unified features among all intra-identity instances and push inter-identity instances far apart.…”
Section: Comparisons With State-of-the-art Methods
confidence: 99%
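The intra/inter-identity objective described in the excerpt above can be illustrated with a minimal triplet-style sketch: pull same-identity features together and push different-identity features apart by a margin. This is a generic hard-mining triplet loss, not the cited paper's actual formulation; the function name and `margin` default are illustrative assumptions.

```python
import numpy as np

def identity_level_pull_push(feats, ids, margin=0.3):
    """Hedged sketch of an intra/inter-identity objective.

    For each anchor, take the hardest positive (farthest same-identity
    feature) and hardest negative (closest different-identity feature),
    and apply a hinge so negatives stay at least `margin` farther away.
    """
    loss, count = 0.0, 0
    n = len(feats)
    for i in range(n):
        same = [j for j in range(n) if ids[j] == ids[i] and j != i]
        diff = [j for j in range(n) if ids[j] != ids[i]]
        if not same or not diff:
            continue  # anchor needs both a positive and a negative
        d_pos = max(np.linalg.norm(feats[i] - feats[j]) for j in same)
        d_neg = min(np.linalg.norm(feats[i] - feats[j]) for j in diff)
        loss += max(0.0, d_pos - d_neg + margin)
        count += 1
    return loss / max(count, 1)
```

With two well-separated identity clusters the hinge is inactive and the loss is zero; overlapping identities incur a penalty proportional to the margin violation.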
“…(1) Global-Level methods (Luo et al. 2019b; Chen et al. 2020, 2019; Zhang et al. 2020; Fang et al. 2019; Si et al. 2018; Zheng et al. 2019; Chen et al. 2018; Luo et al. 2019a), which directly learn the global representation by optimizing Instance-Level features of each single complete image, as shown in Figure 1.b. (2) Part-Level methods (Sun et al. 2018; Wang et al. 2018; Li et al. 2021b; Zhu et al. 2020; He and Liu 2020; Zhang et al. 2019; Jin et al. 2020; Zhu et al. 2021, 2022), as shown in Figure 1.c, which learn local aggregated features from different parts.…”
Section: Introduction
confidence: 99%
“…Zhang et al [30] take multi-scale features from CNN backbones as input, and use transformer to learn cross-scale relations. Zhu et al [31] propose global-local cross-attention to learn the interactions between global feature and local high-response patches, which can help reinforce the spatial-wise discriminative clues for recognition. However, all the above methods focus on pre-designing several fixed scales on the input feature, and it is impossible to cover all scales.…”
Section: Cross-scale Interaction
confidence: 99%
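The global-local cross-attention described in the excerpt above — queries drawn from high-response local patches, keys and values from the full token sequence — can be sketched as follows. This is a simplified single-head numpy illustration, not DCAL's actual implementation; the patch-selection heuristic (ranking patches by the global token's attention response) and `top_k` parameter are assumptions for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_local_cross_attention(tokens, top_k=4):
    """Hedged sketch of global-local cross-attention.

    tokens: (n, d) array; token 0 plays the role of a global [CLS] token.
    1. Score each local patch by the global token's attention response.
    2. Take the top_k high-response patches as queries.
    3. Cross-attend from those queries to all n tokens (keys/values).
    Returns (top_k, d) refined local features.
    """
    n, d = tokens.shape
    # Attention response of the global token over the local patches.
    response = softmax(tokens[0] @ tokens[1:].T / np.sqrt(d))
    top = 1 + np.argsort(response)[-top_k:]   # indices of high-response patches
    q = tokens[top]                           # local queries
    attn = softmax(q @ tokens.T / np.sqrt(d), axis=-1)
    return attn @ tokens                      # weighted aggregation over all tokens
```

Letting only the most discriminative patches query the whole sequence is what, per the excerpt, reinforces spatial-wise discriminative cues for recognition.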
“…Domain shift arises from differences in cameras, imaging altitude, illumination and water column properties. In addition, classifying benthic species and physical features is a fine-grained classification problem as there are many classes with high intra-class variations and also similar inter-class features, increasing the difficulty of the classification task [7], [8].…”
Section: Introduction
confidence: 99%