2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr52688.2022.00465
Dual Cross-Attention Learning for Fine-Grained Visual Categorization and Object Re-Identification

Cited by 86 publications (40 citation statements)
References 36 publications
“…CNN-based methods in the 1st group leverage cross-image information, mining the relations among Instance-Level features. Transformer-based DCAL (Zhu et al. 2022) proposes cross-attention to learn local features better, and is classified as a Part-Level method. In contrast, our proposed Identity-Level method is introduced to extract more unified features among all intra-identity instances and push inter-identity instances far apart.…”
Section: Comparisons With State-of-the-art Methods
confidence: 99%
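The intra/inter-identity objective described in the excerpt above can be illustrated with a minimal triplet-style sketch: pull same-identity features together and push different-identity features apart by a margin. This is a generic hard-mining triplet loss, not the cited paper's actual formulation; the function name and `margin` default are illustrative assumptions.

```python
import numpy as np

def identity_level_pull_push(feats, ids, margin=0.3):
    """Hedged sketch of an intra/inter-identity objective.

    For each anchor, take the hardest positive (farthest same-identity
    feature) and hardest negative (closest different-identity feature),
    and apply a hinge so negatives stay at least `margin` farther away.
    """
    loss, count = 0.0, 0
    n = len(feats)
    for i in range(n):
        same = [j for j in range(n) if ids[j] == ids[i] and j != i]
        diff = [j for j in range(n) if ids[j] != ids[i]]
        if not same or not diff:
            continue  # anchor needs both a positive and a negative
        d_pos = max(np.linalg.norm(feats[i] - feats[j]) for j in same)
        d_neg = min(np.linalg.norm(feats[i] - feats[j]) for j in diff)
        loss += max(0.0, d_pos - d_neg + margin)
        count += 1
    return loss / max(count, 1)
```

With two well-separated identity clusters the hinge is inactive and the loss is zero; overlapping identities incur a penalty proportional to the margin violation.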
“…(1) Global-Level methods (Luo et al. 2019b; Chen et al. 2020, 2019; Zhang et al. 2020; Fang et al. 2019; Si et al. 2018; Zheng et al. 2019; Chen et al. 2018; Luo et al. 2019a), which directly learn the global representation by optimizing Instance-Level features of each single complete image, as shown in Figure 1.b. (2) Part-Level methods (Sun et al. 2018; Wang et al. 2018; Li et al. 2021b; Zhu et al. 2020; He and Liu 2020; Zhang et al. 2019; Jin et al. 2020; Zhu et al. 2021, 2022), as shown in Figure 1.c, which learn local aggregated features from different parts.…”
Section: Introduction
confidence: 99%
“…Zhang et al [30] take multi-scale features from CNN backbones as input, and use transformer to learn cross-scale relations. Zhu et al [31] propose global-local cross-attention to learn the interactions between global feature and local high-response patches, which can help reinforce the spatial-wise discriminative clues for recognition. However, all the above methods focus on pre-designing several fixed scales on the input feature, and it is impossible to cover all scales.…”
Section: Cross-scale Interaction
confidence: 99%
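The global-local cross-attention described in the excerpt above — queries drawn from high-response local patches, keys and values from the full token sequence — can be sketched as follows. This is a simplified single-head numpy illustration, not DCAL's actual implementation; the patch-selection heuristic (ranking patches by the global token's attention response) and `top_k` parameter are assumptions for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_local_cross_attention(tokens, top_k=4):
    """Hedged sketch of global-local cross-attention.

    tokens: (n, d) array; token 0 plays the role of a global [CLS] token.
    1. Score each local patch by the global token's attention response.
    2. Take the top_k high-response patches as queries.
    3. Cross-attend from those queries to all n tokens (keys/values).
    Returns (top_k, d) refined local features.
    """
    n, d = tokens.shape
    # Attention response of the global token over the local patches.
    response = softmax(tokens[0] @ tokens[1:].T / np.sqrt(d))
    top = 1 + np.argsort(response)[-top_k:]   # indices of high-response patches
    q = tokens[top]                           # local queries
    attn = softmax(q @ tokens.T / np.sqrt(d), axis=-1)
    return attn @ tokens                      # weighted aggregation over all tokens
```

Letting only the most discriminative patches query the whole sequence is what, per the excerpt, reinforces spatial-wise discriminative cues for recognition.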
“…Domain shift arises from differences in cameras, imaging altitude, illumination and water column properties. In addition, classifying benthic species and physical features is a fine-grained classification problem as there are many classes with high intra-class variations and also similar inter-class features, increasing the difficulty of the classification task [7], [8].…”
Section: Introduction
confidence: 99%