Object-aware Long-short-range Spatial Alignment for Few-Shot Fine-Grained Image Classification

Wu, Yike; Zhang, Bo; Yu, Gang; Zhang, Weixi; Wang, Bin; Chen, Tao; Fan, Jianchao

doi:10.1145/3474085.3475532

Cited by 20 publications

(15 citation statements)

References 45 publications

“…In this stage, the backbone is trained from scratch using SGD optimizer with a batch size of 128, a momentum of 0.9, a weight decay of 0.0005, and an initial learning rate of 0.1. To keep consistent with the setting in [58], the learning rate decays at 85 and 170 epochs. We remove the fully-connected layer for performing the next meta-training stage.…”

Section: Methods (mentioning)

confidence: 99%
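
The pre-training settings quoted above can be written down as a small configuration plus a step learning-rate schedule. This is only a minimal sketch in plain Python: the names `PRETRAIN_CONFIG` and `step_lr` are hypothetical, the decay factor of 0.1 is an assumption (the quoted statement gives the decay epochs but not the factor), and treating "decays at 85 and 170 epochs" as "from epoch 85 and from epoch 170 onward, counting from 0" is likewise an assumption.

```python
# Hyperparameters taken from the quoted citation statement.
PRETRAIN_CONFIG = {
    "optimizer": "SGD",
    "batch_size": 128,
    "momentum": 0.9,
    "weight_decay": 0.0005,
    "initial_lr": 0.1,
    "lr_decay_epochs": (85, 170),
    # ASSUMPTION: the decay factor is not stated in the source.
    "lr_decay_factor": 0.1,
}


def step_lr(epoch, config=PRETRAIN_CONFIG):
    """Return the learning rate used during `epoch` (0-indexed).

    The rate is multiplied by the decay factor once for every
    milestone epoch that has already been reached.
    """
    drops = sum(1 for milestone in config["lr_decay_epochs"] if epoch >= milestone)
    return config["initial_lr"] * config["lr_decay_factor"] ** drops


if __name__ == "__main__":
    # Print the rate just before and at each milestone.
    for epoch in (0, 84, 85, 169, 170):
        print(epoch, step_lr(epoch))
```

Under these assumptions the schedule has three plateaus: the initial rate up to epoch 84, one decay step from epoch 85, and a second decay step from epoch 170.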

“…Backbone CUB → NABirds 1-shot 5-shot LSC+SSM (Baseline) [58] ResNet-12 45.70±0.45 4: 5-way few-shot fine-grained classification results by adapting from the CUB-trained model to NABirds dataset using different backbones.…”

Section: Methods (mentioning)

confidence: 99%

“…Besides, the work [57] tackles the FSFGL problem from the perspective of reconstructing the query image to learn a classifier. More recently, the work [58] tries to increase the fine-grained classification accuracy via long-short-range spatial alignment between support and query features. Motivated by these works in the FSFGL community, we further extend the study of FSFGL to a Transformer-based structure, and investigate its effectiveness in strengthening the support-query relation matching process only given a few samples.…”

Section: Related Work (mentioning)

confidence: 99%

“…Therefore, researchers mainly focus on leveraging meta-learning technology [13,39,41,43,48,59] to deal with the FSL problem. However, the above-mentioned FSL methods focus to classify coarse-grained generic object categories, which are less suitable to address the few-shot fine-grained classification task [29,57,58,71], that requires to emphasize the local feature variations or subtle feature differences.…”

Section: Introduction (mentioning)

confidence: 99%

“…As illustrated in Fig. 1(a), many few-shot fine-grained works mine discriminative parts of the whole image based on the attention mechanism [71], feature map reconstruction [57], and feature-level spatial alignment [58]. However, these methods fail to As a matter of fact, to recognize a novel class's samples, humans tend to compare the local semantic parts' differences between the ever-seen and newly-seen objects, so that the subtle feature differences can be identified using only a few examples.…”

Section: Introduction (mentioning)

confidence: 99%

Learning Cross-Image Object Semantic Relation in Transformer for Few-Shot Fine-Grained Image Classification

Zhang, Yuan, Li et al. 2022

Preprint

Self Cite

Few-shot fine-grained learning aims to classify a query image into one of a set of support categories with fine-grained differences. Although learning different objects' local differences via Deep Neural Networks has achieved success, how to exploit the query-support cross-image object semantic relations in Transformer-based architecture remains under-explored in the few-shot fine-grained scenario. In this work, we propose a Transformer-based double-helix model, namely HelixFormer, to achieve the cross-image object semantic relation mining in a bidirectional and symmetrical manner. The HelixFormer consists of two steps: 1) Relation Mining Process (RMP) across different branches, and 2) Representation Enhancement Process (REP) within each individual branch. By the designed RMP, each branch can extract fine-grained object-level Cross-image Semantic Relation Maps (CSRMs) using information from the other branch, ensuring better cross-image interaction in semantically related local object regions. Further, with the aid of CSRMs, the developed REP can strengthen the extracted features for those discovered semantically-related local regions in each branch, boosting the model's ability to distinguish subtle feature differences of fine-grained objects. Extensive experiments conducted on five public fine-grained benchmarks demonstrate that HelixFormer can effectively enhance the cross-image object semantic relation matching for recognizing fine-grained objects, achieving much better performance over most state-of-the-art methods under 1-shot and 5-shot scenarios. Our code is available at: https://github.com/JiakangYuan/HelixFormer. CCS Concepts: Computing methodologies → Visual content-based indexing and retrieval; Matching.

A Task-Aware Dual Similarity Network for Fine-Grained Few-Shot Learning

Qi, Sun, Liu et al. 2022

Lecture Notes in Computer Science

The goal of fine-grained few-shot learning is to recognize sub-categories under the same super-category by learning few labeled samples. Most of the recent approaches adopt a single similarity measure, that is, global or local measure alone. However, for fine-grained images with high intra-class variance and low inter-class variance, exploring global invariant features and discriminative local details is quite essential. In this paper, we propose a Task-aware Dual Similarity Network (TDSNet), which applies global features and local patches to achieve better performance. Specifically, a local feature enhancement module is adopted to activate the features with strong discriminability. Besides, task-aware attention exploits the important patches among the entire task. Finally, both the class prototypes obtained by global features and discriminative local patches are employed for prediction. Extensive experiments on three fine-grained datasets demonstrate that the proposed TDSNet achieves competitive performance by comparing with other state-of-the-art algorithms.

Membership-Grade Based Prototype Rectification for Fine-Grained Few-Shot Classification

Ning, Qi, Jiang 2023

Artificial Neural Networks and Machine Learning – ICANN 2023
