Context-aware Attentional Pooling (CAP) for Fine-grained Visual Classification
Preprint (2021) | DOI: 10.48550/arxiv.2101.06635

Cited by 5 publications (12 citation statements) | References 0 publications
“…Compared to Imagenet-1k, when we transfer networks trained on Imagenet-21k, MetaFormer-1 achieved 2.0% and 2.2% improvements on CUB-200-2011 and NABirds. The accuracies on CUB-200-2011 and NABirds are 92.3% and 92.7%, respectively, which outperform the SotA approach CAP [2] (91.8% and 91.0%) by a clear margin when using iNaturalist 2021 for pre-training. iNaturalist 2021, with less data, can perform better than Imagenet-21k since its domain similarity to the downstream datasets is higher.…”
Section: The Importance of Pre-trained Models
confidence: 94%
“…Note that these methods are verified against a weak baseline. In addition, CAP [2] achieved SotA performance on CUB-200-2011. Our method achieves comparable performance (Table 4).…”
Section: The Power of Meta Information
confidence: 95%
“…To tackle the above limitation, [9] exploits co-segmentation and alignment methods to generate discriminative regions, aiming to learn which parts are vital for recognition. [10, 11, 12, 13] research attention-based methods that exploit only image-level annotations to guide the deep network to focus on the most essential regions for each category. [11] designs a training paradigm in which a navigator learns to detect the most informative regions via a teacher.…”
Section: Related Work
confidence: 99%
“…[12] explores the rich relationships between self-channels and interaction-channels, which helps the network capture subtle inter-class differences. [13] adopts context-aware attentional pooling to capture subtle changes via sub-pixel gradients.…”
Section: Related Work
confidence: 99%
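For intuition, the statement above describes attention-weighted pooling over backbone features. The PyTorch sketch below shows only plain attentional pooling; the class name `AttentionalPooling` and the 1x1-convolution attention head are illustrative assumptions, and the actual CAP module additionally models context among integral regions, which is not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionalPooling(nn.Module):
    """Minimal attention-weighted pooling over spatial features.

    Illustrative sketch only: CAP [13] further models pairwise context
    between integral regions; this class shows just the pooling idea.
    """
    def __init__(self, in_channels: int):
        super().__init__()
        # 1x1 conv producing one attention logit per spatial location (assumed design).
        self.attn = nn.Conv2d(in_channels, 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map from a CNN backbone
        b, c, h, w = x.shape
        logits = self.attn(x).view(b, 1, h * w)   # (B, 1, H*W)
        weights = F.softmax(logits, dim=-1)       # normalize over spatial locations
        feats = x.view(b, c, h * w)               # (B, C, H*W)
        pooled = (feats * weights).sum(dim=-1)    # (B, C) attention-weighted average
        return pooled

# Usage: pool a dummy 14x14 feature map into a single image descriptor.
if __name__ == "__main__":
    fmap = torch.randn(2, 512, 14, 14)
    pooled = AttentionalPooling(512)(fmap)
    print(pooled.shape)  # torch.Size([2, 512])
```

The softmax over locations lets the descriptor emphasize discriminative parts rather than averaging all positions uniformly, which is the property the citing papers highlight for fine-grained recognition.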