2021
DOI: 10.1609/aaai.v35i2.16176

Context-aware Attentional Pooling (CAP) for Fine-grained Visual Classification

Abstract: Deep convolutional neural networks (CNNs) have shown a strong ability in mining discriminative object pose and parts information for image recognition. For fine-grained recognition, context-aware rich feature representation of object/scene plays a key role since it exhibits a significant variance in the same subcategory and subtle variance among different subcategories. Finding the subtle variance that fully characterizes the object/scene is not straightforward. To address this, we propose a novel context-awar…

Cited by 48 publications (29 citation statements). References 45 publications.
“…Performance-wise, CAP [7] is currently top of the leaderboard for many FGVC datasets. It uses attention to accumulate features from integral regions in a context-aware fashion.…”
Section: arXiv:2209.02109v1 [cs.CV] 5 Sep 2022 (mentioning)
confidence: 99%
“…The attention mechanism is proliferated to identify salient regions and/or subtle discriminatory features to attain superior performance [7]- [9], [28], [29]. A trilinear attention sampling in [29] learns features from hundreds of part proposals and then applies knowledge distillation to integrate them.…”
Section: B. Attention-based Approaches (mentioning)
confidence: 99%
See 2 more Smart Citations
“…RNNPool [35] uses recurrent neural networks (RNNs) to aggregate features of large 1D receptive fields across different dimensions. In order to extract context-aware rich features for fine-grained visual recognition, another recent work [3] called CAP proposed a novel attentive pooling that correlates between different regions of the convolutional feature map to help discriminate between subcategories and improve accuracy. In particular, CAP is applied late in the network (after all convolutional layers) and is not intended to reduce the model's memory footprint, in contrast to our work which applies pooling to down-sample the large activation maps early in the network.…”
Section: A. Pooling Techniques (mentioning)
confidence: 99%
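As a rough illustration of the region-level attentive pooling these citation statements describe, the sketch below pools a backbone's final feature map into a grid of regions, lets every region attend to all other regions, and averages the context-refined descriptors into a single image-level vector. This is not the authors' published CAP module: the class name, grid size, projection widths, and the ResNet-50-sized feature map in the usage example are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class RegionAttentivePooling(nn.Module):
    """Illustrative sketch (not the authors' exact CAP module): each pooled
    region attends to every other region so its descriptor is refined by
    context before the final aggregation."""

    def __init__(self, in_dim: int, attn_dim: int = 256, grid: int = 4):
        super().__init__()
        self.grid = grid                        # assumed region grid size
        self.query = nn.Linear(in_dim, attn_dim)
        self.key = nn.Linear(in_dim, attn_dim)
        self.value = nn.Linear(in_dim, in_dim)

    def forward(self, feat_map: torch.Tensor) -> torch.Tensor:
        # feat_map: (B, C, H, W) feature map from the CNN backbone.
        # Pool into a grid of region descriptors -> (B, R, C), R = grid*grid.
        regions = F.adaptive_avg_pool2d(feat_map, self.grid)
        regions = regions.flatten(2).transpose(1, 2)
        # Pairwise attention between regions: the "context-aware" weighting.
        q, k, v = self.query(regions), self.key(regions), self.value(regions)
        attn = torch.softmax(q @ k.transpose(1, 2) / q.shape[-1] ** 0.5, dim=-1)
        context = attn @ v                      # (B, R, C) context-refined regions
        # Aggregate refined regions into a single image-level descriptor.
        return context.mean(dim=1)              # (B, C)


# Example usage with a ResNet-50-sized feature map (dimensions assumed).
pool = RegionAttentivePooling(in_dim=2048)
features = torch.randn(2, 2048, 14, 14)
image_vec = pool(features)                      # shape: (2, 2048)
```

Consistent with the observation in [3] above, a module like this sits late in the network (after all convolutional layers) and refines what is pooled rather than reducing activation-map memory earlier in the pipeline.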