2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
DOI: 10.1109/cvpr46437.2021.01350
Coordinate Attention for Efficient Mobile Network Design

Cited by 2,149 publications (516 citation statements)
References 26 publications
“…First of all, outlook attention encodes spatial information by measuring the similarity between pairs of token representations, which is more parameter-efficient for feature learning than convolutions, as studied in previous work [37,45]. Second, outlook attention adopts a sliding-window mechanism to locally encode token representations at a fine level, and to some extent preserves the crucial positional information for vision tasks [25,56]. Third, the way of generating attention weights is simple and efficient.…”
Section: Discussion
confidence: 99%
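The mechanism this excerpt describes — attention weights produced directly by a linear layer rather than by query-key dot products, applied over sliding local windows — can be sketched in PyTorch. This is a hedged single-head reading, not the official VOLO implementation; the class name, the kernel size `k`, and the omission of multi-head splitting and scaling are simplifications of mine:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OutlookAttention(nn.Module):
    """Single-head sketch: per-pixel attention weights come straight from a
    linear layer and reweight the k x k window of values around each pixel."""
    def __init__(self, dim, k=3):
        super().__init__()
        self.k = k
        self.v = nn.Linear(dim, dim, bias=False)    # value projection
        self.attn = nn.Linear(dim, k ** 4)          # (k*k) x (k*k) weights per pixel
        self.proj = nn.Linear(dim, dim)
        self.unfold = nn.Unfold(k, padding=k // 2)  # stride-1 sliding windows

    def forward(self, x):                            # x: (B, H, W, C)
        B, H, W, C = x.shape
        v = self.v(x).permute(0, 3, 1, 2)            # (B, C, H, W)
        v = self.unfold(v).reshape(B, C, self.k ** 2, H * W)
        v = v.permute(0, 3, 2, 1)                    # (B, HW, k*k, C)
        a = self.attn(x).reshape(B, H * W, self.k ** 2, self.k ** 2)
        a = a.softmax(dim=-1)                        # normalize within each window
        out = a @ v                                  # (B, HW, k*k, C)
        out = out.permute(0, 3, 2, 1).reshape(B, C * self.k ** 2, H * W)
        out = F.fold(out, (H, W), self.k, padding=self.k // 2)  # sum overlaps
        return self.proj(out.permute(0, 2, 3, 1))    # back to (B, H, W, C)
```

`F.fold` sums the overlapping window outputs, so each position accumulates the contributions projected onto it from all of its neighboring windows.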
“…BAM and CBAM adopt convolutions to capture local relations but fail to model long-range dependencies. To solve this problem, Hou et al. [130] proposed coordinate attention, a novel attention mechanism that embeds positional information into channel attention, so that the network can focus on large important regions at little computational cost.…”
Section: Coordinate Attention
confidence: 99%
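As a concrete illustration of the mechanism this excerpt summarizes, here is a minimal PyTorch sketch of coordinate attention following the description in Hou et al. [130]: pool along each spatial direction, encode the concatenated direction-aware maps with a shared 1×1 convolution, then split them back into per-direction attention weights. The names and reduction ratio are illustrative, and ReLU stands in for the h-swish nonlinearity used in the paper:

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Sketch of coordinate attention: positional information enters the
    channel attention via two direction-aware pooled feature maps."""
    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, mid, 1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.ReLU(inplace=True)            # paper uses h-swish
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):                            # x: (B, C, H, W)
        B, C, H, W = x.shape
        x_h = x.mean(dim=3, keepdim=True)            # (B, C, H, 1): pool along width
        x_w = x.mean(dim=2, keepdim=True)            # (B, C, 1, W): pool along height
        y = torch.cat([x_h, x_w.transpose(2, 3)], dim=2)   # (B, C, H+W, 1)
        y = self.act(self.bn(self.conv1(y)))         # shared encoding
        y_h, y_w = torch.split(y, [H, W], dim=2)     # split back per direction
        a_h = torch.sigmoid(self.conv_h(y_h))                   # (B, C, H, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.transpose(2, 3)))   # (B, C, 1, W)
        return x * a_h * a_w                         # broadcast both attentions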
“…[113-116]. Channel & spatial attention: predict channel and spatial attention masks separately (e.g., [6,117]) or generate a joint 3-D (channel × height × width) attention mask directly (e.g., [118-120]) and use it to select important features [6,10,13,14,50,101,117-130]. Spatial & temporal attention: …”
Section: Introduction
confidence: 99%
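For the first branch of this taxonomy (separate channel and spatial masks), a compact sketch in the spirit of CBAM, one of the works the excerpt cites, is given below; the class name, reduction ratio, and 7×7 spatial kernel are illustrative choices, not the cited implementation:

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """CBAM-style sketch: a channel mask, then a spatial mask, applied in turn."""
    def __init__(self, channels, reduction=16, k=7):
        super().__init__()
        self.mlp = nn.Sequential(                   # shared MLP over pooled descriptors
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, k, padding=k // 2)

    def forward(self, x):                            # x: (B, C, H, W)
        # channel mask from average- and max-pooled descriptors
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx)[:, :, None, None]
        # spatial mask from channel-pooled maps
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))
```

A joint 3-D mask, the taxonomy's second branch, would instead produce a single (C, H, W) weight tensor in one shot rather than factorizing it into the two stages above.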
“…It can extract important features by assigning weights to each channel but does not learn the importance of location information. Therefore, we embed the coordinate attention (CA) module [47], which can fully perceive position information, into CSAM. The CA module first aggregates features near key points in the image into a pair of key-point direction-aware feature maps of sizes $(C, H, 1)$ and $(C, 1, W)$:
$z_c^h(h) = \frac{1}{W} \sum_{0 \le i < W} x_c(h, i)$, $\quad z_c^w(w) = \frac{1}{H} \sum_{0 \le j < H} x_c(j, w)$…”
Section: Channel Attention
confidence: 99%
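The two directional pooling operations in the reconstructed equations above are just means over one spatial axis each; a quick shape check (tensor sizes arbitrary):

```python
import torch

x = torch.randn(1, 64, 32, 48)         # (B, C, H, W)
z_h = x.mean(dim=3, keepdim=True)       # (1, 64, 32, 1): average over the width
z_w = x.mean(dim=2, keepdim=True)       # (1, 64, 1, 48): average over the height
print(z_h.shape, z_w.shape)             # the (C, H, 1) and (C, 1, W) maps
```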