2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
DOI: 10.1109/wacv51458.2022.00051
All the attention you need: Global-local, spatial-channel attention for image retrieval

Cited by 26 publications (8 citation statements)
References 32 publications

Citation statements (ordered by relevance):
“…The typical deep network frameworks include LeNet [47], AlexNet [48], VGGNet [49], ResNet [50], the Graph Convolutional Network (GCN) [51], and the recently emerging Vision Transformer (ViT) [52]. Moreover, some researchers use attention mechanisms to enhance network feature extraction [43], [53], [54], while others combine global and local features to improve feature learning [43], [55].…”
Section: Feature Extraction Methods
confidence: 99%
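The global-local combination mentioned in that statement can be illustrated with a minimal sketch. The module below is a hypothetical illustration, not the architecture of any cited work: a CNN backbone (torchvision's ResNet-18 is assumed here) yields a spatial feature map, from which a pooled global descriptor and an attention-weighted local descriptor are concatenated into a single retrieval descriptor.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class GlobalLocalDescriptor(nn.Module):
    """Toy sketch: concatenate a pooled global descriptor with an
    attention-weighted aggregation of local CNN features."""
    def __init__(self, dim=512):
        super().__init__()
        backbone = models.resnet18(weights=None)
        # keep everything up to (and excluding) global pooling and the classifier
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        self.local_attn = nn.Conv2d(dim, 1, kernel_size=1)  # 1x1 conv scores each location

    def forward(self, x):
        fmap = self.features(x)                               # (B, 512, H, W)
        global_desc = fmap.mean(dim=(2, 3))                   # global average pooling -> (B, 512)
        scores = self.local_attn(fmap).flatten(2)             # (B, 1, H*W)
        weights = torch.softmax(scores, dim=-1)               # normalized spatial attention
        local_desc = (fmap.flatten(2) * weights).sum(dim=-1)  # weighted sum over locations -> (B, 512)
        return torch.cat([global_desc, local_desc], dim=1)    # (B, 1024) descriptor

# Usage sketch:
# desc = GlobalLocalDescriptor()(torch.randn(2, 3, 224, 224))  # -> torch.Size([2, 1024])
```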
“…The attention mechanism simulates the human reading behaviour of focusing on the few most important words [21], [22]. In computer vision research, attention is realized through dynamic, adaptive weighting of feature information so that limited resources are concentrated on the most relevant features [23]. SENet [24] learns to strengthen the channel information of features and uses the resulting importance weights to enhance useful features and suppress those that are unimportant for the current task.…”
Section: B Attention Mechanism
confidence: 99%
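The squeeze-and-excitation idea described in that statement is compact enough to sketch directly. The block below is a minimal PyTorch rendering of the mechanism the quote attributes to SENet (per-channel squeeze, a small bottleneck that learns importance, and channel rescaling); details such as the reduction ratio of 16 are conventional assumptions, not values taken from the cited paper.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation channel attention: pool each channel globally,
    learn a per-channel importance weight, and rescale the feature map."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # bottleneck
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                                # importance weights in (0, 1)
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        squeeze = x.mean(dim=(2, 3))                 # squeeze: (B, C)
        weights = self.fc(squeeze).view(b, c, 1, 1)  # excitation: per-channel weights
        return x * weights                           # enhance important channels, suppress the rest
```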
“…In [12] the authors show that optimal performance on object classification and detection tasks is achieved by combining classical CNNs with transformers. An alternative approach employs a CNN-based encoder and a transformer-based decoder [21], [38], [47], featuring multi-scale feature fusion and a blend of attention mechanisms at different spatial-resolution scales [4], [18], [21], [47]. Transformers and attention mechanisms are also used separately in various processes and at the deepest spatial-resolution levels [4], [18], [19].…”
Section: Related Work
confidence: 99%
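The CNN-encoder, transformer-decoder pairing referenced in that statement can be outlined in a few lines. The sketch below is a generic hybrid assumed for illustration only (learned decoder queries attending over flattened CNN features); it omits the multi-scale fusion the cited works add and does not reproduce any specific architecture.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class CNNEncoderTransformerDecoder(nn.Module):
    """Hybrid sketch: a CNN backbone encodes the image into a spatial feature
    map whose flattened tokens serve as memory for a transformer decoder."""
    def __init__(self, d_model=512, num_queries=100):
        super().__init__()
        backbone = models.resnet18(weights=None)
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])   # (B, 512, H, W)
        self.queries = nn.Parameter(torch.randn(num_queries, d_model))  # learned decoder queries
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)

    def forward(self, x):
        fmap = self.encoder(x)                        # CNN features
        memory = fmap.flatten(2).transpose(1, 2)      # (B, H*W, 512) tokens
        tgt = self.queries.unsqueeze(0).expand(x.size(0), -1, -1)
        return self.decoder(tgt, memory)              # (B, num_queries, 512)
```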
“…However, in [21] the authors highlight the significant performance boost gained by modelling interactions between spatial and channel features, which certain architectures overlook. We propose a model that combines the benefits of pure transformers and hybrid architectures, featuring a CNN-based encoder and a transformer-based decoder within the MACU-Net framework.…”
Section: Related Work
confidence: 99%
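The spatial-channel interaction that [21] (the reviewed paper) argues for can be approximated with a short sequential sketch. The module below applies SE-style channel weighting followed by CBAM-style spatial weighting; it is an illustrative stand-in, not the global-local, spatial-channel attention module proposed in the paper itself.

```python
import torch
import torch.nn as nn

class SpatialChannelAttention(nn.Module):
    """Sketch of channel attention followed by spatial attention, so both
    which channels matter and where to look modulate the feature map."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.channel_fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        # channel attention: reweight feature maps by learned importance
        cw = self.channel_fc(x.mean(dim=(2, 3))).view(b, c, 1, 1)
        x = x * cw
        # spatial attention: score each location from channel-pooled statistics
        pooled = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        sw = torch.sigmoid(self.spatial_conv(pooled))
        return x * sw
```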