2021 IEEE/CVF International Conference on Computer Vision (ICCV)
DOI: 10.1109/iccv48922.2021.00082

FcaNet: Frequency Channel Attention Networks

Cited by 604 publications (270 citation statements) · References 25 publications
“…Only using global average pooling in the squeeze module limits representational ability. To obtain a more powerful representation, Qin et al. [57] rethought global information capture from the viewpoint of compression and analysed global average pooling in the frequency domain. They proved that global average pooling is a special case of the discrete cosine transform (DCT) and used this observation to propose a novel multi-spectral channel attention.…”
Section: FcaNet
Mentioning confidence: 99%
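The quoted equivalence is easy to verify numerically. Below is a short PyTorch sketch (my own, not code from the paper): the 2D DCT-II basis at frequency (0, 0) is constant, so that coefficient reduces to global average pooling up to a scale of H * W, and pooling channel groups with other frequencies yields a multi-spectral descriptor in the FcaNet spirit. The four frequencies used here are illustrative, not the paper's selected set.

```python
import math
import torch

def dct_basis(u, v, H, W):
    """2D DCT-II basis for frequency pair (u, v) on an H x W grid."""
    hs = torch.arange(H).float()
    ws = torch.arange(W).float()
    bh = torch.cos(math.pi * u * (hs + 0.5) / H)   # (H,)
    bw = torch.cos(math.pi * v * (ws + 0.5) / W)   # (W,)
    return bh[:, None] * bw[None, :]               # (H, W)

x = torch.randn(2, 8, 7, 7)                        # (batch, channels, H, W)

# At (u, v) = (0, 0) the basis is all ones, so the DCT coefficient is the
# spatial sum, i.e. H * W times global average pooling (GAP).
coeff00 = (x * dct_basis(0, 0, 7, 7)).sum(dim=(2, 3))
assert torch.allclose(coeff00, x.mean(dim=(2, 3)) * 7 * 7, atol=1e-4)

# Multi-spectral pooling: split channels into groups and pool each group
# with a different frequency instead of only (0, 0). The resulting (B, C)
# descriptor would feed the usual excitation MLP of an SE-style block.
freqs = [(0, 0), (0, 1), (1, 0), (1, 1)]           # illustrative frequencies
groups = torch.chunk(x, len(freqs), dim=1)
descriptor = torch.cat(
    [(g * dct_basis(u, v, 7, 7)).sum(dim=(2, 3)) for g, (u, v) in zip(groups, freqs)],
    dim=1,
)
```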
“…Channel attention: generate an attention mask across the channel domain and use it to select important channels. [5, 25, 37, 53–60] Spatial attention: generate an attention mask across the spatial domain and use it to select important spatial regions (e.g., [15, 61]) or predict the most relevant spatial positions directly (e.g., [7, 31]). [8, 9, 15, 20–22, 26, 27, 31, 32, 34, 35, 41–47, 61–109] Temporal attention: generate an attention mask in time and use it to select key frames.…”
Section: Introduction
Mentioning confidence: 99%
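To ground the taxonomy, here is a minimal sketch of one entry, a CBAM-style spatial attention mask in PyTorch. The mean/max channel pooling and the 7x7 kernel are illustrative choices from one cited design, not a summary of all the referenced methods.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Score each spatial location and reweight the feature map."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):                        # x: (B, C, H, W)
        avg = x.mean(dim=1, keepdim=True)        # channel-wise mean: (B, 1, H, W)
        mx, _ = x.max(dim=1, keepdim=True)       # channel-wise max:  (B, 1, H, W)
        mask = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * mask                          # select important spatial regions
```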
“…Dai et al. [34] proposed a second-order attention network to exploit the feature correlations of intermediate layers for image super-resolution. Qin et al. [35] proposed a novel multispectral channel attention that pre-processes the channel descriptor in the frequency domain.…”
Section: Attention Mechanisms
Mentioning confidence: 99%
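As a rough illustration of the second-order idea, the sketch below drives channel attention with covariance statistics instead of first-order pooled means. This is a simplified, assumption-based sketch, not Dai et al.'s exact module (which, among other details, normalizes the covariance matrix).

```python
import torch
import torch.nn as nn

class SecondOrderChannelAttention(nn.Module):
    """Channel attention from second-order (covariance) statistics."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                                # x: (B, C, H, W)
        b, c, h, w = x.shape
        feat = x.flatten(2)                              # (B, C, H*W)
        feat = feat - feat.mean(dim=2, keepdim=True)     # center each channel
        cov = feat @ feat.transpose(1, 2) / (h * w - 1)  # (B, C, C) covariance
        stat = cov.mean(dim=2)                           # per-channel correlation summary
        return x * self.fc(stat).view(b, c, 1, 1)        # reweight channels
```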
“…Further, AANets [3] and SCNet [22] demonstrate how self-attention and self-calibration operations can augment standard convolutions, while GCNet [6] extends non-local neural networks to augment SE operations. More recently, prominent modules include ECA [35], which uses one-dimensional convolutions to efficiently capture inter-channel interactions for channel attention, and FCA [27], which replaces GAP in SE with discrete cosine transform (DCT)-based frequency compression for feature aggregation.…”
Section: Related Work, 2.1 Attention Modules for CNNs
Mentioning confidence: 99%
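The ECA design mentioned here is simple enough to sketch. In the PyTorch snippet below, a single 1D convolution slides across the pooled channel descriptor, so each channel's weight depends only on its nearest neighbors and no dimensionality reduction is needed; the fixed kernel size of 3 is a simplification, since ECA actually derives the kernel size adaptively from the channel count.

```python
import torch
import torch.nn as nn

class ECALayer(nn.Module):
    """ECA-style channel attention via a 1D conv over the channel descriptor."""
    def __init__(self, k_size=3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, k_size, padding=k_size // 2, bias=False)

    def forward(self, x):                               # x: (B, C, H, W)
        y = x.mean(dim=(2, 3))                          # GAP: (B, C)
        y = self.conv(y.unsqueeze(1)).squeeze(1)        # local cross-channel interaction
        return x * torch.sigmoid(y)[:, :, None, None]   # rescale each channel
```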
“…As a prominent method, Squeeze & Excitation (SE) [16] introduces channel attention that models global-average-pooled (GAP) feature representations; CBAM [37] then enhances it by additionally incorporating spatial attention and using both global-max-pooled (GMP) and GAP representations. Further, recent works [12, 27, 35] show how channel attention can be made more efficient and effective, while a different line of work augments convolutional operations with self-attention and calibration methods [3, 22] to learn more effective feature representations.…”
Section: Introduction
Mentioning confidence: 99%
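Since SE is the baseline that every statement above builds on, a standard PyTorch formulation of the block is worth spelling out; the reduction ratio of 16 is the SE paper's default, and the rest follows the squeeze-then-excite pipeline the quote describes.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: GAP squeeze, bottleneck-MLP excitation."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                    # x: (B, C, H, W)
        b, c, _, _ = x.shape
        s = x.mean(dim=(2, 3))               # squeeze: global channel descriptor
        g = self.fc(s).view(b, c, 1, 1)      # excite: per-channel gates in (0, 1)
        return x * g                         # reweight channels
```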