2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
DOI: 10.1109/wacv51458.2022.00051
All the attention you need: Global-local, spatial-channel attention for image retrieval

Cited by 26 publications (8 citation statements)
References 32 publications

Citation statements (ordered by relevance):
“…The typical deep network frameworks include LeNet [47], AlexNet [48], VGGNet [49], ResNet [50], the Graph Convolutional Network (GCN) [51], and the recently emerging Vision Transformer (ViT) [52]. Moreover, some researchers use attention mechanisms to enhance network feature extraction [43], [53], [54], while others combine global and local features to improve feature learning [43], [55].…”
Section: Feature Extraction Methods
confidence: 99%
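The global-local combination mentioned in that statement can be illustrated with a minimal sketch. The module below is a hypothetical illustration, not the architecture of any cited work: a CNN backbone (torchvision's ResNet-18 is assumed here) yields a spatial feature map, from which a pooled global descriptor and an attention-weighted local descriptor are concatenated into a single retrieval descriptor.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class GlobalLocalDescriptor(nn.Module):
    """Toy sketch: concatenate a pooled global descriptor with an
    attention-weighted aggregation of local CNN features."""
    def __init__(self, dim=512):
        super().__init__()
        backbone = models.resnet18(weights=None)
        # keep everything up to (and excluding) global pooling and the classifier
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        self.local_attn = nn.Conv2d(dim, 1, kernel_size=1)  # 1x1 conv scores each location

    def forward(self, x):
        fmap = self.features(x)                               # (B, 512, H, W)
        global_desc = fmap.mean(dim=(2, 3))                   # global average pooling -> (B, 512)
        scores = self.local_attn(fmap).flatten(2)             # (B, 1, H*W)
        weights = torch.softmax(scores, dim=-1)               # normalized spatial attention
        local_desc = (fmap.flatten(2) * weights).sum(dim=-1)  # weighted sum over locations -> (B, 512)
        return torch.cat([global_desc, local_desc], dim=1)    # (B, 1024) descriptor

# Usage sketch:
# desc = GlobalLocalDescriptor()(torch.randn(2, 3, 224, 224))  # -> torch.Size([2, 1024])
```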
“…The attention mechanism simulates the human reading behaviour of focusing on the few most important words [21], [22]. In computer vision research, attention is realized through dynamic, adaptive weighting of feature information so that limited resources are concentrated on the most relevant features [23]. SENet [24] learns to strengthen the channel information of features and uses the resulting importance weights to enhance useful features and suppress those that are unimportant for the current task.…”
Section: B Attention Mechanism
confidence: 99%
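The squeeze-and-excitation idea described in that statement is compact enough to sketch directly. The block below is a minimal PyTorch rendering of the mechanism the quote attributes to SENet (per-channel squeeze, a small bottleneck that learns importance, and channel rescaling); details such as the reduction ratio of 16 are conventional assumptions, not values taken from the cited paper.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation channel attention: pool each channel globally,
    learn a per-channel importance weight, and rescale the feature map."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),  # bottleneck
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                                # importance weights in (0, 1)
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        squeeze = x.mean(dim=(2, 3))                 # squeeze: (B, C)
        weights = self.fc(squeeze).view(b, c, 1, 1)  # excitation: per-channel weights
        return x * weights                           # enhance important channels, suppress the rest
```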
“…In [12] the authors show that optimal performance on object classification and detection tasks is achieved by combining classical CNNs with transformers. An alternative approach employs a CNN-based encoder and a transformer-based decoder [21], [38], [47], featuring multi-scale feature fusion and a blend of attention mechanisms at different spatial-resolution scales [4], [18], [21], [47]. Transformers and attention mechanisms are also used separately in various processes and at the deepest spatial-resolution levels [4], [18], [19].…”
Section: Related Work
confidence: 99%
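The CNN-encoder, transformer-decoder pairing referenced in that statement can be outlined in a few lines. The sketch below is a generic hybrid assumed for illustration only (learned decoder queries attending over flattened CNN features); it omits the multi-scale fusion the cited works add and does not reproduce any specific architecture.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class CNNEncoderTransformerDecoder(nn.Module):
    """Hybrid sketch: a CNN backbone encodes the image into a spatial feature
    map whose flattened tokens serve as memory for a transformer decoder."""
    def __init__(self, d_model=512, num_queries=100):
        super().__init__()
        backbone = models.resnet18(weights=None)
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])   # (B, 512, H, W)
        self.queries = nn.Parameter(torch.randn(num_queries, d_model))  # learned decoder queries
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)

    def forward(self, x):
        fmap = self.encoder(x)                        # CNN features
        memory = fmap.flatten(2).transpose(1, 2)      # (B, H*W, 512) tokens
        tgt = self.queries.unsqueeze(0).expand(x.size(0), -1, -1)
        return self.decoder(tgt, memory)              # (B, num_queries, 512)
```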
“…However, in [21] the authors highlight the significant performance boost gained by modelling interactions between spatial and channel features, which certain architectures overlook. We propose a model that combines the benefits of pure transformers and hybrid architectures, featuring a CNN-based encoder and a transformer-based decoder within the MACU-Net framework.…”
Section: Related Work
confidence: 99%
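The spatial-channel interaction that [21] (the reviewed paper) argues for can be approximated with a short sequential sketch. The module below applies SE-style channel weighting followed by CBAM-style spatial weighting; it is an illustrative stand-in, not the global-local, spatial-channel attention module proposed in the paper itself.

```python
import torch
import torch.nn as nn

class SpatialChannelAttention(nn.Module):
    """Sketch of channel attention followed by spatial attention, so both
    which channels matter and where to look modulate the feature map."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.channel_fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        # channel attention: reweight feature maps by learned importance
        cw = self.channel_fc(x.mean(dim=(2, 3))).view(b, c, 1, 1)
        x = x * cw
        # spatial attention: score each location from channel-pooled statistics
        pooled = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        sw = torch.sigmoid(self.spatial_conv(pooled))
        return x * sw
```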