Zhiwu Xia scite author profile

The task of image multi-label classification is to accurately recognize multiple objects in an input image. Most of the recent works need to leverage the label co-occurrence matrix counted from training data to construct the graph structure, which are inflexible and may degrade model generalizability. In addition, these methods fail to capture the semantic correlation between the channel feature maps to further improve the model performance. To address these issues, we propose a D ouble A ttention framework based on G raph A ttention ne T work (DA-GAT) to effectively learn the correlation between labels from training data. Firstly, we devise a new channel attention mechanism to enhance the semantic correlation between channel feature maps, so as to implicitly capture the correlation between labels. Secondly, we propose a new label attention mechanism to avoid the adverse impact of manually constructed label co-occurrence matrix. It only needs to leverage the label embedding as the input of network, and then automatically constructs the label relation matrix to explicitly establish the correlation between labels. Finally, we effectively fuse the output of these two attention mechanisms to further improve the model performance. Extensive experiments are conducted on three public multi-label classification benchmarks. Our DA-GAT model achieves the mAPs of 87.1%, 96.6%, and 64.3% on MS-COCO 2014, PASCAL VOC 2007, and NUS-WIDE respectively, and obviously outperforms other existing state-of-the-art methods. In addition, visual analysis experiments demonstrate that each attention mechanism can capture the correlation between labels well, and significantly promote the model performance.

show abstract

Aligning Image Semantics and Label Concepts for Image Multi-Label Classification

Zhou

Xia

Dou

et al. 2023

ACM Trans. Multimedia Comput. Commun. Appl.

View full text Add to dashboard Cite

Image multi-label classification task is mainly to correctly predict multiple object categories in the images. To capture the correlation between labels, graph convolution network based methods have to manually count the label co-occurrence probability from training data to construct a pre-defined graph as the input of graph network, which is inflexible and may degrade model generalizability. Moreover, most of the current methods cannot effectively align the learned salient object features with the label concepts, so that the predicted results of model may not be consistent with the image content. Therefore, how to learn the salient semantic features of images and capture the correlation between labels, and then effectively align them is one of the key to improve the performance of image multi-label classification task. To this end, we propose a novel image multi-label classification framework which aims to align I mage S emantics with L abel C oncepts ( ISLC ). Specifically, we propose a residual encoder to learn salient object features in the images, and exploit the self-attention layer in aligned decoder to automatically capture the correlation between labels. Then, we leverage the cross-attention layers in aligned decoder to align image semantic features with label concepts, so as to make the labels predicted by model more consistent with image content. Finally, the output features of the last layer of residual encoder and aligned decoder are fused to obtain the final output feature for classification. The proposed ISLC model achieves good performance on various prevalent multi-label image datasets such as MS-COCO 2014, PASCAL VOC 2007, VG-500 and NUS-WIDE with 87.2%, 96.9%, 39.4% and 64.2% respectively.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Zhiwu Xia

Double Attention Based on Graph Attention Network for Image Multi-Label Classification

Aligning Image Semantics and Label Concepts for Image Multi-Label Classification

Contact Info

Product

Resources

About