2020
DOI: 10.1609/aaai.v34i07.6964

Cross-Modality Attention with Semantic Graph Embedding for Multi-Label Classification

Abstract: Multi-label image and video classification are fundamental yet challenging tasks in computer vision. The main challenges lie in capturing spatial or temporal dependencies between labels and discovering the locations of discriminative features for each class. In order to overcome these challenges, we propose to use cross-modality attention with semantic graph embedding for multi-label classification. Based on the constructed label graph, we propose an adjacency-based similarity graph embedding method to learn s…
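As a rough illustration of the approach summarized in the abstract, the sketch below (PyTorch) learns label embeddings whose pairwise similarity is pushed toward a label-graph adjacency matrix and uses them to generate per-label cross-modality attention maps over backbone feature maps. All module names, dimensions, and the specific similarity objective are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of the pipeline described in the abstract:
# (1) label embeddings learned from a label adjacency/similarity graph,
# (2) per-label cross-modality attention maps over CNN feature maps.
# Names, shapes, and losses are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalityAttention(nn.Module):
    def __init__(self, num_labels: int, embed_dim: int, feat_dim: int):
        super().__init__()
        # Semantic label embeddings; trained so that their pairwise
        # similarity follows the label-graph adjacency (see loss below).
        self.label_embed = nn.Parameter(torch.randn(num_labels, embed_dim))
        # Project visual features into the label-embedding space.
        self.visual_proj = nn.Conv2d(feat_dim, embed_dim, kernel_size=1)

    def graph_embedding_loss(self, adjacency: torch.Tensor) -> torch.Tensor:
        # Adjacency-based similarity objective (sketch): push the cosine
        # similarity of learned label embeddings toward the adjacency matrix.
        emb = F.normalize(self.label_embed, dim=-1)
        sim = emb @ emb.t()                        # (L, L) cosine similarities
        return F.mse_loss(sim, adjacency)

    def forward(self, feat_map: torch.Tensor) -> torch.Tensor:
        # feat_map: (B, feat_dim, H, W) backbone feature maps.
        v = self.visual_proj(feat_map)             # (B, E, H, W)
        v = v.flatten(2)                            # (B, E, H*W)
        # Cross-modality attention: one spatial map per label, guided by
        # the semantic label embeddings.
        attn = torch.einsum("le,beh->blh", self.label_embed, v)  # (B, L, H*W)
        attn = attn.softmax(dim=-1)
        # Label-specific pooled features -> per-label logits.
        pooled = torch.einsum("blh,beh->ble", attn, v)            # (B, L, E)
        return (pooled * self.label_embed).sum(-1)                # (B, L)

# Usage sketch: multi-label BCE plus the graph-embedding regularizer.
model = CrossModalityAttention(num_labels=80, embed_dim=256, feat_dim=2048)
feats = torch.randn(2, 2048, 14, 14)
adjacency = torch.eye(80)                           # placeholder label graph
loss = F.binary_cross_entropy_with_logits(model(feats), torch.zeros(2, 80)) \
       + model.graph_embedding_loss(adjacency)
```

In practice the adjacency matrix would be built from label co-occurrence statistics on the training set rather than the identity placeholder used here.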

Cited by 144 publications (86 citation statements)
References 23 publications
“…Chen et al [20] used a GCN to build a graph of labels to represent a set of mutually dependent object classifiers. You et al [21] followed in this manner, but used a cross-modality attention mechanism instead. Additionally, there are some other works regarding multi-label classification.…”
Section: Label Semantic Relationships
confidence: 99%
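For context on the quoted statement, a compact sketch of the label-graph GCN idea attributed to Chen et al. [20] is given below: a GCN propagates label word vectors over a label adjacency matrix and outputs one classifier per label, which is then applied to global image features. Layer widths, the word-vector dimension, and the graph normalization are illustrative assumptions, not the cited implementation.

```python
# Sketch of a GCN over a label graph that yields inter-dependent per-label
# classifiers (the idea attributed to Chen et al. [20] above).
# Dimensions and normalization are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelGraphGCN(nn.Module):
    def __init__(self, word_dim: int, hidden_dim: int, feat_dim: int):
        super().__init__()
        self.w1 = nn.Linear(word_dim, hidden_dim, bias=False)
        self.w2 = nn.Linear(hidden_dim, feat_dim, bias=False)

    def forward(self, label_embed, adj, img_feat):
        # label_embed: (L, word_dim) word vectors of the labels
        # adj:         (L, L) normalized label co-occurrence graph
        # img_feat:    (B, feat_dim) global image features
        h = F.relu(adj @ self.w1(label_embed))   # graph propagation, layer 1
        classifiers = adj @ self.w2(h)           # (L, feat_dim) per-label classifiers
        return img_feat @ classifiers.t()        # (B, L) multi-label logits

# Usage sketch with random inputs.
gcn = LabelGraphGCN(word_dim=300, hidden_dim=512, feat_dim=2048)
logits = gcn(torch.randn(80, 300), torch.eye(80), torch.randn(4, 2048))
```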
“…Furthermore, Chen et al [28] proposed a semantic decoupling method which extracts label-specific features and their dependencies at the same time. Lastly, You et al [29] proposed an adjacency-based method to extract dependencies associated with a cross-modality attention mechanism. Despite their wide success, these methods are specifically designed for computer vision tasks, thus we do not consider them further in this work.…”
Section: Multi-label Classification
confidence: 99%
“…References [39][40][41] used common-sense or structured prior knowledge to improve the performance of deep models. Other works, like References [42][43][44], used graph embedding to learn prior knowledge or relationships between labels.…”
Section: Remote Sensing With GNN
confidence: 99%