2022
DOI: 10.1109/tcyb.2020.3029423
ALSA: Adversarial Learning of Supervised Attentions for Visual Question Answering

Cited by 22 publications (6 citation statements)
References 56 publications
“…Further, H-CFIM gives lower accuracy compared to V-CFIM as it blends information from different attention paths but avoids the clash between the top-down and bottom-up paths. In Table 5, we have compared the performance of the proposed VQA method with 22 state-of-the-art methods, for example, Re-attention [23], ALSA [50], IASSM [51], MRA-Net [35] and CAM [52], on both test-dev and test-std sets.…”
Section: Based on the Combination of TCAM and CFIM
Citation type: mentioning (confidence: 99%)
“…In Table 5, we have compared the performance of the proposed VQA method with 22 state-of-the-art methods, for example, Re-attention [23], ALSA [50], IASSM [51], MRA-Net [35] and CAM [52], on both test-dev and test-std sets. Step-by-step reasoning is used by CAM to generate the compound objects.…”
Section: Performance Analysis
Citation type: mentioning (confidence: 99%)
“…[40] based on visual relationship detection, where image features and the question vector are used to generate the output. Liu et al. [41] presented a supervised attention-based VQA model and designed two attention modules, free-form and detection-based, to use past information for attention learning. Li et al. [42] proposed a relation-aware graph attention network to encode an image using a graph.…”
Section: Visual Question Answering
Citation type: mentioning (confidence: 99%)
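The statement above describes question-guided, detection-based attention only in prose. As a rough illustration, the following is a minimal, hypothetical PyTorch sketch of attention over detected object regions conditioned on a question vector. All names, dimensions, and the module structure are illustrative assumptions, not taken from ALSA or the citing paper.

```python
# Hypothetical sketch: question-guided attention over detected object regions.
# Dimensions and layer names are illustrative, not from the ALSA paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RegionAttention(nn.Module):
    def __init__(self, region_dim=2048, question_dim=512, hidden_dim=512):
        super().__init__()
        self.proj_v = nn.Linear(region_dim, hidden_dim)    # project region features
        self.proj_q = nn.Linear(question_dim, hidden_dim)  # project question vector
        self.score = nn.Linear(hidden_dim, 1)              # scalar attention logit per region

    def forward(self, regions, question):
        # regions: (batch, num_regions, region_dim); question: (batch, question_dim)
        joint = torch.tanh(self.proj_v(regions) + self.proj_q(question).unsqueeze(1))
        logits = self.score(joint).squeeze(-1)             # (batch, num_regions)
        attn = F.softmax(logits, dim=-1)                   # attention over regions
        attended = (attn.unsqueeze(-1) * regions).sum(dim=1)  # weighted region summary
        return attended, attn
```

The returned `attn` distribution is the quantity that supervised-attention methods additionally constrain with external attention labels, as the next statement discusses.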
“…[13,14,15] However, because of the small size of the structure dataset and the lack of detailed knowledge concerning protein–ligand interactions, most of the existing methods are not yet able to effectively learn the attention distribution and accurately capture the true interaction information between proteins and ligands, limiting the predictive performance [16]. Several studies in the fields of visual question answering [17,18] and natural language processing [19,20] have demonstrated that training the attention mechanism in a supervised manner can result in a more effective attention distribution and improve model performance significantly, but its effectiveness in building a better protein–ligand binding affinity prediction model remains unclear.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
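To make "training the attention mechanism in a supervised manner" concrete, here is a minimal sketch of an attention-supervision term, assuming a reference attention distribution (e.g., human-annotated region importance) is available and normalized over the same regions as the model's attention. The function name and the KL-based form are illustrative assumptions, not the specific loss used in ALSA or the citing works.

```python
# Minimal sketch: supervising a predicted attention distribution with a
# reference (e.g., human-annotated) distribution. Purely illustrative.
import torch

def supervised_attention_loss(pred_attn, ref_attn, eps=1e-8):
    # pred_attn, ref_attn: (batch, num_regions), each row sums to 1.
    # KL(ref || pred): penalizes placing little mass where the reference attends.
    kl = ref_attn * (torch.log(ref_attn + eps) - torch.log(pred_attn + eps))
    return kl.sum(dim=-1).mean()

# A total training objective would typically combine the task loss (e.g., answer
# classification) with this term, weighted by a hyperparameter:
#   loss = answer_loss + lambda_attn * supervised_attention_loss(pred_attn, ref_attn)
```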