2021
DOI: 10.1016/j.patcog.2021.107956
Dual self-attention with co-attention networks for visual question answering

Cited by 50 publications (12 citation statements)
References 8 publications
“…On the other hand, research has demonstrated that attentional mechanisms are not entirely reliable and can even have counterproductive effects [39]. When making inferences, neural networks tend to incorporate target‐related contextual information as an integral part of the target itself.…”
Section: Analysis and Discussion (mentioning)
confidence: 99%
“…[PA23] offer an extensive review of efficient vision transformers. Through the advancement of effective token mixing strategies and efficient MLP layers, vision transformers can be significantly accelerated [LWZ*22, GHW*22, YPL*22]. For example, both CMT [GHW*22] and WaveViT [YPL*22] outperform EfficientNet [TL19] while maintaining a lower computational complexity.…”
Section: Limitations and Future Work (mentioning)
confidence: 99%
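The excerpt above points to token-mixing strategies and efficient MLP layers as a route to faster vision transformers. As an illustration only (not the CMT or WaveViT designs from the cited works), the minimal PyTorch sketch below shows a generic MLP-style token-mixing block of the kind such approaches build on; all layer sizes and names are hypothetical assumptions.

```python
import torch
import torch.nn as nn

class TokenMixingBlock(nn.Module):
    """Generic sketch of an MLP-style token-mixing block (illustrative only)."""
    def __init__(self, num_tokens, dim, hidden):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        # Token mixing: an MLP applied across the token axis.
        self.token_mlp = nn.Sequential(
            nn.Linear(num_tokens, hidden), nn.GELU(), nn.Linear(hidden, num_tokens)
        )
        self.norm2 = nn.LayerNorm(dim)
        # Channel MLP: applied per token across feature channels.
        self.channel_mlp = nn.Sequential(
            nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim)
        )

    def forward(self, x):                          # x: (B, num_tokens, dim)
        y = self.norm1(x).transpose(1, 2)          # (B, dim, num_tokens)
        x = x + self.token_mlp(y).transpose(1, 2)  # mix information across tokens
        x = x + self.channel_mlp(self.norm2(x))    # mix information across channels
        return x
```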
“…In recent years, convolutional neural networks (CNNs) have been used to extract image features. In order to obtain representative and targeted image features, attention mechanisms are used to highlight important image regions related to the corresponding issues [5,6]. To obtain more accurate image feature maps, a stacked attention network (SAN) is proposed in which the output of the first attention is used as the query for the second attention [7].…”
Section: Related Work (mentioning)
confidence: 99%
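The excerpt above summarizes the stacked attention network (SAN) idea, in which the output of the first attention hop serves as the query for the second. The following is a minimal PyTorch sketch of that two-hop pattern, assuming the question vector and the image-region features share one dimension; layer names and sizes are illustrative assumptions, not the published SAN implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoHopStackedAttention(nn.Module):
    """Sketch of two stacked attention hops over image regions.

    The refined query u1 produced by the first hop is reused as the
    query for the second hop, as described in the excerpt above.
    """
    def __init__(self, dim, hidden):
        super().__init__()
        self.hops = nn.ModuleList(
            nn.ModuleDict({
                "q": nn.Linear(dim, hidden),
                "v": nn.Linear(dim, hidden),
                "score": nn.Linear(hidden, 1),
            })
            for _ in range(2)
        )

    def _attend(self, hop, query, regions):
        # query: (B, dim), regions: (B, N, dim)
        h = torch.tanh(hop["v"](regions) + hop["q"](query).unsqueeze(1))  # (B, N, hidden)
        attn = F.softmax(hop["score"](h).squeeze(-1), dim=1)              # (B, N)
        return torch.bmm(attn.unsqueeze(1), regions).squeeze(1)           # (B, dim)

    def forward(self, question, regions):
        u1 = question + self._attend(self.hops[0], question, regions)  # first hop
        u2 = u1 + self._attend(self.hops[1], u1, regions)              # u1 is the new query
        return u2

# Usage sketch: a batch of 8 questions attending over 36 region features of dimension 512.
# model = TwoHopStackedAttention(dim=512, hidden=256)
# out = model(torch.randn(8, 512), torch.randn(8, 36, 512))
```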