2022
DOI: 10.48550/arxiv.2201.03965
Preprint

On the Efficacy of Co-Attention Transformer Layers in Visual Question Answering

Abstract: In recent years, multi-modal transformers have shown significant progress in Vision-Language tasks, such as Visual Question Answering (VQA), outperforming previous architectures by a considerable margin. This improvement in VQA is often attributed to the rich interactions between vision and language streams. In this work, we investigate the efficacy of co-attention transformer layers in helping the network focus on relevant regions while answering the question. We generate visual attention maps using the quest…
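The co-attention mechanism the abstract refers to can be sketched as cross-attention between the two streams: image-region features form the queries while question-token features supply the keys and values, and the resulting attention map is exactly the kind of visual attention map the paper analyzes. The sketch below is illustrative only; the projection matrices, dimensions, and single-head setup are assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(stream_a, stream_b, Wq, Wk, Wv):
    """One cross-attention direction of a co-attention layer:
    stream_a queries are matched against stream_b keys/values."""
    q = stream_a @ Wq                                        # (A, d_k)
    k = stream_b @ Wk                                        # (B, d_k)
    v = stream_b @ Wv                                        # (B, d_k)
    attn = softmax(q @ k.T / np.sqrt(k.shape[1]), axis=-1)   # (A, B) attention map
    return attn @ v, attn

# Toy example: 4 image regions, 6 question tokens, feature dim 8.
rng = np.random.default_rng(0)
regions = rng.standard_normal((4, 8))
tokens = rng.standard_normal((6, 8))
Wq, Wk, Wv = (rng.standard_normal((8, 8)) / np.sqrt(8) for _ in range(3))
attended_regions, visual_attention_map = co_attention(regions, tokens, Wq, Wk, Wv)
```

In a full co-attention transformer layer this runs in both directions (vision attends to language and language attends to vision), with multiple heads and learned projections; here `visual_attention_map` row *i* shows how region *i* distributes attention over the question tokens.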

Cited by 1 publication (1 citation statement). References 19 publications (33 reference statements).
“…Objective O12: Apply Attention Mechanism to the QA system Apply Attention Mechanism (Section 2.5) to the QA system in order to allow the decoder of the Seq2Seq model to pay attention to one part of the input sequence (while giving less attention to others) at different decoding steps, thus guiding the process of reasoning similar to [239], but in an RL setting. This will help achieve goal G2.…”
Section: Path Planning Systems (citation type: mentioning; confidence: 99%)
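The citing work's objective O12 describes standard decoder attention: at each decoding step, the Seq2Seq decoder weights the encoder's hidden states by their relevance to its current state, focusing on one part of the input while down-weighting the rest. A minimal dot-product sketch of one such step, with toy states chosen purely for illustration:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(decoder_state, encoder_states):
    """Dot-product attention at one decoding step: weight each encoder
    state by its similarity to the current decoder state."""
    scores = encoder_states @ decoder_state    # (T,) similarity scores
    weights = softmax(scores)                  # attention distribution, sums to 1
    context = weights @ encoder_states         # weighted sum of encoder states
    return context, weights

# Toy sequence of 5 encoder states (dim 4); the decoder state is aligned
# with encoder state 2, so attention should concentrate there.
encoder_states = np.eye(5, 4) * 3.0
decoder_state = np.array([0.0, 0.0, 3.0, 0.0])
context, weights = attend(decoder_state, encoder_states)
```

In the RL setting the citing work proposes, these weights would be produced at every decoding step of the QA policy; the mechanism itself is unchanged.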