2020
DOI: 10.1016/j.patrec.2020.02.031
Visual question answering with attention transfer and a cross-modal gating mechanism

Cited by 14 publications (3 citation statements)
References 5 publications
“…The models are pretrained on large-scale multi-modal datasets with self-supervised objectives. Further finetuning them on specific tasks leads to new state-of-the-art results on several multi-modal challenges such as visual question answering [2,8,22], image-text retrieval [20,24], and visual commonsense reasoning [40]. Murahari et al. [28] adapt the two-stream ViLBERT [25] to VisDial via two-step finetuning and boost the evaluation metrics by a large margin.…”
Section: Visual Dialog
Mentioning, confidence: 99%
“…Despite the impressive performance of AI algorithms in various fields, their safety and reliability are still a concern. Recent studies have achieved strong performance in areas such as image [5] and text classification [18], object detection [10], segmentation [9], image captioning [20], visual question answering [8], and scene graph generation [19], with some tasks reaching near-perfect results. However, AI has not been fully deployed in sensitive fields like autonomous driving, medical diagnosis, or assistance for socially vulnerable groups.…”
Section: Introduction
Mentioning, confidence: 99%
“…Chen et al. [3] have improved the robustness of VQA approaches by synthesizing counterfactual samples for training. Li et al. [15] have employed an attention-based mechanism with transfer learning, along with a cross-modal gating approach, to improve VQA performance. Huang et al. [8] have utilized a graph-based convolutional network to better encode relational information for VQA.…”
Section: Introduction
Mentioning, confidence: 99%
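The cross-modal gating approach mentioned in the statement above can be illustrated with a minimal sketch. This is not the implementation from Li et al. [15]; it assumes generic pooled visual features `v` and question features `q` of equal dimension, and shows one common way such a gate is formed: a sigmoid computed from both modalities that weights, per dimension, how much each modality contributes to the fused representation.

```python
import torch
import torch.nn as nn

class CrossModalGate(nn.Module):
    """Illustrative cross-modal gating fusion (a generic sketch, not the
    method of Li et al. [15]). A sigmoid gate conditioned on both modalities
    blends visual and textual features per dimension."""

    def __init__(self, dim: int):
        super().__init__()
        # The gate is conditioned on the concatenation of both modalities.
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, v: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
        # v: visual features, q: question features, both (batch, dim)
        g = torch.sigmoid(self.gate(torch.cat([v, q], dim=-1)))
        return g * v + (1.0 - g) * q  # convex, per-dimension blend

# Toy usage: fuse hypothetical 512-d image and question embeddings.
fuse = CrossModalGate(dim=512)
v = torch.randn(4, 512)  # e.g., pooled image-encoder features
q = torch.randn(4, 512)  # e.g., pooled question-encoder features
fused = fuse(v, q)       # shape: (4, 512)
```

The convex blend keeps the fused feature on the same scale as its inputs; in practice such a gate is typically followed by a task-specific classifier head.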