2021
DOI: 10.48550/arxiv.2101.06399
Preprint

Latent Variable Models for Visual Question Answering

Abstract: Conventional models for Visual Question Answering (VQA) explore deterministic approaches with various types of image features, question features, and attention mechanisms. However, there exist other modalities that can be explored, in addition to image and question pairs, to bring extra information to the models. In this work, we propose latent variable models for VQA where extra information (e.g. captions and answer categories) is incorporated as latent variables to improve inference, which in turn benefits qu…
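The abstract only sketches the approach at a high level. As a rough, hypothetical illustration (not the paper's actual architecture), the snippet below treats a caption embedding as a latent variable z with a caption-conditioned approximate posterior and an image/question-conditioned prior, so that at test time, when no caption is available, z is sampled from the prior. All names (LatentVQA, post_mu, etc.), feature dimensions, the diagonal-Gaussian parameterisation, and the fusion-by-concatenation are assumptions made for illustration only.

# Minimal sketch, assuming PyTorch and precomputed image/question/caption features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentVQA(nn.Module):
    def __init__(self, img_dim=2048, q_dim=768, z_dim=128, n_answers=3000):
        super().__init__()
        # q(z | image, question, caption): approximate posterior used during training
        self.post_mu = nn.Linear(img_dim + q_dim + q_dim, z_dim)
        self.post_logvar = nn.Linear(img_dim + q_dim + q_dim, z_dim)
        # p(z | image, question): prior used at inference time, when no caption is given
        self.prior_mu = nn.Linear(img_dim + q_dim, z_dim)
        self.prior_logvar = nn.Linear(img_dim + q_dim, z_dim)
        # p(answer | image, question, z): answer classifier conditioned on the latent
        self.classifier = nn.Sequential(
            nn.Linear(img_dim + q_dim + z_dim, 1024), nn.ReLU(),
            nn.Linear(1024, n_answers),
        )

    def forward(self, img, ques, caption=None):
        ctx = torch.cat([img, ques], dim=-1)
        p_mu, p_logvar = self.prior_mu(ctx), self.prior_logvar(ctx)
        if caption is not None:  # training: posterior also sees the caption
            full = torch.cat([img, ques, caption], dim=-1)
            mu, logvar = self.post_mu(full), self.post_logvar(full)
        else:                    # inference: fall back to the prior
            mu, logvar = p_mu, p_logvar
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterisation trick
        logits = self.classifier(torch.cat([ctx, z], dim=-1))
        # KL(q(z|...) || p(z|...)) between two diagonal Gaussians
        kl = 0.5 * (p_logvar - logvar
                    + (logvar.exp() + (mu - p_mu) ** 2) / p_logvar.exp() - 1).sum(-1)
        return logits, kl

# Usage: ELBO-style loss = answer cross-entropy + weighted KL regulariser
model = LatentVQA()
img, ques, cap = torch.randn(4, 2048), torch.randn(4, 768), torch.randn(4, 768)
answers = torch.randint(0, 3000, (4,))
logits, kl = model(img, ques, caption=cap)
loss = F.cross_entropy(logits, answers) + 0.1 * kl.mean()
loss.backward()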

Cited by 2 publications (1 citation statement)
References 27 publications
“…For example, human attentions are used to enhance the explainability and visual grounding of VQA models (Patro and Namboodiri, 2018; Selvaraju et al., 2019; Das et al., 2017). Image captions contain substantial visual information and can be used as an auxiliary task (i.e., visual captioning) to strengthen VQA models' visual and language understanding (Kim and Bansal, 2019; Wang et al., 2021; Banerjee et al., 2020; Karpathy and Fei-Fei, 2015). Several papers leveraged scene graphs and visual relationships as auxiliary knowledge for VQA (Hudson and Manning, 2019; Shi et al., 2019).…”
Section: Related Work
confidence: 99%