2020 IEEE Winter Conference on Applications of Computer Vision (WACV)
DOI: 10.1109/wacv45572.2020.9093293
Deep Bayesian Network for Visual Question Generation

Abstract: Generating natural questions from an image is a semantic task that requires using vision and language modalities to learn multimodal representations. Images can have multiple visual and language cues such as places, captions, and tags. In this paper, we propose a principled deep Bayesian learning framework that combines these cues to produce natural questions. We observe that with the addition of more cues, and by minimizing uncertainty among the cues, the Bayesian network becomes more confident. We propose …
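The abstract does not specify the fusion rule, but a standard Bayesian way to combine multiple cues so that "more cues means more confidence" is precision-weighted (inverse-variance) fusion. The sketch below is illustrative only and not the paper's actual model; the function name, shapes, and diagonal-variance assumption are ours:

```python
import numpy as np

def fuse_cues(cue_embeddings, cue_variances):
    """Fuse per-cue embeddings (e.g. place, caption, tag) by precision weighting.

    Cues with lower predictive variance contribute more to the fused
    representation, and every added cue strictly reduces the fused variance,
    mirroring the intuition that more cues make the network more confident.
    """
    cues = np.asarray(cue_embeddings, dtype=float)   # shape: (n_cues, dim)
    var = np.asarray(cue_variances, dtype=float)     # shape: (n_cues,)
    precision = 1.0 / var                            # confidence of each cue
    weights = precision / precision.sum()            # normalize to sum to 1
    fused = (weights[:, None] * cues).sum(axis=0)    # precision-weighted mean
    fused_var = 1.0 / precision.sum()                # combined uncertainty
    return fused, fused_var
```

For two cues with equal variance 0.5, the fused variance drops to 0.25, and adding a third cue lowers it further, so the combined estimate tightens exactly as the abstract's uncertainty-minimization intuition suggests.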

Cited by 10 publications (8 citation statements)
References 36 publications
“…We compare our models with four recently proposed VQG models: Information Maximising VQG (IMVQG) (Krishna, Bernstein, and Fei-Fei 2019), What BERT Sees (WBS) (Scialom et al. 2020), Deep Bayesian Network (DBN) (Patro et al. 2020), and Category Consistent Cyclic VQG (C3VQG) (Uppal et al. 2020). Out of these four papers, IMVQG's training and evaluation setup is the most similar to ours.…”
Section: Comparative Approaches
confidence: 99%
“…Jain, Zhang, and Schwing (2017) proposed a model using a VAE instead of a GAN; however, their improved results require the use of a target answer during inference. To overcome this requirement, Krishna, Bernstein, and Fei-Fei (2019) Other work, such as Patro et al. (2018), Patro et al. (2020) and Uppal et al. (2020), either does not include BLEU scores higher than BLEU-1, which is not very informative, or addresses variants of the VQG task. In the latter case the models fail to beat previous SoTA on BLEU-4 for standard VQG.…”
Section: Introduction
confidence: 99%
“…In general, the Bayesian approach considerably outperforms on the quantitative metrics in state-of-the-art benchmarks. There has been some work on exploring Bayesian and latent variable methods for Visual Question Generation (Patro et al., 2020; Krishna et al., 2019). However, in our work, we frame VQA under the variational inference framework where we approximate both the variational and generative distribution during training.…”
Section: Related Work
confidence: 99%
“…In contrast to answering visual questions about images, generating questions has received little attention so far. A few recent works have attempted to generate questions from images in the open domain [24][25][26]. However, the task of VQG in the medical domain has not been well-studied.…”
Section: Introduction
confidence: 99%