2020 IEEE Winter Conference on Applications of Computer Vision (WACV)
DOI: 10.1109/wacv45572.2020.9093293
Deep Bayesian Network for Visual Question Generation

Abstract: Generating natural questions from an image is a semantic task that requires using vision and language modalities to learn multimodal representations. Images can have multiple visual and language cues such as places, captions, and tags. In this paper, we propose a principled deep Bayesian learning framework that combines these cues to produce natural questions. We observe that with the addition of more cues, and by minimizing uncertainty among the cues, the Bayesian network becomes more confident. We propose …
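The abstract does not specify the fusion rule, but a standard Bayesian way to combine multiple cues so that "more cues means more confidence" is precision-weighted (inverse-variance) fusion. The sketch below is illustrative only and not the paper's actual model; the function name, shapes, and diagonal-variance assumption are ours:

```python
import numpy as np

def fuse_cues(cue_embeddings, cue_variances):
    """Fuse per-cue embeddings (e.g. place, caption, tag) by precision weighting.

    Cues with lower predictive variance contribute more to the fused
    representation, and every added cue strictly reduces the fused variance,
    mirroring the intuition that more cues make the network more confident.
    """
    cues = np.asarray(cue_embeddings, dtype=float)   # shape: (n_cues, dim)
    var = np.asarray(cue_variances, dtype=float)     # shape: (n_cues,)
    precision = 1.0 / var                            # confidence of each cue
    weights = precision / precision.sum()            # normalize to sum to 1
    fused = (weights[:, None] * cues).sum(axis=0)    # precision-weighted mean
    fused_var = 1.0 / precision.sum()                # combined uncertainty
    return fused, fused_var
```

For two cues with equal variance 0.5, the fused variance drops to 0.25, and adding a third cue lowers it further, so the combined estimate tightens exactly as the abstract's uncertainty-minimization intuition suggests.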

Cited by 10 publications (8 citation statements)
References 36 publications
“…We compare our models with four recently proposed VQG models: Information Maximising VQG (IMVQG) (Krishna, Bernstein, and Fei-Fei 2019), What BERT Sees (WBS) (Scialom et al. 2020), Deep Bayesian Network (DBN) (Patro et al. 2020), and Category Consistent Cyclic VQG (C3VQG) (Uppal et al. 2020). Out of these four papers, IMVQG's training and evaluation setup is the most similar to ours.…”
Section: Comparative Approaches
confidence: 99%
“…Jain, Zhang, and Schwing (2017) proposed a model using a VAE instead of a GAN; however, their improved results require the use of a target answer during inference. To overcome this requirement, Krishna, Bernstein, and Fei-Fei (2019) Other work, such as Patro et al. (2018), Patro et al. (2020) and Uppal et al. (2020), either does not include BLEU scores higher than BLEU-1, which is not very informative, or addresses variants of the VQG task. In the latter case the models fail to beat previous SoTA on BLEU-4 for standard VQG.…”
Section: Introduction
confidence: 99%
“…In general, the Bayesian approach considerably outperforms on the quantitative metrics in state-of-the-art benchmarks. There has been some work on exploring Bayesian and latent variable methods for Visual Question Generation (Patro et al., 2020; Krishna et al., 2019). However, in our work, we frame VQA under the variational inference framework where we approximate both the variational and generative distribution during training.…”
Section: Related Work
confidence: 99%
“…In contrast to answering visual questions about images, generating questions has received little attention so far. A few recent works have attempted to generate questions from images in the open domain [24][25][26]. However, the task of VQG in the medical domain has not been well-studied.…”
Section: Introduction
confidence: 99%