2021
DOI: 10.1007/978-981-16-1092-9_7

Visual Question Answering Using Deep Learning: A Survey and Performance Analysis

Abstract: The Visual Question Answering (VQA) task combines challenges for processing data with both Visual and Linguistic processing, to answer basic 'common sense' questions about given images. Given an image and a question in natural language, the VQA system tries to find the correct answer to it using visual elements of the image and inference gathered from textual questions. In this survey, we cover and discuss the recent datasets released in the VQA domain dealing with various types of question-formats and robustn…

citations
Cited by 20 publications
(16 citation statements)
references
References 35 publications
“…NLP must be assisted by multimodal control interfaces, identification and understanding of human behavior, and collaborative decision-making between the system and individuals or groups to understand the requirements of the customer and other stakeholders [77]. Visual question answering is a method that addresses the challenging unimodal aspect of NLP systems [78]. Many other methods are used to integrate multimodality into NLP structures, including declarative learning-based programming [79], multimodal datasets [80], procedural reasoning networks [81], and unified attention networks [82].…”
Section: Current Limitations of NLP in Requirements Elicitation and Requirements Analysis (mentioning)
confidence: 99%
“…VQA Datasets. Many large-scale VQA datasets have been proposed over the past six years [19,33,34,37]. A key challenge the community has faced in developing such datasets is the language bias problem [12,23,27,30].…”
Section: Related Work (mentioning)
confidence: 99%
“…This pioneering work was immediately followed by a vigorous worldwide effort aimed at building new datasets and models (Antol et al., 2015; Gao et al., 2015; Geman et al., 2015; Goyal et al., 2016, 2017; Malinowski et al., 2015; M. Ren, Kiros et al., 2015; Yu et al., 2015). This effort has been exhaustively summarized in various surveys (Kafle & Kanan, 2017b; Manmadhan & Kovoor, 2020; Srivastava et al., 2021; Wu et al., 2017), as well as tutorials (Kordjamshidi et al., 2020; Teney et al., 2017). 1 In particular, Srivastava et al.…”
Section: The Recent Revival of VQA (mentioning)
confidence: 99%
“…1 In particular, Srivastava et al. (2021) nicely sketch the timeline of the major breakthroughs in VQA in the last five years, whilst Wu et al. (2017) provide interesting connections with structured knowledge base and an in‐depth description of the question/answer pairs present in VQA datasets.…”
Section: The Recent Revival of VQA (mentioning)
confidence: 99%