2020
DOI: 10.1007/s10462-020-09832-7
Visual question answering: a state-of-the-art review

Cited by 36 publications (11 citation statements)
References 77 publications
“…In deep learning, a question answering system is represented by a more or less complex architecture of neural networks taking as input a textual question in natural language and a text document [23] or an image [21][22]. The output should match the correct answer if it exists in the provided document or image.…”
Section: Question Answering Based On Deep Learning (mentioning)
confidence: 99%
“…All of them address the problem of reasoning about abstract concepts present in the image or identifying hidden rules (also referred to as patterns) that govern visual entities, although in various settings. First of all, such tasks often emerge in the field of Visual Question Answering (VQA) [22][23][24][25][26], where the goal is to answer a question written in natural language referring to an associated image. In VQA, the information present in the image is sufficient for answering the related questions, whereas the Visual Commonsense Reasoning (VCR) field [27][28][29][30][31] takes it a step further and places the tasks in real-world settings, where external knowledge is often required to solve them.…”
Section: Scope (mentioning)
confidence: 99%
“…With the increased number of available AVR benchmarks, more and more methods are proposed to tackle them. Even though in many cases the methods operate on similar inputs and outputs, most of the time they are evaluated only on a single chosen task, without considering the…”
[Fig. 2 legend: Visual Reasoning (VR); Abstract Visual Reasoning (AVR) ([17][18][19][20][21] and this work); Visual Question Answering (VQA) [22][23][24][25][26]; Visual Commonsense Reasoning (VCR) [27][28][29][30][31]; Physical Reasoning (PR) [32][33][34][35][36]]
Section: Introduction (mentioning)
confidence: 99%
“…This pioneering work was immediately followed by a vigorous worldwide effort aimed at building new datasets and models (Antol et al., 2015; Gao et al., 2015; Geman et al., 2015; Goyal et al., 2016, 2017; Malinowski et al., 2015; M. Ren, Kiros et al., 2015; Yu et al., 2015). This effort has been exhaustively summarized in various surveys (Kafle & Kanan, 2017b; Manmadhan & Kovoor, 2020; Srivastava et al., 2021; Wu et al., 2017), as well as tutorials (Kordjamshidi et al., 2020; Teney et al., 2017). In particular, Srivastava et al.…”
Section: The Recent Revival Of VQA (mentioning)
confidence: 99%
“…Ren, Kiros et al., 2015; Yu et al., 2015). This effort has been exhaustively summarized in various surveys (Kafle & Kanan, 2017b; Manmadhan & Kovoor, 2020; Srivastava et al., 2021; Wu et al., 2017), as well as tutorials (Kordjamshidi et al., 2020; Teney et al., 2017). In particular, Srivastava et al. (2021) nicely sketch the timeline of the major breakthroughs in VQA in the last five years, whilst Wu et al. (2017) provide interesting connections with structured knowledge bases and an in-depth description of the question/answer pairs present in VQA datasets.…”
Section: The Recent Revival Of VQA (mentioning)
confidence: 99%