2019
DOI: 10.1609/aaai.v33i01.33018076

TallyQA: Answering Complex Counting Questions

Abstract: Most counting questions in visual question answering (VQA) datasets are simple and require no more than object detection. Here, we study algorithms for complex counting questions that involve relationships between objects, attribute identification, reasoning, and more. To do this, we created TallyQA, the world's largest dataset for open-ended counting. We propose a new algorithm for counting that uses relation networks with region proposals. Our method lets relation networks be efficiently used with high-resolution…
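To make the abstract's method concrete, below is a minimal, illustrative PyTorch sketch of a relation network applied to region-proposal features. It is not the authors' released implementation: the class name ProposalRelationNet, all layer sizes, the question-embedding dimension, and the 0-15 count range are assumptions chosen for the example. It follows the general relation-network recipe of Santoro et al. (2017): embed every pair of proposal features together with the question, sum over pairs, then classify the pooled vector into a count.

# Minimal sketch (not the authors' exact model) of a relation network
# over region-proposal features: a shared MLP g scores every
# (proposal_i, proposal_j, question) triple, the pair embeddings are
# summed, and an MLP f maps the pooled vector to count logits.
# All names and layer sizes below are illustrative assumptions.
import torch
import torch.nn as nn

class ProposalRelationNet(nn.Module):
    def __init__(self, feat_dim=2048, q_dim=1024, hidden=512, max_count=15):
        super().__init__()
        # g: embeds one pair of proposals conditioned on the question.
        self.g = nn.Sequential(
            nn.Linear(2 * feat_dim + q_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # f: maps the pooled relation embedding to count logits 0..max_count.
        self.f = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, max_count + 1),
        )

    def forward(self, props, q):
        # props: (B, N, feat_dim) region-proposal features
        # (e.g. from a Faster R-CNN detector); q: (B, q_dim) question encoding.
        B, N, D = props.shape
        a = props.unsqueeze(2).expand(B, N, N, D)   # proposal i
        b = props.unsqueeze(1).expand(B, N, N, D)   # proposal j
        qq = q.unsqueeze(1).unsqueeze(1).expand(B, N, N, q.size(-1))
        pairs = torch.cat([a, b, qq], dim=-1)       # (B, N, N, 2D + q_dim)
        rel = self.g(pairs).sum(dim=(1, 2))         # pool over all N*N pairs
        return self.f(rel)                          # (B, max_count + 1) logits

# Usage sketch: 36 proposals per image, batch of 2.
net = ProposalRelationNet()
logits = net(torch.randn(2, 36, 2048), torch.randn(2, 1024))
count = logits.argmax(dim=-1)  # predicted count per image

Working over N region proposals keeps the pairwise term at O(N^2) for a small N (e.g. 36 boxes) regardless of input resolution, which is one way to read the abstract's claim that region proposals let relation networks be used efficiently with high-resolution imagery.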

Cited by 43 publications (46 citation statements) | References 1 publication
“…Synsets: Figure 4 shows the counts of examples per synset in the training and development sets. Image Pair Reasoning: We use a 200-sentence subset of the sentences analyzed in Table 5: (3) existential and (4) universal quantifiers; (5) coordination; (6) coreference; (7) spatial relations; (8) presupposition; (9) preposition attachment ambiguity.
Datasets and tasks:
- VQA1.0 (Antol et al., 2015), VQA-CP (Agrawal et al., 2017), VQA2.0 (Goyal et al., 2017): Visual Question Answering; Referring Expression Generation
- VQA (Abstract) (Zitnick and Parikh, 2013): Visual Question Answering
- ReferItGame (Kazemzadeh et al., 2014): Referring Expression Resolution
- SHAPES (Andreas et al., 2016): Visual Question Answering
- Bisk et al. (2016): Instruction Following
- MSCOCO (Chen et al., 2016): Caption Generation
- Google RefExp (Mao et al., 2016): Referring Expression Resolution
- ROOM-TO-ROOM (Anderson et al., 2018): Instruction Following
- Visual Dialog (Das et al., 2017): Dialogue; Visual Question Answering
- CLEVR (Johnson et al., 2017a): Visual Question Answering
- CLEVR-Humans (Johnson et al., 2017b): Visual Question Answering
- TDIUC (Kafle and Kanan, 2017): Visual Question Answering
- ShapeWorld (Kuhnle and Copestake, 2017): Binary Sentence Classification
- FigureQA (Kahou et al., 2018): Visual Question Answering
- TVQA (Lei et al., 2018): Video Question Answering
- LANI & CHAI (Misra et al., 2018): Instruction Following
- Talk the Walk (de Vries et al., 2018): Dialogue; Instruction Following
- COG (Yang et al., 2018): Visual Question Answering; Instruction Following
- VCR (Zellers et al., 2019): Visual Question Answering
- TallyQA (Acharya et al., 2019): Visual Question Answering
What to avoid…”
Section: Additional Data Analysis (citation type: mentioning; confidence: 99%)
“…Tally-QA: Very recently, in 2019, the Tally-QA [1] dataset was proposed; it is the largest dataset for object counting in the open-ended setting. The dataset includes both simple and complex question types, which can be seen in Fig.…”
Section: Datasets (citation type: mentioning; confidence: 99%)
“…In this survey, we first cover the major datasets published for validating the Visual Question Answering task, such as the VQA dataset [2], DAQUAR [19], and Visual7W [38], as well as the most recent datasets up to 2019, including Tally-QA [1] and KVQA [25]. Next, we discuss state-of-the-art architectures designed for the task of Visual Question Answering, such as Vanilla VQA [2], Stacked Attention Networks [32], and Pythia v1.0 [10].…”
Section: Introduction (citation type: mentioning; confidence: 99%)
“…Reasoning-based VQA: Reasoning-based VQA datasets aim to measure a system's capability to reason about a set of objects, their attributes, and their relationships. HowManyQA (Trott et al., 2017) and TallyQA (Acharya et al., 2019) contain object-counting questions over images. SNLI-VE (Xie et al., 2019) and VCOPA (Yeo et al., 2018) focus on causal reasoning, whereas CLEVR (Johnson et al., 2017) and NLVR (Suhr et al., 2017) target spatial reasoning.…”
Section: Visual Question Answering (VQA) (citation type: mentioning; confidence: 99%)