2022
DOI: 10.48550/arxiv.2203.10244
Preprint

ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning

Abstract: Charts are very popular for analyzing data. When exploring charts, people often ask a variety of complex reasoning questions that involve several logical and arithmetic operations. They also commonly refer to visual features of a chart in their questions. However, most existing datasets do not focus on such complex reasoning questions, as their questions are template-based and their answers come from a fixed vocabulary. In this work, we present a large-scale benchmark covering 9.6K human-written questions as well as …

Cited by 11 publications (33 citation statements) | References 21 publications

“…Text-oriented visual understanding has broad application prospects in real-world scenarios. We assess our models' ability on text-oriented visual question answering on several benchmarks, including TextVQA (Sidorov et al., 2020), DocVQA (Mathew et al., 2021), ChartQA (Masry et al., 2022), AI2Diagram (Kembhavi et al., 2016), and OCR-VQA (Mishra et al., 2019). Similarly, the results are shown in Table 6.…”
Section: Text-oriented Visual Question Answering
Citation type: mentioning (confidence: 99%)
“…As for instruction finetuning, we sample 643K single- and multi-turn conversations (excluding 21K TextCaps [59] data) from the LLaVA [43] dataset, 100K QA pairs from ShareGPT4V [14], 10K LAION-GPT-4V [60] captions, 700K GPT-4V-responded instruction pairs from the ALLaVA dataset [15], and 6K text-only multi-turn conversations from LIMA [20] and OpenAssistant2 [21]. To bolster OCR-related abilities, we further collect 28K QA pairs comprising 10K DocVQA [17], 4K ChartQA [18], 10K DVQA [61], and 4K AI2D [19] data. In total, there are about 1.5M instruction-related conversations for image comprehension.…”
Section: Text and Image Generation
Citation type: mentioning (confidence: 99%)
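The excerpt above amounts to a data-mixture recipe. Below is a minimal sketch of that mixture as a configuration dictionary, using only the counts quoted in the excerpt; the dictionary keys are shorthand labels invented here, not official dataset identifiers.

```python
# Illustrative summary of the instruction-tuning mixture quoted above.
# Keys are informal shorthand labels, not official dataset names.
INSTRUCTION_MIX = {
    "llava_conversations": 643_000,        # single- and multi-turn, TextCaps excluded
    "sharegpt4v_qa": 100_000,
    "laion_gpt4v_captions": 10_000,
    "allava_gpt4v_instructions": 700_000,
    "lima_openassistant2_text": 6_000,
    # OCR-oriented additions (28K total)
    "docvqa": 10_000,
    "chartqa": 4_000,
    "dvqa": 10_000,
    "ai2d": 4_000,
}

total = sum(INSTRUCTION_MIX.values())
print(f"total instruction conversations: {total:,}")  # ~1.5M, matching the excerpt
```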
“…During inference, they work in an attention mechanism, where the low-resolution one generates visual queries, and the high-resolution counterpart provides candidate keys and values for reference. To augment the data quality, we collect and produce more data based on public resources, including high-quality responses [14,15], task-oriented instructions [16][17][18][19], and generation-related data [20,21]. The increased amount and quality improve the overall performance and extend the capability of the model.…”
Section: Introduction
Citation type: mentioning (confidence: 99%)
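The dual-resolution mechanism described in this excerpt is a form of cross-attention: low-resolution tokens issue the queries while high-resolution tokens supply the keys and values. The following is a minimal single-head sketch of that pattern, assuming generic token matrices and randomly initialized projection weights; the names and dimensions are illustrative, not the cited model's actual implementation.

```python
import numpy as np

def cross_attention(low_res_feats, high_res_feats, d_k=64, seed=0):
    """Sketch of query/key-value cross-attention between two visual streams.

    low_res_feats:  (n_low, d_model)  -- low-resolution tokens (become queries)
    high_res_feats: (n_high, d_model) -- high-resolution tokens (become keys/values)
    The projection weights below are random placeholders for learned parameters.
    """
    rng = np.random.default_rng(seed)
    d_model = low_res_feats.shape[-1]

    w_q = rng.standard_normal((d_model, d_k)) / np.sqrt(d_model)
    w_k = rng.standard_normal((d_model, d_k)) / np.sqrt(d_model)
    w_v = rng.standard_normal((d_model, d_k)) / np.sqrt(d_model)

    q = low_res_feats @ w_q     # queries from the low-resolution pathway
    k = high_res_feats @ w_k    # keys from the high-resolution pathway
    v = high_res_feats @ w_v    # values from the high-resolution pathway

    scores = q @ k.T / np.sqrt(d_k)                        # (n_low, n_high)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)         # softmax over high-res tokens
    return weights @ v                                     # (n_low, d_k)

# Toy usage: 16 low-resolution tokens attend over 256 high-resolution tokens.
low = np.random.randn(16, 512)
high = np.random.randn(256, 512)
print(cross_attention(low, high).shape)  # (16, 64)
```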
“…Previously, when datasets only contained fixed-vocabulary questions and answers, the standard approach was to construct classification-based VQA models such as STL-CQA [7], LEAF-Net [5], DQVQ-baseline [2], and VisionTAPAS [4]. These encoder-only models encoded image features and textual questions separately, combining them later using attention blocks [5], [7], [2].…”
Section: Chart VQA Expert Systems
Citation type: mentioning (confidence: 99%)
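The fusion pattern described in this excerpt (image and question encoded separately, combined by an attention block, then classified over a fixed answer vocabulary) can be sketched as follows. The toy vocabulary, dimensions, and random weights are illustrative assumptions and do not reproduce any of the cited systems.

```python
import numpy as np

# Illustrative fixed answer vocabulary for a classification-style chart VQA head.
ANSWER_VOCAB = ["yes", "no", "2015", "2016", "increasing", "decreasing"]

def classify_answer(image_feats, question_feats, seed=0):
    """Fuse pre-encoded image and question features with one attention block,
    then score a fixed answer vocabulary.

    image_feats:    (n_regions, d) -- output of a separate image encoder
    question_feats: (n_tokens, d)  -- output of a separate question encoder
    All weights below are random stand-ins for learned parameters.
    """
    rng = np.random.default_rng(seed)
    d = image_feats.shape[-1]

    # Question tokens attend over image regions (scaled dot-product attention).
    scores = question_feats @ image_feats.T / np.sqrt(d)   # (n_tokens, n_regions)
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    attended = attn @ image_feats                          # (n_tokens, d)

    # Pool the fused representation and classify over the fixed vocabulary.
    fused = (attended + question_feats).mean(axis=0)       # (d,)
    w_cls = rng.standard_normal((d, len(ANSWER_VOCAB))) / np.sqrt(d)
    logits = fused @ w_cls
    return ANSWER_VOCAB[int(np.argmax(logits))]

print(classify_answer(np.random.randn(36, 256), np.random.randn(12, 256)))
```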
“…Built on ground-truth data tables crawled online, PlotQA programmatically generated charts, questions, and answers, but its limitations stem from its synthetic nature after the data crawl, a restricted set of chart types (bar plots, line plots, and scatter plots), and restricted question variety [3]. More recently, ChartQA capitalized on real-world, web-crawled charts to develop its visual question-answering dataset, supplemented by human annotators. In order to scale up, ChartQA's authors fine-tuned a T5 model to generate two thirds of the questions and answers, derived from human-written chart summaries [4].…”
Section: Introduction
Citation type: mentioning (confidence: 99%)