Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval 2022
DOI: 10.1145/3477495.3531753
ViQuAE, a Dataset for Knowledge-based Visual Question Answering about Named Entities

Abstract: Whether to retrieve, answer, translate, or reason, multimodality opens up new challenges and perspectives. In this context, we are interested in answering questions about named entities grounded in a visual context using a Knowledge Base (KB). To benchmark this task, called KVQAE (Knowledge-based Visual Question Answering about named Entities), we provide ViQuAE, a dataset of 3.7K questions paired with images. This is the first KVQAE dataset to cover a wide range of entity types (e.g. persons, landmarks, and p…

Cited by 11 publications (27 citation statements) | References 44 publications
“…Results on the ViQuAE test set are shown in Table 1. Surprisingly, we find that PaLM can read questions and generate answers with 31.5% accuracy, outperforming the SOTA retrieval-based model [24] (which has access to the image) on this dataset by 9.4%. Although PaLM is a much larger model, this experiment illustrates that it is possible to achieve very good performance on ViQuAE without using information from the image.…”
Section: The Need For a New Visual Information Seeking Benchmark
Citation type: mentioning | Confidence: 75%
“…Early efforts in this area, such as KBQA [50] and FVQA [49], were based on domain-specific knowledge graphs, while recent datasets like OK-VQA [33] and A-OKVQA [45] have improved upon this foundation by incorporating an open-domain approach and highlighting common-sense knowledge. Among the existing benchmarks, K-VQA [44] and ViQuAE [24] are most relevant to our study, but they have limitations in their question generation process, as discussed below. In our analysis, we focus on three crucial aspects when evaluating the performance of pre-trained models on these benchmarks: (1) the level of information-seeking intent, (2) the reliance on visual understanding, and (3) coverage of diverse entities.…”
Section: The Need For a New Visual Information Seeking Benchmark
Citation type: mentioning | Confidence: 99%