Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval 2021
DOI: 10.1145/3404835.3463259
Select, Substitute, Search: A New Benchmark for Knowledge-Augmented Visual Question Answering

Cited by 16 publications (8 citation statements)
References 33 publications

“…KB-VQA questions can also require commonsense reasoning, as in parts of OK-VQA and A-OKVQA (Schwenk et al., 2022). In particular, S3VQA (Jain et al., 2021) is an augmented version of OK-VQA, improving both the quantity and quality of some question types. A-OKVQA has shifted its core task to "reasoning questions".…”
Section: Related Work
confidence: 99%
“…VQA 2.0 (Goyal et al., 2017) collects 'complementary images' such that each question is associated with a pair of images that result in different answers. Jain et al. (2021) derive new S3VQA questions from manually defined question templates. They annotated spans of objects that could be replaced, and then substituted them with a complicated substitute-and-search system.…”
Section: Related Work
confidence: 99%
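
The substitute-and-search construction quoted above (annotate a replaceable object span in a question, substitute it with a more specific label, then search for external knowledge) can be illustrated with a minimal sketch. Everything below is assumed for illustration: the function name, the `search_fn` wrapper, and the detector label are hypothetical, not the S3VQA authors' actual pipeline.

```python
# Hypothetical sketch of the select-substitute-search idea; names are
# illustrative and do not reflect the S3VQA authors' implementation.

def select_substitute_search(question: str, span: str, detected_label: str,
                             search_fn) -> list[str]:
    """Rewrite `question` by substituting the annotated object `span` with a
    more specific label (e.g., from an object detector), then use the
    rewritten question as a web-search query for external knowledge.

    `search_fn` is assumed to be any callable mapping a query string to a
    list of retrieved text snippets (e.g., a search-engine wrapper).
    """
    assert span in question, "the annotated span must occur in the question"
    rewritten = question.replace(span, detected_label, 1)  # substitute step
    return search_fn(rewritten)                            # search step
```

For instance, "What is the speed of this animal?" with the annotated span "this animal" and a detector output "a cheetah" would be rewritten to "What is the speed of a cheetah?" before retrieval.
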
“…retriever) to recall the required explicit knowledge as external input to the downstream reader. To take advantage of information on the Internet, [20,31,33,37] pass the vision-linguistic information through a search engine (e.g., Google) to retrieve a relevant corpus (e.g., sentences from Wikipedia articles or snippets from search results) as weak positive knowledge samples, which are then passed to the reader module for knowledge incorporation. Among these methods, Luo et al. [31] use the previously retrieved snippets as a KB and treat snippets that contain the answer words as weak positive signals for retriever training.…”
Section: Knowledge-based VQA
confidence: 99%
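
The weak-supervision heuristic attributed to Luo et al. [31] in the excerpt above (retrieved snippets that contain the answer words are treated as weak positives for retriever training) is simple enough to sketch. The function name and return format below are assumptions for illustration, not the paper's API.

```python
# Illustrative sketch of the weak-positive labeling heuristic described
# above; not Luo et al.'s actual code.

def label_weak_positives(snippets: list[str], answer: str) -> list[tuple[str, int]]:
    """Return (snippet, label) pairs: label 1 marks a weak positive, i.e.,
    the snippet contains the gold answer words; label 0 otherwise."""
    answer_lc = answer.lower()
    return [(s, int(answer_lc in s.lower())) for s in snippets]
```

Pairs produced this way could supervise a retriever with a binary relevance loss, at the cost of some noise: a snippet may mention the answer words without actually supporting the answer, which is why the signal is only weakly positive.
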
“…Works in the second category are based on a knowledge-retrieval strategy. We observe that these methods [20,31,37,57] usually pass the vision-linguistic information through a search engine, where network delay can become a bottleneck. Others retrieve a relevant corpus from encyclopedia articles, which introduces a lot of irrelevant information and interferes with the model's judgment.…”
Section: Introduction
confidence: 99%
“…In this work, we mainly focus on the KRVQR dataset and also test our model on the FVQA dataset. Other VQA datasets that require external knowledge exist (Marino et al. 2019; Jain et al. 2021), but there the task is to search for external knowledge, which is outside the scope of this work.…”
Section: Introduction
confidence: 99%