| Name | Task | Size | Construction |
|---|---|---|---|
| WinoX [39] | …French, German, Russian [83] answering | 150,000 questions | |
| COFAR [47] | Find an image matching a query | 25,300 images, 40,800 queries | Expert construction |
| CoSim [74] | Counterfactual reasoning about images | 3500 instances | Crowd sourcing |
| CRIC [44] | Compositional reasoning | 96,000 images, 494,000 questions | Synthesized |
| e-SNLI-VE [71] | Visual-textual entailment | 430,000 | Synthesized from SNLI-VE |
| FVQA [138] | Visual question answering | 2190 images | Synthesized |
| GD-VCR [149] | Visual question answering | 328 images, 886 Q/A pairs | Expert construction |
| Half&Half [123] | Reasoning with text and incomplete images | 126,000 examples | Synthesized |
| HumanCog [151] | Who in image is being described? | 67,000 images, 138,000 descriptions | Extracted from VCR + crowd sourcing |
| HVQR [21] | Visual question answering | 33,000 images, 157,000 Q/A pairs | Synthesized |
| IconQA [94] | Visual question answering | 107,400 instances | Crowd sourcing |
| KB-VQA [137] | Visual question answering | 2190 images | Synthesized |
| Naive action-effect prediction [45] | Match image to effect of action | 1400 text effects, 4163 images | Crowd sourcing |
| PTR [61] | Visual question answering | 80,000 images, 800,000 questions | Synthesized (both images and Q/A pairs) |
| Sherlock [60] | Inferences from images | 103,000 images, 363,000 inferences | Crowd sourcing |
| VCR [155] | Visual question answering | 290,000 questions | Crowd sourcing |
| Visual Genome [78] | Visual question answering | 108,000 images | Crowd sourcing |
| WinoGAViL [13] | Match image to text | 4482 examples | Gamification |

Table 8: Image benchmarks

| Name | Task | Size | Construction |
|---|---|---|---|
| AGENT [121] | Is this surprising?… | | |