Proceedings of the 29th ACM International Conference on Multimedia 2021
DOI: 10.1145/3474085.3475350
X-GGM: Graph Generative Modeling for Out-of-distribution Generalization in Visual Question Answering

Abstract: Encouraging progress has been made towards Visual Question Answering (VQA) in recent years, but it is still challenging to enable VQA models to adaptively generalize to out-of-distribution (OOD) samples. Intuitively, recompositions of existing visual concepts (i.e., attributes and objects) can generate compositions unseen in the training set, which encourages VQA models to generalize to OOD samples. In this paper, we formulate OOD generalization in VQA as a compositional generalization problem and propose a …


Cited by 16 publications (7 citation statements)
References 52 publications
“…For example, Beery et al. (2018) [2] demonstrated that a network that accurately recognizes cows in a typical context (e.g., a pasture) consistently misclassifies cows in a non-typical context (e.g., a beach). Similar heuristics also arise in visual question answering systems [1], and researchers have proposed graph generative modeling schemes [13] (inspired by graph convolutional networks [30]) to handle the problem implicitly. In this paper, we study this problem within Natural Language Inference (NLI): the task of determining whether a premise sentence entails (i.e., implies the truth of) a hypothesis sentence [7,8,4].…”
Section: Introduction
confidence: 90%
“…VQA-OOD While there are certain similarities between OOD and realistic VQA, they are different. [14,31,24] address OOD settings where the distributions of the training and test sets differ. However, since UQs have no answer, [14,31,24] are not applicable to the proposed RVQA task.…”
Section: E Additional Related Work
confidence: 99%
“…(2) Weakening language priors: AdvReg [29], GRL [30], RUBi [31], LM [32], LMH [32], Bias-Product (POE) [32], RMFE [33], CF-VQA [34] and GGE-DQ [35]; (3) Using various data enhancement: CSS [36], CL-VQA [37], GradSup [38], Loss-Rescaling [39], Mutant [40], RandImg [41], Unshuffling [42], ADA-VQA [43] and X-GGM [44].…”
Section: Language Bias in VQA
confidence: 99%