2020
DOI: 10.1609/aaai.v34i07.6776

Overcoming Language Priors in VQA via Decomposed Linguistic Representations

Abstract: Most existing Visual Question Answering (VQA) models overly rely on language priors between questions and answers. In this paper, we present a novel method of language attention-based VQA that learns decomposed linguistic representations of questions and utilizes the representations to infer answers for overcoming language priors. We introduce a modular language attention mechanism to parse a question into three phrase representations: type representation, object representation, and concept representation. We …
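To make the decomposition concrete, the following is a minimal PyTorch sketch of a modular language attention layer in the spirit of the abstract: three separate attention heads pool the question's word features into type, object, and concept phrase representations. The class and parameter names (e.g. DecomposedLanguageAttention, word_dim, hidden_dim) are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class DecomposedLanguageAttention(nn.Module):
    """Sketch: three attention modules pool word features of a question
    into type, object, and concept phrase representations."""

    def __init__(self, word_dim=300, hidden_dim=512):
        super().__init__()
        # One attention head per phrase kind; each scores every word
        # and pools the question into a single vector.
        self.heads = nn.ModuleDict({
            name: nn.Sequential(
                nn.Linear(word_dim, hidden_dim),
                nn.Tanh(),
                nn.Linear(hidden_dim, 1),
            )
            for name in ("type", "object", "concept")
        })

    def forward(self, word_feats, mask):
        # word_feats: (batch, num_words, word_dim); mask: (batch, num_words)
        reps = {}
        for name, head in self.heads.items():
            scores = head(word_feats).squeeze(-1)          # (batch, num_words)
            scores = scores.masked_fill(mask == 0, -1e9)   # ignore padding
            attn = torch.softmax(scores, dim=-1).unsqueeze(-1)
            reps[name] = (attn * word_feats).sum(dim=1)    # (batch, word_dim)
        return reps  # {"type": ..., "object": ..., "concept": ...}
```

Each pooled vector can then drive a different stage of answering (e.g. the type representation selects the answer space, while object and concept representations attend over the image), which is how decomposed representations keep a single question-to-answer shortcut from dominating.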

Cited by 60 publications (38 citation statements)
References 24 publications
“…In contrast, federated learning is used with the aimNet and is validated on federated learning settings that include both horizontal and vertical federated learning. To focus on language priors, a modular language attention mechanism is used by Jing et al (2020) to parse a question into three phrase representations, namely type representation, object representation, and concept representation. It has prevented language priors from dominating the answering process.…”
Section: Image Question Answering Model
confidence: 99%
“…3) The data rebalance-based methods [9,15,28,31,51] propose data augmentation strategies that automatically generate counterfactual training instances, thereby balancing the answer distribution of the training data. 4) The other methods: there are also many impressive works that overcome language bias through adversarial learning [37], modifying the language module [21,27], and causal inference [33]. Among most of these debiasing methods [8,12,33,37], the question-only (unimodal) branch is crucial for capturing spurious relationships between questions and answer candidates.…”
Section: Related Work
confidence: 99%
“…Existing works to address this can be categorized into three groups. The first group attempts to reduce the language bias by designing new VQA models or learning strategies (Ramakrishnan et al., 2018; Cadene et al., 2019; Clark et al., 2019; Grand and Belinkov, 2019; Jing et al., 2020; Niu et al., 2021; Gat et al., 2020). For example, RUBi and Ensemble (Cadene et al., 2019; Clark et al., 2019) explicitly modeled the question-answer correlations to encourage VQA models to explore other patterns in the data that are more likely to generalize.…”
Section: Related Work
confidence: 99%
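The question-only branch mentioned in the citation statements above (as in RUBi-style ensembling) can be illustrated with a short sketch. This is an assumption-laden simplification, not the cited papers' code: a branch that sees only the question predicts answer logits, and its sigmoid mask rescales the main model's logits during training so the fused model cannot profit from question-answer shortcuts alone.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionOnlyDebias(nn.Module):
    """RUBi-style sketch (names illustrative): a question-only branch
    predicts answers from the question alone; its sigmoid mask rescales
    the main VQA logits during training to discourage language shortcuts."""

    def __init__(self, q_dim=1024, num_answers=3000):
        super().__init__()
        self.q_only = nn.Sequential(
            nn.Linear(q_dim, q_dim), nn.ReLU(),
            nn.Linear(q_dim, num_answers),
        )

    def forward(self, fused_logits, q_feat, labels):
        q_logits = self.q_only(q_feat)                        # question-only prediction
        masked_logits = fused_logits * torch.sigmoid(q_logits)
        # Both branches are supervised during training; at test time only
        # the main (fused) branch is used.
        loss = F.cross_entropy(masked_logits, labels) + F.cross_entropy(q_logits, labels)
        return loss, fused_logits
```

The design choice is the same one the statements describe: whatever the question-only branch can predict by itself is, by construction, a language prior, so down-weighting those cases pushes the main model toward visually grounded evidence.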