Findings of the Association for Computational Linguistics: EMNLP 2020
DOI: 10.18653/v1/2020.findings-emnlp.272

Learning to Model and Ignore Dataset Bias with Mixed Capacity Ensembles

Abstract: Many datasets have been shown to contain incidental correlations created by idiosyncrasies in the data collection process. For example, sentence entailment datasets can have spurious word-class correlations if nearly all contradiction sentences contain the word "not", and image recognition datasets can have telltale object-background correlations if dogs are always indoors. In this paper, we propose a method that can automatically detect and ignore these kinds of dataset-specific patterns, which we call datase…
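The abstract describes training a small model alongside a larger one so that shallow, dataset-specific patterns are absorbed by the small model and only the high-capacity model is used at test time. The sketch below is a minimal illustration of that reading, not the authors' implementation; the class and function names (MixedCapacityEnsemble, training_step, predict_out_of_domain) and the product-of-experts combination details are assumptions made for the example.

```python
# Minimal sketch (not the paper's code) of a mixed capacity ensemble:
# a small "bias" model and a large "main" model are trained jointly as a
# product of experts, so shallow dataset-specific patterns tend to be
# absorbed by the small model. Names and shapes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MixedCapacityEnsemble(nn.Module):
    def __init__(self, low_capacity: nn.Module, high_capacity: nn.Module):
        super().__init__()
        self.low = low_capacity    # e.g. a bag-of-words classifier
        self.high = high_capacity  # e.g. a large encoder with a task head

    def forward(self, low_inputs, high_inputs):
        low_logp = F.log_softmax(self.low(low_inputs), dim=-1)
        high_logp = F.log_softmax(self.high(high_inputs), dim=-1)
        # Product of experts: add log-probabilities, then renormalize.
        return F.log_softmax(low_logp + high_logp, dim=-1)


def training_step(model, low_inputs, high_inputs, labels):
    # Joint loss on the combined distribution; both models receive gradients.
    combined_logp = model(low_inputs, high_inputs)
    return F.nll_loss(combined_logp, labels)


def predict_out_of_domain(model, high_inputs):
    # At test time only the high-capacity model is used, so the
    # dataset-specific patterns captured by the small model are ignored.
    return model.high(high_inputs).argmax(dim=-1)
```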

Cited by 39 publications (43 citation statements)
References 34 publications
“…The results are satisfactory, especially when considering the simplicity and efficiency of our approach. Moreover, the fact that a single configuration works well on 3 tasks is an indicator that our method has the potential to generalize on completely unknown OOD sets (Clark et al., 2020).…”
Section: Results (mentioning)
confidence: 99%
“…Following (Clark et al., 2019; Grand and Belinkov, 2019; Clark et al., 2020; Sanh et al., 2020), we tune our method hyperparameters on the OOD sets. As pointed out by Clark et al. (2019, 2020), this is not ideal since it assumes some prior knowledge of the OOD test sets. To best mitigate this impact, we follow the procedure of previous works and use the same hyperparameters for all 3 tasks.…”
Section: Methods (mentioning)
confidence: 99%
“…Belinkov et al (2019) used adversarial training to mitigate the hypothesis-only bias in textual entailment models. Clark et al (2020) adversarially trained a low and a high capacity model in an ensemble in order to ensure that the latter model is focusing on patterns that should generalize better. Dayanik and Padó (2020) Dai and Adel (2020) explored different entity substitution techniques for data augmentation tailored to NER.…”
Section: Mitigating Biasmentioning
confidence: 99%
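The statement above summarizes the low/high capacity ensemble idea. For concreteness, a hypothetical training loop for the sketch given after the abstract might look as follows; the model sizes, three-class output, and data loaders are placeholders, not the paper's actual setup.

```python
# Illustrative usage of the earlier MixedCapacityEnsemble sketch.
# All dimensions and the objects train_loader / ood_high_x are assumed
# to exist for the example and do not reflect the paper's configuration.
import torch
import torch.nn as nn

low = nn.Linear(5000, 3)                    # low capacity: linear bag-of-words head
high = nn.Sequential(nn.Linear(768, 512),   # stand-in for a large encoder + head
                     nn.ReLU(),
                     nn.Linear(512, 3))
ensemble = MixedCapacityEnsemble(low, high)
optimizer = torch.optim.Adam(ensemble.parameters(), lr=1e-4)

for low_x, high_x, y in train_loader:       # assumed DataLoader of (features, features, labels)
    optimizer.zero_grad()
    loss = training_step(ensemble, low_x, high_x, y)
    loss.backward()
    optimizer.step()

# Out-of-distribution evaluation uses only the high-capacity model,
# dropping whatever the low-capacity model learned about dataset quirks.
preds = predict_out_of_domain(ensemble, ood_high_x)
```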
“…To generalize to out-of-distribution samples adaptively, the VQA model should own two capabilities: (1) overcoming negative language biases and (2) producing out-of-distribution answers by learning rules entailed in in-domain data. The prevailing OOD generalization methods [10, 11, 18, 65] focus on enhancing the first capability, which achieves OOD generalization by explicitly mitigating the language biases. While the second capability, which directly endues VQA models the potentiality to generalize to out-of-distribution (i.e., unseen or rare) samples, has not been well explored.…”
(mentioning)
confidence: 99%