Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.770
Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance

Abstract: Models for natural language understanding (NLU) tasks often rely on the idiosyncratic biases of the dataset, which make them brittle against test cases outside the training distribution. Recently, several debiasing methods have been shown to be very effective at improving out-of-distribution performance. However, their improvements come at the expense of a performance drop when models are evaluated on the in-distribution data, which contain examples with higher diversity. This seemingly inevitable trade-off m…

Cited by 51 publications (28 citation statements).
References 24 publications (49 reference statements).
“…We note that there are three different criteria for controlling the debiasing strategy: (1) Models may be trained end-to-end by propagating errors to the weak learner as well as the main model (Mahabadi et al., 2020) or in a pipeline, where the weak learner is trained first and frozen, such that only its predictions are used to tune the combination loss (He et al., 2019; Clark et al., 2019; Sanh et al., 2021; Utama et al., 2020a). (2…”
Section: Debiasing Methods
confidence: 99%
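The pipeline variant described in the statement above, where a frozen weak learner's predictions modulate the main model's loss, is often realized as a product of experts. The following is a minimal PyTorch sketch, assuming the bias model's logits are precomputed; the function name and toy batch are illustrative, not taken from the cited papers' code:

```python
import torch
import torch.nn.functional as F

def product_of_experts_loss(main_logits, bias_logits, labels):
    """Pipeline-style debiasing sketch: combine the main model's logits
    with the frozen bias model's log-probabilities, then apply
    cross-entropy to the combined (product-of-experts) distribution."""
    bias_log_probs = F.log_softmax(bias_logits, dim=-1).detach()  # frozen weak learner
    combined = F.log_softmax(main_logits, dim=-1) + bias_log_probs
    # cross_entropy re-normalizes, so this is -log softmax(combined)[gold]
    return F.cross_entropy(combined, labels)

# Toy batch: 4 examples, 3 NLI labels
torch.manual_seed(0)
main_logits = torch.randn(4, 3, requires_grad=True)
bias_logits = torch.randn(4, 3)
labels = torch.tensor([0, 1, 2, 0])
loss = product_of_experts_loss(main_logits, bias_logits, labels)
loss.backward()
```

Because the bias term is detached, gradients reach only the main model, matching the pipeline setup in which the weak learner is trained first and then frozen.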
“…contradiction H: The woman is not awake. A recently popular solution to this issue is to develop debiasing methods that overcome these biases at the training stage (Belinkov et al. 2019; He, Zha, and Wang 2019; Stacey et al. 2020; Mahabadi, Belinkov, and Henderson 2020; Utama, Moosavi, and Gurevych 2020; Ghaddar et al. 2021). Namely, they first use a bias model to identify biased samples.…”
Section: Premise and Hypothesis
confidence: 99%
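The "bias model identifies biased samples" step mentioned above can be sketched as a simple confidence filter: an example is flagged when a weak bias-only model (e.g., one that sees just the hypothesis) already predicts the gold label with high confidence. The threshold and function name here are assumptions for illustration:

```python
import numpy as np

def flag_biased_examples(bias_probs, labels, threshold=0.8):
    """Flag examples whose gold-label probability under the bias-only
    model meets the confidence threshold (illustrative heuristic)."""
    bias_probs = np.asarray(bias_probs)
    gold_conf = bias_probs[np.arange(len(labels)), labels]
    return gold_conf >= threshold

probs = [[0.9, 0.05, 0.05], [0.2, 0.5, 0.3], [0.1, 0.1, 0.8]]
labels = [0, 1, 2]
flags = flag_biased_examples(probs, labels)
```

Here the first and third examples are flagged (gold confidence 0.9 and 0.8), while the second is kept as unbiased.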
“…In the second approach, they first recognize examples that contain artifacts, and use this knowledge in the training objective to either skip or downweight biased examples (He et al., 2019; Clark et al., 2019a), or to regularize the confidence of the model on those examples (Utama et al., 2020a). The use of this information in the training objective improves the robustness of the model on adversarial datasets (He et al., 2019; Clark et al., 2019a; Utama et al., 2020a), i.e., datasets that contain counterexamples in which relying on the bias results in an incorrect prediction. In addition, it can also improve in-domain performance as well as generalization across various datasets that represent the same task (Wu et al., 2020a; Utama et al., 2020b).…”
Section: Artifacts in NLP Datasets
confidence: 99%
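One common form of the down-weighting objective described above scales each example's loss by how poorly the bias model already handles it. This sketch assumes a (1 − bias confidence) weighting, as in example-reweighting variants; the exact weighting scheme differs across the cited papers:

```python
import torch
import torch.nn.functional as F

def reweighted_loss(main_logits, bias_probs, labels):
    """Down-weighting sketch: scale each example's cross-entropy by
    (1 - p_bias(gold)), so examples the bias model already solves
    contribute less to the gradient."""
    per_example = F.cross_entropy(main_logits, labels, reduction="none")
    gold_bias_conf = bias_probs.gather(1, labels.unsqueeze(1)).squeeze(1)
    weights = (1.0 - gold_bias_conf).detach()
    return (weights * per_example).mean()

# Toy case: the bias model is fully confident and correct on every
# example, so all weights are zero and the batch contributes no loss.
torch.manual_seed(0)
logits = torch.randn(4, 3)
bias_probs = torch.tensor([[1.0, 0.0, 0.0]] * 4)
labels = torch.tensor([0, 0, 0, 0])
loss = reweighted_loss(logits, bias_probs, labels)
```

Skipping biased examples entirely corresponds to the limiting case of a hard zero/one weight instead of this soft weighting.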