Findings of the Association for Computational Linguistics: EMNLP 2020
DOI: 10.18653/v1/2020.findings-emnlp.74
Improving QA Generalization by Concurrent Modeling of Multiple Biases

Abstract: Existing NLP datasets contain various biases that models can easily exploit to achieve high performance on the corresponding evaluation sets. However, focusing on dataset-specific biases limits a model's ability to learn more generalizable knowledge about the task from more general data patterns. In this paper, we investigate the impact of debiasing methods for improving generalization and propose a general framework for improving the performance on both in-domain and out-of-domain datasets by concurrent modeling…


Cited by 8 publications (10 citation statements)
References 36 publications
“…The use of this information in the training objective improves the robustness of the model on adversarial datasets (He et al., 2019; Clark et al., 2019a; Utama et al., 2020a), i.e., datasets that contain counterexamples in which relying on the bias results in an incorrect prediction. In addition, it can also improve in-domain performance as well as generalization across various datasets that represent the same task (Wu et al., 2020a; Utama et al., 2020b).…”
Section: Artifacts in NLP Datasets
confidence: 99%
“…Therefore, while they improve the performance on the targeted adversarial sets, they may hurt the overall robustness. The recent works of Utama et al. (2020b) and Wu et al. (2020) are exceptions: they show that their proposed debiasing frameworks improve the overall robustness, and hence the generalization across different datasets, in natural language understanding and question answering, respectively. Utama et al. (2020b) propose a new framework that automatically recognizes biased training examples and does not require predefining bias types.…”
Section: Introduction
confidence: 99%
“…The majority of existing works improve the robustness against a given bias by proposing new methods or training paradigms (He et al., 2019; Clark et al., 2019; Mahabadi and Henderson, 2019; Utama et al., 2020a,b; Wu et al., 2020). The common component in such methods is a bias model that is trained to detect training examples that can be solved using the bias alone.…”
Section: Introduction
confidence: 99%
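The bias-model component described in the statements above is commonly combined with the main model through a product-of-experts training objective, as in the cited He et al. (2019) and Clark et al. (2019) lines of work: the bias model's predictions reshape the loss so that examples solvable by the bias alone contribute less gradient. A minimal NumPy sketch of that combination (function names are illustrative, not taken from the paper):

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def poe_loss(main_logits, bias_logits, labels):
    """Product-of-experts debiasing loss (sketch).

    The main model's log-probabilities are added to the (frozen) bias
    model's log-probabilities, and cross-entropy is taken on the combined
    distribution. When the bias model is already confident on the gold
    label, the combined loss (and hence the main model's gradient) shrinks.
    """
    combined = log_softmax(main_logits) + log_softmax(bias_logits)
    log_probs = log_softmax(combined)
    return -log_probs[np.arange(len(labels)), labels].mean()

# Toy comparison: the same main-model logits, with a bias model that is
# either confident on the gold label or uninformative (uniform).
main = np.array([[1.0, 0.0, 0.0]])
bias_confident = np.array([[5.0, 0.0, 0.0]])
bias_uniform = np.array([[0.0, 0.0, 0.0]])
labels = np.array([0])

loss_biased_example = poe_loss(main, bias_confident, labels)
loss_neutral_example = poe_loss(main, bias_uniform, labels)
# The biased (easy) example yields a smaller loss, down-weighting it.
```

With a uniform bias model the objective reduces to the main model's ordinary cross-entropy, which is why such ensembles leave unbiased examples essentially untouched while discounting the ones the bias model already solves.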