Proceedings of the 2nd Workshop on Machine Reading for Question Answering 2019
DOI: 10.18653/v1/d19-5827
Generalizing Question Answering System with Pre-trained Language Model Fine-tuning

Abstract: With a large number of datasets being released and new techniques being proposed, question answering (QA) systems have witnessed great breakthroughs in reading comprehension (RC) tasks. However, most existing methods focus on improving in-domain performance, leaving open the research question of how these models and techniques can generalize to out-of-domain and unseen RC tasks. To enhance the generalization ability, we propose a multi-task learning framework that learns the shared representation across different…
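The abstract describes the approach only at a high level, so the sketch below illustrates what multi-task fine-tuning with a shared representation can look like in code. It is a toy illustration under stated assumptions, not the paper's implementation: ToyEncoder stands in for a pre-trained language model such as BERT or XLNet, SpanHead is a single answer-span head shared by all datasets, and make_toy_dataset plus the SQuAD/NewsQA/TriviaQA names are placeholders for real MRQA training data.

```python
# Toy sketch of multi-task fine-tuning for extractive QA: several RC datasets
# share one encoder and one span-prediction head, and batches are sampled
# across datasets so the learned representation is shared. NOT the paper's
# released code; all names and data here are illustrative placeholders.
import random
import torch
import torch.nn as nn

VOCAB, HIDDEN, MAX_LEN = 1000, 64, 48

class ToyEncoder(nn.Module):
    """Stand-in for a pre-trained language model (e.g. BERT/XLNet)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.layer = nn.TransformerEncoderLayer(HIDDEN, nhead=4, batch_first=True)

    def forward(self, token_ids):
        return self.layer(self.embed(token_ids))  # (batch, seq, HIDDEN)

class SpanHead(nn.Module):
    """Single head shared by all datasets: predicts answer start/end indices."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(HIDDEN, 2)

    def forward(self, hidden):
        start_logits, end_logits = self.proj(hidden).split(1, dim=-1)
        return start_logits.squeeze(-1), end_logits.squeeze(-1)

def make_toy_dataset(n=64):
    """Random (tokens, start, end) triples standing in for one RC dataset."""
    data = []
    for _ in range(n):
        tokens = torch.randint(0, VOCAB, (MAX_LEN,))
        start = random.randrange(MAX_LEN - 1)
        end = random.randrange(start, MAX_LEN)
        data.append((tokens, start, end))
    return data

datasets = {name: make_toy_dataset() for name in ["SQuAD", "NewsQA", "TriviaQA"]}
encoder, head = ToyEncoder(), SpanHead()
optim = torch.optim.AdamW(list(encoder.parameters()) + list(head.parameters()), lr=3e-4)
loss_fn = nn.CrossEntropyLoss()

for step in range(30):
    # Multi-task step: draw each batch from a randomly chosen dataset, so
    # gradients from all domains update the same shared parameters.
    name = random.choice(list(datasets))
    batch = random.sample(datasets[name], 8)
    tokens = torch.stack([t for t, _, _ in batch])
    starts = torch.tensor([s for _, s, _ in batch])
    ends = torch.tensor([e for _, _, e in batch])

    start_logits, end_logits = head(encoder(tokens))
    loss = loss_fn(start_logits, starts) + loss_fn(end_logits, ends)
    optim.zero_grad()
    loss.backward()
    optim.step()
    if step % 10 == 0:
        print(f"step {step:02d}  dataset={name:9s}  loss={loss.item():.3f}")
```

Drawing each batch from a randomly chosen dataset while updating one set of shared parameters is the core of the multi-task idea; the paper applies it on top of a pre-trained language model rather than the toy encoder above.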

Cited by 47 publications (62 citation statements)
References 26 publications
“…Harbin Institute of Technology; HLTC (Su et al., 2019), Hong Kong University of Science & Technology; BERT-cased-whole-word, Aristo @ AI2; CLER (Takahashi et al., 2019), Fuji Xerox Co., Ltd.; Adv. Train (Lee et al., 2019), 42Maru and Samsung Research; BERT-Multi-Finetune, Beijing Language and Culture University; PAL IN DOMAIN, University of California Irvine; HierAtt (Osama et al., 2019), Alexandria University.
F1 scores (five score columns as given in the excerpt):
(Longpre et al., 2019)         82.3  68.5  66.9  74.6  70.8
FT XLNet                       82.9  68.0  66.7  74.4  70.5
HLTC (Su et al., 2019)         81.0  65.9  65.0  72.9  69.0
BERT-cased-whole-word          79.4  61.1  61.4  71.2  66.3
CLER (Takahashi et al., 2019)  80.2  62.7  62.5  69.7  66.1
Adv. Train (Lee et al., 2019)  76.8  57.1  57.9  66.5  62.…”
Section: FT XLNet (mentioning)
confidence: 99%
“…Within these restrictions, we encouraged participants to explore how to best utilize the provided data. Inspired by Talmor and Berant (2019), two submissions (Su et al., 2019; Longpre et al., 2019) analyzed similarities between datasets. Unsurprisingly, the performance improved significantly when fine-tuned on the training dataset most similar to the evaluation dataset of interest.…”
Section: Summary of Findings (mentioning)
confidence: 99%
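As one hedged illustration of the dataset-similarity analysis mentioned in the statement above (the cited submissions' actual similarity measure is not specified in this excerpt), the sketch below ranks hypothetical training datasets by Jaccard overlap of question vocabularies with a target evaluation set; all names and example questions are made up.

```python
# One plausible similarity measure (an assumption, not necessarily the one the
# cited submissions used): Jaccard overlap of question vocabularies, used to
# pick which training dataset to fine-tune on for a given evaluation dataset.
def vocab(questions):
    """Lower-cased word set of a list of question strings."""
    return {w for q in questions for w in q.lower().split()}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

# Toy question samples standing in for real MRQA training/evaluation datasets.
train_sets = {
    "SQuAD":    ["Who wrote Hamlet?", "When was the treaty signed?"],
    "TriviaQA": ["Which river flows through Cairo?", "Who painted the ceiling?"],
}
eval_questions = ["Who wrote the novel?", "When was the bridge built?"]

target = vocab(eval_questions)
ranked = sorted(train_sets,
                key=lambda name: jaccard(vocab(train_sets[name]), target),
                reverse=True)
print("Fine-tune first on:", ranked[0])
```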
“…We combine all the bias weights and use them to adapt the distillation loss. Su et al. (2019) achieve considerable improvements by simply fine-tuning XLNet instead of BERT, and Longpre et al. (2019) achieve further improvements by augmenting the training data with additional unanswerable questions.…”
Section: Related Work (mentioning)
confidence: 99%
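To make the "fine-tuning XLNet instead of BERT" step concrete, here is a minimal, generic sketch using the Hugging Face transformers library. It is not the authors' code: the question, context, and token-level answer span below are fabricated for illustration, and answer-span alignment as well as the unanswerable-question augmentation of Longpre et al. (2019) are omitted.

```python
# Generic sketch of one fine-tuning step of XLNet for extractive QA with the
# Hugging Face `transformers` library; the example data and answer positions
# are fabricated, and real training would iterate over a full MRQA dataset.
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlnet-base-cased")
model = AutoModelForQuestionAnswering.from_pretrained("xlnet-base-cased")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)

question = "Who proposed the multi-task QA framework?"
context = "Su et al. proposed a multi-task learning framework for reading comprehension."

# Tokenize the (question, context) pair; in practice, offset mappings are used
# to convert character-level gold answer spans into token positions.
inputs = tokenizer(question, context, max_length=384, truncation=True,
                   return_tensors="pt")

# Fabricated token-level answer span, standing in for an aligned gold span.
start_positions = torch.tensor([1])
end_positions = torch.tensor([3])

model.train()
outputs = model(**inputs, start_positions=start_positions,
                end_positions=end_positions)
outputs.loss.backward()   # span-extraction cross-entropy loss
optimizer.step()
optimizer.zero_grad()
```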
“…Existing approaches to improve generalization in QA either are only applicable when there exist multiple training domains (Talmor and Berant, 2019; Takahashi et al., 2019; …) or rely on models and ensembles with larger capacity (Longpre et al., 2019; Su et al., 2019; …). In contrast, our novel debiasing approach can be applied to both single and multi-domain scenarios, and it improves the model generalization without requiring larger pre-trained language models.…”
Section: Introduction (mentioning)
confidence: 99%