Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019
DOI: 10.18653/v1/d19-1286

Investigating BERT’s Knowledge of Language: Five Analysis Methods with NPIs

Abstract: Though state-of-the-art sentence representation models can perform tasks requiring significant knowledge of grammar, it is an open question how best to evaluate their grammatical knowledge. We explore five experimental methods inspired by prior work evaluating pretrained sentence representation models. We use a single linguistic phenomenon, negative polarity item (NPI) licensing in English, as a case study for our experiments. NPIs like any are grammatical only if they appear in a licensing environment like negation […]
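One of the simplest probes in this family is a cloze-style comparison: ask a masked language model how probable an NPI is in a licensed (negated) context versus an unlicensed one. The sketch below illustrates that general idea only, not the paper's exact protocol; it assumes the HuggingFace transformers and PyTorch packages, the bert-base-uncased checkpoint, and illustrative example sentences based on the canonical NPI contrast rather than the paper's data.

import torch
from transformers import BertForMaskedLM, BertTokenizer

# Load BERT with its masked-language-modeling head (assumption: bert-base-uncased).
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def npi_probability(masked_sentence: str, npi: str = "any") -> float:
    # Probability of the NPI at the [MASK] position under BERT's MLM head.
    inputs = tokenizer(masked_sentence, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = logits[0, mask_pos].softmax(dim=-1)
    return probs[tokenizer.convert_tokens_to_ids(npi)].item()

# Licensed context (negation present) vs. unlicensed context (no licensor).
licensed = npi_probability("Sue doesn't have [MASK] cash.")
unlicensed = npi_probability("Sue has [MASK] cash.")
print(f"P(any | licensed) = {licensed:.4f}, P(any | unlicensed) = {unlicensed:.4f}")

If the model has learned the licensing condition, the probability of "any" should be substantially higher in the negated sentence.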

Cited by 91 publications (112 citation statements)
References 33 publications
“…Within the paradigm of training large pretrained Transformer language representations via intermediate-stage training before fine-tuning on a target task, positive transfer has been shown in both sequential task-to-task (Phang et al., 2018) and multi-task-to-task (Raffel et al., 2019) formats. Wang et al. (2019a) perform an extensive study on transfer with BERT, finding language modeling and NLI tasks to be among the most beneficial tasks for improving target-task performance. Talmor and Berant (2019) perform a similar cross-task transfer study on reading comprehension datasets, finding similar positive transfer in most cases, with the biggest gains stemming from a combination of multiple QA datasets.…”
Section: Related Work (mentioning)
confidence: 99%
“…Unsupervised pretraining, e.g., BERT (Devlin et al., 2019) or RoBERTa (Liu et al., 2019b), has recently pushed the state of the art on many natural language understanding tasks. One method of further improving pretrained models that has been shown to be broadly helpful is to first fine-tune a pretrained model on an intermediate task, before fine-tuning again on the target task of interest (Phang et al., 2018; Wang et al., 2019a; Clark et al., 2019a; Sap et al., 2019), also referred to as STILTs.…”
Section: Introduction (mentioning)
confidence: 99%
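The statements above describe intermediate-task transfer ("STILTs"): fine-tune a pretrained encoder on an intermediate task, then fine-tune again on the target task. Below is a minimal sketch of that two-stage recipe under stated assumptions: HuggingFace transformers and PyTorch, bert-base-uncased, and toy single-example datasets with placeholder hyperparameters rather than any cited paper's actual setup.

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

def fine_tune(model, examples, epochs=1, lr=2e-5):
    # Generic single-example training loop, reused for both stages.
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for text, label in examples:
            batch = tokenizer(text, return_tensors="pt", truncation=True)
            loss = model(**batch, labels=torch.tensor([label])).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return model

# Stage 1: intermediate task (toy sentiment-style examples, placeholders).
model = fine_tune(model, [("The movie was surprisingly good.", 1),
                          ("The plot made no sense at all.", 0)])

# Stage 2: target task (toy acceptability examples echoing the NPI case study).
# The encoder weights carry over from stage 1; the task-specific head is
# typically re-initialized for the new task.
model.classifier = torch.nn.Linear(model.config.hidden_size, 2)
model = fine_tune(model, [("Sue doesn't have any cash.", 1),
                          ("Sue has any cash.", 0)])

The design point is simply that stage 2 starts from the stage-1 encoder weights rather than from the original pretrained checkpoint.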
“…These large corpora have been used as part of larger benchmark sets, e.g., GLUE (Wang et al., 2018), and have proven useful for problems beyond NLI, such as sentence representation and transfer learning (Conneau et al., 2017; Subramanian et al., 2018; Reimers and Gurevych, 2019), automated question-answering (Khot et al., 2018; Trivedi et al., 2019) and model probing (Warstadt et al., 2019; Geiger et al., 2020; Jeretic et al., 2020).…”
Section: Related Work (mentioning)
confidence: 99%
“…To evaluate and promote the robustness of neural models against noise, some studies manually create new datasets with specific linguistic phenomena (Linzen et al., 2016; Marvin and Linzen, 2018; Goldberg, 2019; Warstadt et al., 2019a). Others have introduced various methods to generate synthetic errors on clean downstream datasets, in particular, machine translation corpora.…”
Section: Synthesized Errors (mentioning)
confidence: 99%