Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning

Tamborrino, Alexandre; Pellicanò, Nicola; Pannier, Baptiste; Voitot, Pascal; Naudin, Louise

doi:10.18653/v1/2020.acl-main.357

Cited by 47 publications

(51 citation statements)

References 10 publications


“…Jiang et al (2020a) build a multilingual knowledge probing benchmark based on LAMA. There are many studies that focus on probing specific knowledge in PLMs, such as linguistic knowledge (Lin et al, 2019; Tenney et al, 2019; Liu et al, 2019a; Hewitt and Manning, 2019; Goldberg, 2019; Warstadt et al, 2019), semantic knowledge (Tenney et al, 2019; Wallace et al, 2019; Ettinger, 2020) and world knowledge (Davison et al, 2019; Bouraoui et al, 2020; Forbes et al, 2019; Zhou et al, 2019; Roberts et al, 2020; Tamborrino et al, 2020). Recently, some studies doubt the reliability of PLMs as knowledge bases by discovering the spurious correlation to surface forms (Poerner et al, 2020; Shwartz et al, 2020), and their sensitivity to "negation" and "mispriming" (Kassner and Schütze, 2020b).…”

Section: Related Work (mentioning)

confidence: 99%

Knowledgeable or Educated Guess? Revisiting Language Models as Knowledge Bases

Cao, Lin, Han, et al. 2021

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Confer


Previous literature shows that pre-trained masked language models (MLMs) such as BERT can achieve competitive factual knowledge extraction performance on some datasets, indicating that MLMs can potentially be a reliable knowledge source. In this paper, we conduct a rigorous study to explore the underlying predicting mechanisms of MLMs over different extraction paradigms. By investigating the behaviors of MLMs, we find that previous decent performance mainly owes to the biased prompts which overfit dataset artifacts. Furthermore, incorporating illustrative cases and external contexts improves knowledge prediction mainly due to entity type guidance and golden answer leakage. Our findings shed light on the underlying predicting mechanisms of MLMs, and strongly question the previous conclusion that current MLMs can potentially serve as reliable factual knowledge bases.


“…Information Masking: Previous work [35, 36, 37, 38] on question-answering has shown that some models tend to learn from artificial or superficial patterns of the dataset, and they can still predict the correct answer after important clues (to humans) in the premise are masked. Therefore, we challenge our well-trained model, a RoBERTa with MLM fine-tuned on the original training set, by masking context, question, and both, respectively, during inference.…”

Section: Analyses (mentioning)

confidence: 99%

Go Beyond Plain Fine-Tuning: Improving Pretrained Models for Social Commonsense

Chang, Liu, Gopalakrishnan, et al. 2021

2021 IEEE Spoken Language Technology Workshop (SLT)


Pretrained language models have demonstrated outstanding performance in many NLP tasks recently. However, their social intelligence, which requires commonsense reasoning about the current situation and mental states of others, is still developing. Towards improving language models' social intelligence, in this study we focus on the Social IQA dataset, a task requiring social and emotional commonsense reasoning. Building on top of the pretrained RoBERTa and GPT2 models, we propose several architecture variations and extensions, as well as leveraging external commonsense corpora, to optimize the model for Social IQA. Our proposed system achieves results competitive with those of the top-ranking models on the leaderboard. This work demonstrates the strengths of pretrained language models, and provides viable ways to improve their performance for a particular task.


“…We also report a language model generation baseline, due to the improved representation power of modern language models and recent evidence of their power in modeling common sense reasoning tasks (Weir et al, 2020; Tamborrino et al, 2020). The baseline is performed using the AI2 GPT-2 large model (Radford et al, 2019) (specifically, the Hugging Face PyTorch implementation (Wolf et al, 2019)).…”

Section: Language Model Baseline (mentioning)

confidence: 99%

“…Recent works have also sought to characterize the ability of pre-trained language models to understand common sense reasoning, showing such models perform well at common sense reasoning tasks even without fine-tuning, allowing one to explore the common sense reasoning inherent in those models (Tamborrino et al, 2020; Weir et al, 2020). Of particular relevance to the current work, Weir et al (2020) explored the ability of pre-trained models to predict stereotypic tacit assumptions, generalizing about entire classes of entities with statements such as "everyone knows that a bear has ".…”

Section: Related Work (mentioning)

confidence: 99%

ProtoQA: A Question Answering Dataset for Prototypical Common-Sense Reasoning

Boratko, Li, O'Gorman, et al. 2020

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)


Given questions regarding some prototypical situation, such as "Name something that people usually do before they leave the house for work?", a human can easily answer them via acquired experiences. There can be multiple right answers for such questions, with some more common for a situation than others. This paper introduces a new question answering dataset for training and evaluating common sense reasoning capabilities of artificial intelligence systems in such prototypical situations. The training set is gathered from an existing set of questions played in a long-running international game show, FAMILY-FEUD. The hidden evaluation set is created by gathering answers for each question from 100 crowd-workers. We also propose a generative evaluation task where a model has to output a ranked list of answers, ideally covering all prototypical answers for a question. After presenting multiple competitive baseline models, we find that human performance still exceeds model scores on all evaluation metrics with a meaningful gap, supporting the challenging nature of the task.
