Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics 2020
DOI: 10.18653/v1/2020.acl-main.357

Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning

Abstract: Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks (Devlin et al., 2019). Most of the existing approaches rely on a randomly initialized classifier on top of such networks. We argue that this fine-tuning procedure is sub-optimal, as the pre-trained model has no prior on the specific classifier labels, while it might have already learned an intrinsic textual representation of the task. In this paper, we introduce a new scoring method that casts a plausibil…
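The scoring idea sketched in the abstract (ranking candidates directly with the pre-trained masked LM rather than a randomly initialized classifier head) can be illustrated as follows. This is a minimal sketch: `TOY_PROBS` and `masked_prob` are hypothetical stand-ins for querying a real masked LM such as BERT or RoBERTa, included only so the example runs.

```python
import math

# Hypothetical stand-in for a masked LM's conditional probability
# p(token | sentence with that position masked). A real implementation
# would query BERT/RoBERTa; this toy table just makes the sketch runnable.
TOY_PROBS = {
    ("the cat sat on the [MASK]", "mat"): 0.6,
    ("the cat sat on the [MASK]", "moon"): 0.01,
}

def masked_prob(masked_context, token):
    # Unknown (context, token) pairs get a small floor probability.
    return TOY_PROBS.get((masked_context, token), 1e-6)

def plausibility_score(tokens):
    # Mask each position in turn and average the log-probability the
    # model assigns to the original token in that slot.
    total = 0.0
    for i, tok in enumerate(tokens):
        masked = tokens[:i] + ["[MASK]"] + tokens[i + 1:]
        total += math.log(masked_prob(" ".join(masked), tok))
    return total / len(tokens)

# Higher average masked log-probability = more plausible completion.
score_mat = plausibility_score("the cat sat on the mat".split())
score_moon = plausibility_score("the cat sat on the moon".split())
```

Under this scoring, candidates are compared by how well the pre-trained model already predicts their tokens, so no new classifier parameters are needed.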

Cited by 47 publications (51 citation statements) · References 10 publications
“…Jiang et al (2020a) build a multilingual knowledge probing benchmark based on LAMA. Many studies focus on probing specific knowledge in PLMs, such as linguistic knowledge (Lin et al, 2019; Tenney et al, 2019; Liu et al, 2019a; Hewitt and Manning, 2019; Goldberg, 2019; Warstadt et al, 2019), semantic knowledge (Tenney et al, 2019; Wallace et al, 2019; Ettinger, 2020), and world knowledge (Davison et al, 2019; Bouraoui et al, 2020; Forbes et al, 2019; Zhou et al, 2019; Roberts et al, 2020; Tamborrino et al, 2020). Recently, some studies have doubted the reliability of PLMs as knowledge bases, discovering spurious correlations to surface forms (Poerner et al, 2020; Shwartz et al, 2020) and sensitivity to "negation" and "mispriming" (Kassner and Schütze, 2020b).…”
Section: Related Work
confidence: 99%
“…Information Masking Previous work [35,36,37,38] on question answering has shown that some models tend to learn from artificial or superficial patterns in the dataset, and can still predict the correct answer even after clues that are important (to humans) in the premise are masked. Therefore, we challenge our well-trained model, a RoBERTa with MLM fine-tuned on the original training set, by masking the context, the question, and both, respectively, during inference.…”
Section: Analyses
confidence: 99%
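The masking probe described in this excerpt can be sketched as below. The `probe` harness and `overlap_predict` stand-in are assumptions for illustration: a real probe would call the fine-tuned RoBERTa model rather than a word-overlap heuristic.

```python
# Sketch of the "information masking" probe: replace the context, the
# question, or both with mask tokens at inference time and compare the
# model's predictions across settings.
MASK = "<mask>"

def mask_field(text):
    # Replace every token with the mask symbol, preserving length.
    return " ".join(MASK for _ in text.split())

def probe(predict, context, question, choices):
    settings = {
        "full": (context, question),
        "no_context": (mask_field(context), question),
        "no_question": (context, mask_field(question)),
        "neither": (mask_field(context), mask_field(question)),
    }
    return {name: predict(c, q, choices) for name, (c, q) in settings.items()}

def overlap_predict(context, question, choices):
    # Hypothetical stand-in for the fine-tuned model: pick the choice
    # sharing the most words with the visible (unmasked) input.
    visible = set(context.split()) | set(question.split())
    return max(choices, key=lambda c: len(set(c.split()) & visible))

result = probe(overlap_predict, "bears sleep all winter",
               "what do bears do", ["sleep", "fly"])
```

If accuracy stays high even in the "neither" setting, the model is exploiting answer-only artifacts of the dataset rather than actually reading the premise.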
“…We also report a language model generation baseline, due to the improved representation power of modern language models and recent evidence of their effectiveness in modeling common sense reasoning tasks (Weir et al, 2020; Tamborrino et al, 2020). The baseline is performed using the AI2 GPT-2 large model (Radford et al, 2019) (specifically, the Hugging Face PyTorch implementation (Wolf et al, 2019)).…”
Section: Language Model Baseline
confidence: 99%
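A generation-style LM baseline of the kind this excerpt describes typically scores each candidate answer by the log-likelihood the LM assigns to the prompt plus that candidate, then picks the highest-scoring one. In this sketch, `BIGRAM_LOGPROB` is an assumed toy stand-in for GPT-2's token probabilities so the example is self-contained; a real baseline would compute the sum of token log-probabilities with the Hugging Face GPT-2 model.

```python
# Toy bigram log-probability table standing in for a real LM
# (an assumption for illustration; GPT-2 would supply these scores).
BIGRAM_LOGPROB = {
    ("bears", "hibernate"): -0.5,
    ("bears", "photosynthesize"): -8.0,
}

def sequence_logprob(tokens):
    # Sum log p(next token | previous token); unseen bigrams get a floor.
    return sum(BIGRAM_LOGPROB.get((a, b), -10.0)
               for a, b in zip(tokens, tokens[1:]))

def lm_choice(prompt, candidates):
    # Rank candidates by the likelihood of prompt + candidate under the LM.
    return max(candidates,
               key=lambda c: sequence_logprob((prompt + " " + c).split()))
```

Because the LM is used zero-shot, this baseline needs no task-specific training, which is exactly what makes it a useful probe of the common sense already encoded in pre-training.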
“…Recent works have also sought to characterize the ability of pre-trained language models to understand common sense reasoning, showing such models perform well at common sense reasoning tasks even without fine-tuning, allowing one to explore the common sense reasoning inherent in those models (Tamborrino et al, 2020;Weir et al, 2020). Of particular relevance to the current work, Weir et al (2020) explored the ability of pre-trained models to predict stereotypic tacit assumptions, generalizing about entire classes of entities with statements such as "everyone knows that a bear has ".…”
Section: Related Work
confidence: 99%