Are Pretrained Language Models Symbolic Reasoners Over Knowledge?

2020 · Preprint
DOI: 10.48550/arxiv.2006.10413

Abstract: How can pre-trained language models (PLMs) learn factual knowledge from the training set? We investigate the two most important mechanisms: reasoning and memorization. Prior work has attempted to quantify the number of facts PLMs learn, but we present, using synthetic data, the first study that establishes a causal relation between facts present in training and facts learned by the PLM. For reasoning, we show that PLMs learn to apply some symbolic reasoning rules, but in particular they struggle with two-hop …
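To make the abstract's setup concrete, here is a minimal sketch of the kind of synthetic probe it describes: generate atomic facts, derive two-hop facts with a symbolic composition rule, and hold some of the derived facts out of training so that correct predictions on them can only come from applying the rule. The entity names, relations, and the composition rule are illustrative assumptions, not the paper's actual dataset.

```python
# Sketch of a synthetic two-hop reasoning probe (assumed setup, for
# illustration only; not the paper's actual data or rule).
import random

random.seed(0)

entities = [f"e{i}" for i in range(20)]

# One-hop facts: (head, relation, tail) triples for two base relations.
parent_of = {e: random.choice(entities) for e in entities}
born_in = {e: random.choice(["cityA", "cityB", "cityC"]) for e in entities}

one_hop = [(h, "parent_of", t) for h, t in parent_of.items()]
one_hop += [(h, "born_in", t) for h, t in born_in.items()]

# Two-hop composition rule (assumed for illustration):
#   parent_of(x, y) AND born_in(y, z)  =>  parent_born_in(x, z)
two_hop = [(x, "parent_born_in", born_in[parent_of[x]]) for x in entities]

# Causal design: hold out some two-hop facts from training. A model that
# truly applies the rule should still predict the held-out tails.
random.shuffle(two_hop)
train_two_hop, heldout_two_hop = two_hop[:10], two_hop[10:]

train_corpus = [" ".join(t) for t in one_hop + train_two_hop]
probe_queries = [(h, r) for h, r, _ in heldout_two_hop]

print(len(train_corpus), "training facts;", len(probe_queries), "held-out two-hop probes")
```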

Cited by 4 publications (4 citation statements) · References 11 publications

“…A number of works have studied the impact of word frequency on different aspects of LLMs and, in particular, on the quality of the delivered representations. Kassner et al. (2020) studied BERT models and possible memorization based on token frequency, demonstrating that if a token appears fewer than 15 times, the model will disregard it, while a token that appears 100 times or more will be predicted more accurately. Zhou et al. (2022) demonstrated that high-frequency and low-frequency words are represented differently by transformer LLMs, in particular by BERT.…”
Section: Related Work
Mentioning · confidence: 99%
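The frequency finding quoted above lends itself to a small analysis harness: bucket test facts by how often they appeared in training, then compare prediction accuracy per bucket against the reported thresholds. This is a hedged sketch; `predict_tail` stands in for whatever cloze-style PLM query is used and is a hypothetical helper, not an API from the cited work.

```python
# Sketch: accuracy of a PLM's fact predictions, bucketed by training
# frequency. Bucket boundaries mirror the thresholds quoted above
# (<15 occurrences largely ignored; >=100 learned well).
from collections import Counter

def accuracy_by_frequency(train_facts, test_facts, predict_tail):
    """train_facts / test_facts: lists of (head, relation, tail) triples.
    predict_tail(head, relation) -> predicted tail (hypothetical helper)."""
    freq = Counter(train_facts)
    buckets = {"<15": [], "15-99": [], ">=100": []}
    for fact in test_facts:
        n = freq[fact]
        key = "<15" if n < 15 else "15-99" if n < 100 else ">=100"
        h, r, t = fact
        buckets[key].append(predict_tail(h, r) == t)
    # Per-bucket accuracy; None for empty buckets.
    return {k: (sum(v) / len(v) if v else None) for k, v in buckets.items()}
```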
“…Language Understanding Benchmarks. Previous NLP benchmarks usually evaluate general language understanding, such as slot filling (Elsahar et al., 2019; Levy et al., 2017), QA (Rajpurkar et al., 2016; Joshi et al., 2017; Fan et al., 2019; Ding et al., 2019; Clark et al., 2019; Kassner et al., 2020), dialogue (Dinan et al., 2018), and entailment (Williams et al., 2018; Rocktäschel et al., 2015; Dagan et al., 2005; Morgenstern and Ortiz, 2015). For example, some question answering tasks aim to evaluate machine reading comprehension or reasoning over a knowledge source, such as Wikipedia.…”
Section: Related Work
Mentioning · confidence: 99%
“…Embedding-based methods first convert symbolic facts and rules to embeddings and then apply neural network layers on top to softly predict answers. Recent work in deductive reasoning focused on tasks where rules and facts are expressed in natural language (Talmor et al., 2020; Saeed et al., 2021; Clark et al., 2020b; Kassner et al., 2020). Such tasks are more challenging because the model has to first understand the logic described in the natural language sentences before performing logical reasoning.…”
Section: Related Work
Mentioning · confidence: 99%
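The "embedding-based" pattern in the statement above can be illustrated with a tiny model: encode facts and rules as vectors, pool them into a context representation, and score a candidate query with a small network. The dimensions, mean pooling, and bilinear scorer are assumptions for illustration, not any cited system's architecture.

```python
# Minimal sketch of an embedding-based soft reasoner (assumed design).
import torch
import torch.nn as nn

class SoftReasoner(nn.Module):
    def __init__(self, vocab_size, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)   # token embeddings
        self.context = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.scorer = nn.Bilinear(dim, dim, 1)       # (context, query) -> logit

    def forward(self, statement_ids, query_ids):
        # statement_ids: (n_statements, seq_len) token ids for facts + rules
        # query_ids: (seq_len,) token ids for the question
        ctx = self.embed(statement_ids).mean(dim=(0, 1))  # pool all statements
        q = self.embed(query_ids).mean(dim=0)
        return self.scorer(self.context(ctx).unsqueeze(0), q.unsqueeze(0))

model = SoftReasoner(vocab_size=100)
facts = torch.randint(0, 100, (5, 8))   # 5 statements, 8 tokens each
query = torch.randint(0, 100, (8,))
print(torch.sigmoid(model(facts, query)))  # soft "is the answer entailed?" score
```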