2021
DOI: 10.48550/arxiv.2106.02902
Preprint

BERTnesia: Investigating the capture and forgetting of knowledge in BERT

Abstract: Probing complex language models has recently revealed several insights into linguistic and semantic patterns found in the learned representations. In this article, we probe BERT specifically to understand and measure the relational knowledge it captures in its parametric memory. While probing for linguistic understanding is commonly applied to all layers of BERT as well as finetuned models, this has not been done for factual knowledge. We utilize existing knowledge base completion tasks (LAMA) to probe every l…
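The layer-wise probing the abstract describes builds on cloze-style LAMA queries: a relational fact is phrased as a masked sentence, and the model's prediction at the mask is checked against the gold entity. The sketch below illustrates that query format. It is a minimal illustration, not the paper's code; it assumes the HuggingFace transformers library and the bert-base-uncased checkpoint, and it probes only the final layer, whereas BERTnesia reads out predictions from every intermediate layer as well.

```python
# Minimal sketch of a LAMA-style cloze probe (illustrative only; assumes
# HuggingFace transformers and the bert-base-uncased checkpoint).
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# A relational fact rendered as a cloze statement, as in LAMA.
prompt = "The capital of France is [MASK]."
inputs = tokenizer(prompt, return_tensors="pt")

# Position of the [MASK] token in the input sequence.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()

with torch.no_grad():
    logits = model(**inputs).logits

# Rank vocabulary tokens at the masked position; a stored fact should
# place the gold entity ("paris") at or near the top.
top_ids = logits[0, mask_pos].topk(5).indices.tolist()
print(tokenizer.convert_ids_to_tokens(top_ids))
```

A full per-layer reproduction would presumably request output_hidden_states=True and apply the pretrained masked-language-model head to each intermediate hidden state; that extension is inferred from the abstract, not taken from the paper's released code.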

Cited by 3 publications (6 citation statements)
References 35 publications

“…The merit of finetuned LMs has also been shown for common-sense knowledge extraction (Bosselut et al., 2019). Previous work also studies the effect of dataset size for finetuning (Wallat et al., 2021; Fichtel et al., 2021; Da et al., 2021), but the negative effects of finetuning (studied in this paper) remain unexplored. For a full review of the literature on knowledge probing and extraction, we refer to Safavi & Koutra (2021) and AlKhamissi et al. (2022).…”
Section: Related Work
confidence: 73%
“…While previous work typically explains the phenomenon in Figure 1 as a forgetting effect (Wallat et al., 2021), our study reveals a more nuanced explanation in terms of Frequency Shock: even though both "Moscow" and "Baku" have been observed an equal number of times in the training set, "Baku" is expected to be a less common entity and hence less observed during the pre-training of the language model; the finetuned model therefore receives a frequency shock leading to an over-prediction of the entity "Baku", corrupting an originally correct prediction. Note that Frequency Shock and Range Shift are related to the problem of out-of-distribution (OOD) generalization in machine learning; see Section 3.6 for more discussion.…”
Section: Introduction
confidence: 95%
“…There is increasing evidence that scaling LMs to larger sizes is not the solution to generating factually correct information (Lazaridou et al., 2021; Gehman et al., 2020; Lin et al., 2021a). As a result, this would also lead to catastrophic forgetting (Wallat et al., 2021). Changing a single weight may have a ripple effect that affects a large number of other implicitly memorized facts.…”
Section: LMs-as-KBs
confidence: 99%
“…commonsense question answering) so it can make way for the required knowledge to surface in the output during evaluation. Previous work has shown that most of the knowledge encoded in an LM is acquired during pretraining, while finetuning merely learns an interface to access that acquired knowledge (Da et al., 2021; Wallat et al., 2021).…”
Section: Finetuning
confidence: 99%