2022 · Preprint
DOI: 10.48550/arXiv.2204.06031

A Review on Language Models as Knowledge Bases

Abstract: Recently, there has been a surge of interest in the NLP community in the use of pretrained Language Models (LMs) as Knowledge Bases (KBs). Researchers have shown that LMs trained on a sufficiently large (web) corpus encode a significant amount of knowledge implicitly in their parameters. The resulting LM can be probed for different kinds of knowledge and can thus act as a KB. This has a major advantage over traditional KBs in that no human supervision is required. In this paper, we present a set of …
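To make the probing idea concrete: a pretrained masked LM can be queried with a cloze-style prompt and its top predictions read off as KB-style answers. The sketch below is a minimal illustration, assuming the HuggingFace transformers library and the bert-base-cased checkpoint; neither is prescribed by the review itself.

    from transformers import pipeline

    # Cloze-style probe: the masked LM fills the [MASK] slot with its
    # most probable tokens, in effect answering a factual KB query.
    probe = pipeline("fill-mask", model="bert-base-cased")

    for candidate in probe("The capital of France is [MASK]."):
        print(f"{candidate['token_str']:>10}  p={candidate['score']:.3f}")

A high-probability correct completion ("Paris") is the sense in which the LM "acts as a KB"; unlike a curated KB, though, the answer comes with no provenance.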

Cited by 13 publications (13 citation statements)
References 61 publications (106 reference statements)
“…Memorization in Language Models: Unintended memorization is a known challenge for language models [12,13], which makes them open to extraction attacks [14,15] and membership inference attacks [16,17], although there has been work on mitigating these vulnerabilities [11,18]. Recent work has argued that memorization is not exclusively harmful, and can be crucial for certain types of generalization (e.g., on QA tasks) [19,20,21], while also allowing the models to encode significant amounts of world or factual knowledge [22,23,24]. There is also a growing body of work analyzing fundamental properties of memorization in language models [9,8,10].…”
Section: Background and Related Work
confidence: 99%
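The extraction and membership inference attacks mentioned in the excerpt above typically exploit the same signal: memorized training strings receive unusually low loss. The following is a hedged sketch of that signal only, assuming GPT-2 via the HuggingFace transformers library; the candidate strings and the bare comparison are illustrative, not the attack procedures of the cited works.

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

    def sequence_loss(text: str) -> float:
        """Average per-token negative log-likelihood under the model."""
        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            out = model(ids, labels=ids)
        return out.loss.item()

    # A markedly lower loss on a candidate string than on a matched
    # control is weak evidence of memorization (hypothetical strings).
    print(sequence_loss("Alice Example's phone number is 555-0100."))
    print(sequence_loss("Alice Example's phone number is 555-0199."))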
“…We focus on the task of slot-filling which, since its introduction in LM evaluation through the LAMA benchmark (Petroni et al., 2019a), has been extensively used to probe the knowledge contained in LMs (AlKhamissi et al., 2022). More specifically, we use the T-REx split (Elsahar et al., 2018) of LAMA.…”
Section: Data
confidence: 99%
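The slot-filling protocol in the excerpt above can be reproduced by rendering a knowledge triple through a relation template and masking the object slot, as LAMA does. In this sketch the triple, the template, and the reuse of a fill-mask pipeline are illustrative assumptions.

    from transformers import pipeline

    probe = pipeline("fill-mask", model="bert-base-cased")

    # T-REx-style triple rendered with a LAMA relation template:
    # the object slot [Y] is masked and the LM must recover it.
    subject, template, gold = "Dante", "[X] was born in [Y].", "Florence"
    query = (template.replace("[X]", subject)
                     .replace("[Y]", probe.tokenizer.mask_token))

    top = probe(query)[0]
    print(query, "->", top["token_str"], f"(gold: {gold})")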
“…Previous work also studies the effect of dataset size for finetuning (Wallat et al., 2021; Fichtel et al., 2021; Da et al., 2021), but the negative effects of finetuning (studied in this paper) remain unexplored. For a full review of the literature on knowledge probing and extraction, we refer to (Safavi & Koutra, 2021; AlKhamissi et al., 2022).…”
Section: Related Work
confidence: 99%