2022 · Preprint
DOI: 10.48550/arXiv.2204.06031

A Review on Language Models as Knowledge Bases

Abstract: Recently, there has been a surge of interest in the NLP community in the use of pretrained Language Models (LMs) as Knowledge Bases (KBs). Researchers have shown that LMs trained on a sufficiently large (web) corpus encode a significant amount of knowledge implicitly in their parameters. The resulting LM can be probed for different kinds of knowledge and can thus act as a KB. This has a major advantage over traditional KBs in that no human supervision is required. In this paper, we present a set of …
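To make the probing idea concrete: a pretrained masked LM can be queried with a cloze-style prompt and its top predictions read off as KB-style answers. The sketch below is a minimal illustration, assuming the HuggingFace transformers library and the bert-base-cased checkpoint; neither is prescribed by the review itself.

    from transformers import pipeline

    # Cloze-style probe: the masked LM fills the [MASK] slot with its
    # most probable tokens, in effect answering a factual KB query.
    probe = pipeline("fill-mask", model="bert-base-cased")

    for candidate in probe("The capital of France is [MASK]."):
        print(f"{candidate['token_str']:>10}  p={candidate['score']:.3f}")

A high-probability correct completion ("Paris") is the sense in which the LM "acts as a KB"; unlike a curated KB, though, the answer comes with no provenance.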

Cited by 13 publications (13 citation statements)
References 61 publications (106 reference statements)
“…Memorization in Language Models: Unintended memorization is a known challenge for language models [12,13], which makes them open to extraction attacks [14,15] and membership inference attacks [16,17], although there has been work on mitigating these vulnerabilities [11,18]. Recent work has argued that memorization is not exclusively harmful, and can be crucial for certain types of generalization (e.g., on QA tasks) [19,20,21], while also allowing the models to encode significant amounts of world or factual knowledge [22,23,24]. There is also a growing body of work analyzing fundamental properties of memorization in language models [9,8,10].…”
Section: Background and Related Work
confidence: 99%
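The extraction and membership inference attacks mentioned in the excerpt above typically exploit the same signal: memorized training strings receive unusually low loss. The following is a hedged sketch of that signal only, assuming GPT-2 via the HuggingFace transformers library; the candidate strings and the bare comparison are illustrative, not the attack procedures of the cited works.

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

    def sequence_loss(text: str) -> float:
        """Average per-token negative log-likelihood under the model."""
        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            out = model(ids, labels=ids)
        return out.loss.item()

    # A markedly lower loss on a candidate string than on a matched
    # control is weak evidence of memorization (hypothetical strings).
    print(sequence_loss("Alice Example's phone number is 555-0100."))
    print(sequence_loss("Alice Example's phone number is 555-0199."))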
“…We focus on the task of slot-filling which, since its introduction in LM evaluation through the LAMA benchmark (Petroni et al., 2019a), has been extensively used to probe the knowledge contained in LMs (AlKhamissi et al., 2022). More specifically, we use the T-REx split (Elsahar et al., 2018) of LAMA.…”
Section: Data
confidence: 99%
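The slot-filling protocol in the excerpt above can be reproduced by rendering a knowledge triple through a relation template and masking the object slot, as LAMA does. In this sketch the triple, the template, and the reuse of a fill-mask pipeline are illustrative assumptions.

    from transformers import pipeline

    probe = pipeline("fill-mask", model="bert-base-cased")

    # T-REx-style triple rendered with a LAMA relation template:
    # the object slot [Y] is masked and the LM must recover it.
    subject, template, gold = "Dante", "[X] was born in [Y].", "Florence"
    query = (template.replace("[X]", subject)
                     .replace("[Y]", probe.tokenizer.mask_token))

    top = probe(query)[0]
    print(query, "->", top["token_str"], f"(gold: {gold})")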
“…Previous work also studies the effect of dataset size for finetuning (Wallat et al., 2021; Fichtel et al., 2021; Da et al., 2021), but the negative effects of finetuning (studied in this paper) remain unexplored. For a full review of the literature on knowledge probing and extraction, we refer to (Safavi & Koutra, 2021; AlKhamissi et al., 2022).…”
Section: Related Work
confidence: 99%