2022 · Preprint
DOI: 10.48550/arxiv.2202.05262

Locating and Editing Factual Associations in GPT

Abstract: We investigate the mechanisms underlying factual knowledge recall in autoregressive transformer language models. First, we develop a causal intervention for identifying neuron activations capable of altering a model's factual predictions. Within large GPT-style models, this reveals two distinct sets of neurons that we hypothesize correspond to knowing an abstract fact and saying a concrete word, respectively. This insight inspires the development of ROME, a novel method for editing facts stored in model weights…
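At a high level, the causal intervention described in the abstract is a corrupt-then-restore trace: run the model on a prompt whose subject tokens have been corrupted, then restore individual clean hidden states one at a time and measure how much of the correct prediction returns. The sketch below illustrates that idea with Hugging Face transformers; the Gaussian-noise corruption of subject-token embeddings, the hand-picked token positions, and the helper names are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal corrupt-then-restore causal trace (a sketch, not the paper's code).
# Assumptions: Gaussian noise on subject-token embeddings, a single noise
# sample (the paper averages over many), and hand-picked token positions.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tok = GPT2Tokenizer.from_pretrained("gpt2")

prompt = "The Eiffel Tower is located in the city of"
inputs = tok(prompt, return_tensors="pt")
subject_pos = [1, 2, 3, 4]  # approx. positions of "Eiffel Tower" (assumed)

# 1) Clean run: cache hidden states and the model's top answer.
with torch.no_grad():
    clean = model(**inputs, output_hidden_states=True)
clean_hidden = clean.hidden_states           # embeddings + one per block
target_id = clean.logits[0, -1].argmax().item()

def corrupted_prob(restore_layer=None, restore_pos=None):
    """P(clean answer) with noised subject embeddings, optionally
    restoring one clean hidden state at (restore_layer, restore_pos)."""
    handles = []

    def noise_hook(module, inp, out):        # corrupt subject embeddings
        out = out.clone()
        out[0, subject_pos] += 3.0 * torch.randn_like(out[0, subject_pos])
        return out

    handles.append(model.transformer.wte.register_forward_hook(noise_hook))

    if restore_layer is not None:
        def restore_hook(module, inp, out):  # patch in one clean state
            h = out[0].clone()
            h[0, restore_pos] = clean_hidden[restore_layer + 1][0, restore_pos]
            return (h,) + out[1:]
        handles.append(
            model.transformer.h[restore_layer].register_forward_hook(restore_hook))

    with torch.no_grad():
        logits = model(**inputs).logits
    for h in handles:
        h.remove()
    return torch.softmax(logits[0, -1], dim=-1)[target_id].item()

base = corrupted_prob()                      # fully corrupted baseline
effect = corrupted_prob(restore_layer=6, restore_pos=4) - base
print(f"indirect effect of (layer 6, last subject token): {effect:.4f}")
```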

Cited by 21 publications (27 citation statements) · References 13 publications

“…More recently, a surge of works has investigated the knowledge captured by the FFN layers (Da et al., 2021; Dai et al., 2021; Yao et al., 2022; Meng et al., 2022; Wallat et al., 2020). These works show that the FFN layers store various types of knowledge, which can be located in specific neurons and edited.…”
Section: Related Work
Mentioning confidence: 99%
“…We study this question through the lens of the feed-forward network (FFN) layers, one of the core components in transformers (Vaswani et al., 2017). Recent work showed that these layers play an important role in LMs, acting as memories that encode factual and linguistic knowledge (Geva et al., 2021; Da et al., 2021; Meng et al., 2022). In this work, we investigate how outputs from the FFN layers are utilized internally to build predictions.…”
Section: Introduction
Mentioning confidence: 99%
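The memory view cited above (Geva et al., 2021) reads the two linear maps of an FFN block as a key-value store: rows of the input matrix act as keys matched against the hidden state, and the resulting activation coefficients weight value vectors (columns of the output matrix) that are added to the residual stream. A minimal sketch of that reading, assuming GPT-2-small dimensions and omitting biases:

```python
# FFN as key-value memory (the Geva et al., 2021 reading) -- illustrative sizes.
import torch
import torch.nn.functional as F

d_model, d_ff = 768, 3072                          # GPT-2-small dimensions
W_in = torch.randn(d_ff, d_model) / d_model**0.5   # rows = "keys"
W_out = torch.randn(d_model, d_ff) / d_ff**0.5     # columns = "values"

def ffn(x):
    """x: (d_model,) hidden state at one position (biases omitted)."""
    m = F.gelu(W_in @ x)                     # memory coefficients, (d_ff,)
    # Weighted sum of value vectors: sum_i m[i] * W_out[:, i] == W_out @ m
    return W_out @ m

x = torch.randn(d_model)
out = ffn(x)                                 # update added to the residual stream
top = torch.topk(F.gelu(W_in @ x), k=5).indices
print("most strongly activated memory slots:", top.tolist())
```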
“…Transformer Mechanisms. We evaluate transformers and whether they are explainable via other mechanisms, such as the feed-forward layers (Geva et al., 2021; Meng et al., 2022). Other work proposes a tool to measure non-linearities in LMs by taking into account the geometry of the embedding space, finding that the non-linearities of an LM's self-attention feed-forward layers and MLPs follow similar patterns, but their functions are less well understood.…”
Section: Interpretability
Mentioning confidence: 99%
“…That work proposes a tool to measure non-linearities in LMs by taking into account the geometry of the embedding space, finding that the non-linearities of an LM's self-attention feed-forward layers and MLPs follow similar patterns, but their functions are less well understood. Geva et al. (2021) build on this and find that feed-forward layers in LMs act as key-value memories; as a result, Meng et al. (2022) are able to use their method of causal tracing to locate knowledge within the feed-forward layers and modify it through the corresponding key-value pairs.…”
Section: Interpretability
Mentioning confidence: 99%
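The editing step this statement attributes to Meng et al. (2022) can be summarized as a rank-one update to one FFN output matrix W: pick a key vector k* (the subject's representation entering the FFN) and a value vector v* (chosen so the model emits the new fact), then set W′k* = v* while perturbing other outputs as little as possible. A minimal sketch of that linear-algebra step, assuming k*, v*, and the key covariance C are already given; the full method also derives v* by optimization:

```python
# Rank-one edit of an FFN output matrix (a simplified sketch of the idea).
import torch

d_in, d_out = 3072, 768                      # FFN inner width, model width
W = torch.randn(d_out, d_in) / d_in**0.5     # matrix to edit
C = torch.eye(d_in)                          # assumed key covariance E[k k^T]
k_star = torch.randn(d_in)                   # key for the subject (assumed given)
v_star = torch.randn(d_out)                  # value encoding the new fact (assumed)

# W' = W + (v* - W k*) (C^{-1} k*)^T / (k*^T C^{-1} k*)
# guarantees W' k* = v* while keeping the change to other keys small.
Cinv_k = torch.linalg.solve(C, k_star)
W_edited = W + torch.outer(v_star - W @ k_star, Cinv_k) / (k_star @ Cinv_k)

assert torch.allclose(W_edited @ k_star, v_star, atol=1e-3)
```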
“…Our intervention in FFN sub-updates relates to recent methods for locating and editing knowledge in the FFN layers of LMs (Meng et al., 2022; Dai et al., 2021). Different from these methods, LM-Debugger aims to provide a comprehensive and fine-grained interpretation of the prediction construction process across the layers.…”
Section: Related Work
Mentioning confidence: 99%
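The FFN sub-updates that LM-Debugger intervenes on are the individual value vectors, each scaled by its activation coefficient; a common way to interpret one is to project it onto the vocabulary through the (tied) output embedding matrix, in the style of the logit lens. A minimal sketch with GPT-2; the layer and sub-update index are arbitrary choices for illustration:

```python
# Projecting one FFN "sub-update" (value vector) onto the vocabulary.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2")
tok = GPT2Tokenizer.from_pretrained("gpt2")

layer, i = 10, 123                           # arbitrary layer / sub-update index
W_out = model.transformer.h[layer].mlp.c_proj.weight  # (d_ff, d_model) in Conv1D
E = model.lm_head.weight                     # (vocab, d_model), tied embeddings

logits = E @ W_out[i]                        # score every token against value i
top = torch.topk(logits, k=10).indices
print([tok.decode(t) for t in top.tolist()])
```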