Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space
2022 · Preprint
DOI: 10.48550/arxiv.2203.14680

Abstract: Transformer-based language models (LMs) are at the core of modern NLP, but their internal prediction construction process is opaque and largely not understood. In this work, we make a substantial step towards unveiling this underlying prediction process, by reverse-engineering the operation of the feed-forward network (FFN) layers, one of the building blocks of transformer models. We view the token representation as a changing distribution over the vocabulary, and the output from each FFN layer as an additive u…
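The abstract's vocabulary-space view lends itself to a short illustration. The sketch below is not the paper's code: it assumes a model whose output embedding (unembedding) matrix E maps hidden states to vocabulary logits, and it uses random tensors and dimensions purely for demonstration. It reads a token representation as a distribution over the vocabulary and shows how adding an FFN output shifts that distribution.

```python
import torch
import torch.nn.functional as F

# Illustrative dimensions (assumptions, not taken from the paper).
vocab_size, d_model = 50257, 768

torch.manual_seed(0)
E = torch.randn(vocab_size, d_model) * 0.02   # output embedding (unembedding) matrix
x = torch.randn(d_model)                      # token representation entering an FFN layer
ffn_out = torch.randn(d_model) * 0.1          # additive update produced by the FFN layer

def vocab_distribution(hidden):
    """Read a hidden state as a distribution over the vocabulary."""
    return F.softmax(E @ hidden, dim=-1)

p_before = vocab_distribution(x)
p_after = vocab_distribution(x + ffn_out)     # the FFN's additive update shifts the distribution

# Tokens whose probability this FFN update promoted the most.
promoted = torch.topk(p_after - p_before, k=5).indices
print("token ids promoted by this FFN update:", promoted.tolist())
```

With a real model, E would be the (typically tied) embedding matrix and x a hidden state captured just before an FFN layer, rather than random tensors.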

Cited by 9 publications (18 citation statements)
References 21 publications (29 reference statements)
“…LM-Debugger establishes a framework for interpreting a token's representation and the updates applied to it at each layer in the network. This framework builds upon recent findings by Geva et al. (2022), who viewed the token representation as a changing distribution over the output vocabulary, and the output from each FFN layer as a collection of weighted sub-updates to that distribution, which are often interpretable to humans. We next elaborate on the findings we rely on in this work.…”
Section: Underlying Interpretation Methods
Confidence: 98%
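The "collection of weighted sub-updates" can be made concrete with a toy decomposition. The sketch below is an assumption-laden illustration, not the LM-Debugger implementation: it takes a standard two-matrix FFN, FFN(x) = f(x·W_K)·W_V, with random weights, splits its output into per-dimension sub-updates m_i·v_i, and projects the dominant ones onto the vocabulary to see which token ids they promote.

```python
import torch
import torch.nn.functional as F

# Toy dimensions (assumptions for illustration only).
vocab_size, d_model, d_ff = 1000, 64, 256

torch.manual_seed(0)
E = torch.randn(vocab_size, d_model) * 0.02   # output embedding matrix
W_K = torch.randn(d_model, d_ff) * 0.02       # FFN "keys" (first projection)
W_V = torch.randn(d_ff, d_model) * 0.02       # FFN "values" (second projection), rows are value vectors v_i
x = torch.randn(d_model)                      # token representation

m = F.gelu(x @ W_K)                 # activation coefficient m_i for each value vector
sub_updates = m[:, None] * W_V      # (d_ff, d_model): the i-th row is the sub-update m_i * v_i
ffn_out = sub_updates.sum(dim=0)    # the full FFN output is the sum of its sub-updates

assert torch.allclose(ffn_out, F.gelu(x @ W_K) @ W_V, atol=1e-5)

# Inspect the dominant sub-updates: which tokens does each one promote?
top_dims = m.abs().topk(3).indices
for i in top_dims.tolist():
    logits = E @ sub_updates[i]               # project the sub-update into vocabulary space
    top_tokens = logits.topk(5).indices.tolist()
    print(f"sub-update {i} (coefficient {m[i].item():.3f}) promotes token ids {top_tokens}")
```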
“…where x_i is the output from the preceding multi-head self-attention layer, and x̃_i is the updated token representation (Vaswani et al., 2017). Geva et al. (2022) proposed an interpretation method for these updates in terms of the vocabulary, which we employ as the backbone of LM-Debugger and describe in detail next.…”
Section: Underlying Interpretation Methods
Confidence: 99%
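The quoted "where" clause refers to an update equation that did not survive extraction. A plausible reconstruction, assuming the standard residual form of the FFN layer that the surrounding text describes, is:

    \tilde{x}_i = x_i + \mathrm{FFN}(x_i)

i.e., the FFN output is added to the self-attention output x_i to yield the updated representation x̃_i, which is what Geva et al. (2022) interpret in vocabulary space.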