2023
DOI: 10.48550/arxiv.2301.05062
Preprint
Tracr: Compiled Transformers as a Laboratory for Interpretability

Abstract: Interpretability research aims to build tools for understanding machine learning (ML) models. However, such tools are inherently hard to evaluate because we do not have ground truth information about how ML models actually work. In this work, we propose to build transformer models manually as a testbed for interpretability research. We introduce Tracr, a "compiler" for translating human-readable programs into weights of a transformer model. Tracr takes code written in RASP, a…
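To make the compilation pipeline concrete, here is a minimal sketch based on the sequence-reversal example documented in the Tracr repository (github.com/google-deepmind/tracr). The module paths (`tracr.rasp.rasp`, `tracr.compiler.compiling`), the `compile_rasp_to_model` signature, and the `compiler_bos` argument follow that documentation and should be treated as version-dependent assumptions rather than a guaranteed API.

```python
# Sketch: compile a RASP program into transformer weights with Tracr,
# following the sequence-reversal example from the Tracr repository.
from tracr.rasp import rasp
from tracr.compiler import compiling

# length: attend from every position to every position (TRUE predicate)
# and count how many positions are selected.
all_true = rasp.Select(rasp.tokens, rasp.tokens, rasp.Comparison.TRUE)
length = rasp.SelectorWidth(all_true)

# For each position i, compute the opposite index (length - i - 1),
# then attend to that position and copy its token: sequence reversal.
opp_index = length - rasp.indices - 1
flip = rasp.Select(rasp.indices, opp_index, rasp.Comparison.EQ)
reverse = rasp.Aggregate(flip, rasp.tokens)

# Compile the RASP program into concrete transformer weights.
model = compiling.compile_rasp_to_model(
    reverse,
    vocab={1, 2, 3},
    max_seq_len=5,
    compiler_bos="BOS",
)

out = model.apply(["BOS", 1, 2, 3])
print(out.decoded)  # expected: ["BOS", 3, 2, 1]
```

Because the weights are constructed rather than learned, the circuit implementing each RASP operation is known exactly, which is what lets the compiled model serve as ground truth for evaluating interpretability tools.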

Cited by 2 publications (3 citation statements) | References 11 publications
“…In a recent and related work, Lindner et al. [2023] suggest using transformer networks as programmable units and introduce a compiler called Tracr which utilizes RASP. However, the expressivity limitations and unclear Turing completeness of the language are discussed in Weiss et al. [2021], Merrill et al. [2022], and Lindner et al. [2023]. Our approach, in contrast, demonstrates the potential of transformer networks to serve as universal computers, enabling the implementation of arbitrary nonlinear functions and emulating iterative, non-linear algorithms.…”
Section: Prior Work
confidence: 99%
“…Interpretation methods have been actively developing recently due to the various real-world applications of neural networks and the need to debug and maintain systems based on them. In particular, the Transformer architecture (Vaswani et al., 2017) demonstrates state-of-the-art performance in natural language processing and other modalities, representing a fertile field for the development of interpretability methods (Elhage et al., 2021; Weiss et al., 2021; Zhou et al., 2023; Lindner et al., 2023). Based on active research on the computational model behind the transformer architecture, recent works propose a way to learn models that are fully interpretable by design (Friedman et al., 2023).…”
Section: Introduction
confidence: 99%