Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space
2022 · Preprint
DOI: 10.48550/arxiv.2203.14680

Abstract: Transformer-based language models (LMs) are at the core of modern NLP, but their internal prediction construction process is opaque and largely not understood. In this work, we make a substantial step towards unveiling this underlying prediction process, by reverse-engineering the operation of the feed-forward network (FFN) layers, one of the building blocks of transformer models. We view the token representation as a changing distribution over the vocabulary, and the output from each FFN layer as an additive u…
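The abstract's vocabulary-space view lends itself to a short illustration. The sketch below is not the paper's code: it assumes a model whose output embedding (unembedding) matrix E maps hidden states to vocabulary logits, and it uses random tensors and dimensions purely for demonstration. It reads a token representation as a distribution over the vocabulary and shows how adding an FFN output shifts that distribution.

```python
import torch
import torch.nn.functional as F

# Illustrative dimensions (assumptions, not taken from the paper).
vocab_size, d_model = 50257, 768

torch.manual_seed(0)
E = torch.randn(vocab_size, d_model) * 0.02   # output embedding (unembedding) matrix
x = torch.randn(d_model)                      # token representation entering an FFN layer
ffn_out = torch.randn(d_model) * 0.1          # additive update produced by the FFN layer

def vocab_distribution(hidden):
    """Read a hidden state as a distribution over the vocabulary."""
    return F.softmax(E @ hidden, dim=-1)

p_before = vocab_distribution(x)
p_after = vocab_distribution(x + ffn_out)     # the FFN's additive update shifts the distribution

# Tokens whose probability this FFN update promoted the most.
promoted = torch.topk(p_after - p_before, k=5).indices
print("token ids promoted by this FFN update:", promoted.tolist())
```

With a real model, E would be the (typically tied) embedding matrix and x a hidden state captured just before an FFN layer, rather than random tensors.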

Cited by 9 publications (18 citation statements)
References 21 publications (29 reference statements)
“…LM-Debugger establishes a framework for interpreting a token's representation and the updates applied to it at each layer in the network. This framework builds upon recent findings by Geva et al. (2022), who viewed the token representation as a changing distribution over the output vocabulary, and the output from each FFN layer as a collection of weighted sub-updates to that distribution, which are often interpretable to humans. We next elaborate on the findings we rely on in this work.…”
Section: Underlying Interpretation Methods
Confidence: 98%
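The "collection of weighted sub-updates" can be made concrete with a toy decomposition. The sketch below is an assumption-laden illustration, not the LM-Debugger implementation: it takes a standard two-matrix FFN, FFN(x) = f(x·W_K)·W_V, with random weights, splits its output into per-dimension sub-updates m_i·v_i, and projects the dominant ones onto the vocabulary to see which token ids they promote.

```python
import torch
import torch.nn.functional as F

# Toy dimensions (assumptions for illustration only).
vocab_size, d_model, d_ff = 1000, 64, 256

torch.manual_seed(0)
E = torch.randn(vocab_size, d_model) * 0.02   # output embedding matrix
W_K = torch.randn(d_model, d_ff) * 0.02       # FFN "keys" (first projection)
W_V = torch.randn(d_ff, d_model) * 0.02       # FFN "values" (second projection), rows are value vectors v_i
x = torch.randn(d_model)                      # token representation

m = F.gelu(x @ W_K)                 # activation coefficient m_i for each value vector
sub_updates = m[:, None] * W_V      # (d_ff, d_model): the i-th row is the sub-update m_i * v_i
ffn_out = sub_updates.sum(dim=0)    # the full FFN output is the sum of its sub-updates

assert torch.allclose(ffn_out, F.gelu(x @ W_K) @ W_V, atol=1e-5)

# Inspect the dominant sub-updates: which tokens does each one promote?
top_dims = m.abs().topk(3).indices
for i in top_dims.tolist():
    logits = E @ sub_updates[i]               # project the sub-update into vocabulary space
    top_tokens = logits.topk(5).indices.tolist()
    print(f"sub-update {i} (coefficient {m[i].item():.3f}) promotes token ids {top_tokens}")
```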
“…where x_i is the output from the preceding multi-head self-attention layer, and x̃_i is the updated token representation (Vaswani et al., 2017). Geva et al. (2022) proposed an interpretation method for these updates in terms of the vocabulary, which we employ as the backbone of LM-Debugger and describe in detail next.…”
Section: Underlying Interpretation Methods
Confidence: 99%
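The quoted "where" clause refers to an update equation that did not survive extraction. A plausible reconstruction, assuming the standard residual form of the FFN layer that the surrounding text describes, is:

    \tilde{x}_i = x_i + \mathrm{FFN}(x_i)

i.e., the FFN output is added to the self-attention output x_i to yield the updated representation x̃_i, which is what Geva et al. (2022) interpret in vocabulary space.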