Proceedings of the 11th Joint Conference on Lexical and Computational Semantics 2022
DOI: 10.18653/v1/2022.starsem-1.23

Pretraining on Interactions for Learning Grounded Affordance Representations

Abstract: Lexical semantics and cognitive science point to affordances (i.e. the actions that objects support) as critical for understanding and representing nouns and verbs. However, study of these semantic features has not yet been integrated with the "foundation" models that currently dominate language representation research. We hypothesize that predictive modeling of object state over time will result in representations that encode object affordance information "for free". We train a neural network to predict objec…

Cited by 8 publications (10 citation statements)
References 34 publications
“…Given the connection between facts learned in pretraining and the MLP layers (Geva et al., 2021; Meng et al., 2023; Merullo et al., 2023), it's possible that tuning attention alone is not enough to see higher performance in this setting.…”
Section: Results of Interventions on the World Capital Dataset
confidence: 99%
“…This naturally sets up our study, which also considers attention heads as the source of the competing effect between copying the counterfactual from earlier in context vs. extracting the memorized fact from an earlier subject token. A core technique in these works is projecting activations from model components into the vocabulary space to make claims about their roles, which we generically refer to here as logit attribution (Nostalgebraist, 2020; Wang et al., 2022; Merullo et al., 2023; Belrose et al., 2023; Dar et al., 2022; Millidge and Black, 2022). We leverage this technique to localize attention heads which tend to promote either context or memorized information (§6).…”
Section: Related Work
confidence: 99%
“…These were conceptual neurons in which the distinction between image and text tended to be overcome (Goh et al. 2021). Multimodality, at the neural level, is really panmodality, suggesting a semantics without clearly differentiated sign systems (this is also suggested by Merullo et al. 2022). Dumb meaning finds a new quality here, and is not tied to either text or image data, but encompasses both in a way that points to meaning beyond modal separation, and again has nothing to do with mind (see for more on this Bajohr 2024c).…”
confidence: 78%
“…The pragmatic methods in Section 4 are also compatible with LLMs, e.g., Liu et al. (2023) combine RSA with meta-learning to apply GPT models in an image reference game setting; FAIR et al. (2022) use a large BART model (Lewis et al., 2020) in conjunction with a multi-agent planning procedure in the grounded dialogue game of Diplomacy. As grounded LLM adapters (Alayrac et al., 2022; Merullo et al., 2023; Eichenberg et al., 2022; Koh et al., 2023) continue to improve, we expect to see more work applying LLMs as components of pragmatic models for these grounding tasks.…”
Section: Pragmatic Modeling and LLMs
confidence: 99%