Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 2022
DOI: 10.18653/v1/2022.acl-long.504
Pretraining with Artificial Language: Studying Transferable Knowledge in Language Models

Cited by 14 publications (10 citation statements). References: 0 publications.
“…47 We also note that Huang et al. have used atomic energies to train NN-based atomistic models. 69 In a wider perspective, the pre-training of NN models is a well-documented approach in the ML literature for various applications and domains, [70][71][72][73][74] and it has very recently been described in the context of interatomic potential models, 47,75,76 property prediction with synthetic pre-training data, 77 and as a means to learn general-purpose representations for atomistic structure. 76…”
Section: Digital Discovery Accepted Manuscript (mentioning)
Confidence: 99%
“…Recent success of transfer learning shows that pre-training (or continued pre-training) with similar source tasks can help better solve a downstream target task (e.g., question answering (Khashabi et al., 2020; Liu et al., 2021b), face verification (Cao et al., 2013), and general NLU tasks (Pruksachatkun et al., 2020)). Some previous work in cross-lingual transfer learning empirically observed that the model can transfer some knowledge beyond vocabulary (Artetxe et al., 2020; Ri & Tsuruoka, 2022), but they did not consider excluding the influence of other potential factors. Our results can serve as stronger evidence for the reason behind the success of transfer learning: in addition to transferring some surface patterns, better target performance can also benefit from similar abstract concepts learned from source tasks.…”
Section: A Discussion (mentioning)
Confidence: 99%
“…This makes understanding what the model can do much easier. There is a growing number of studies that have made use of artificial language corpora to understand the representations learned by complex models (Asr & Jones, 2017; Elman, 1990, 1991, 1993; Frank et al., 2009; Mao et al., 2022; Perruchet & Vinter, 1998; Ravfogel et al., 2019; Ri & Tsuruoka, 2022; Rohde & Plaut, 1999; Rubin et al., 2014; St. Clair et al., 2009; Tabullo et al., 2012; Wang & Eisner, 2016; White & Cotterell, 2021; Willits, 2013).…”
Section: A World For Words (mentioning)
Confidence: 99%