Proceedings of the 1st Workshop on Multilingual Representation Learning 2021
DOI: 10.18653/v1/2021.mrl-1.17

Shaking Syntactic Trees on the Sesame Street: Multilingual Probing with Controllable Perturbations

Abstract: Recent research has adopted a new experimental field centered around the concept of text perturbations, which has revealed that shuffled word order has little to no impact on the downstream performance of Transformer-based language models across many NLP tasks. These findings contradict the common understanding of how the models encode hierarchical and structural information and even question whether word order is modeled with position embeddings. To this end, this paper proposes nine probing datasets organized …
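To make the perturbation setup concrete, the sketch below shows the simplest variant the abstract alludes to: shuffling the words of a sentence removes surface word order while keeping the vocabulary fixed, so any change (or lack of change) in a model's behavior can be attributed to order information. This is a minimal illustrative Python sketch rather than the paper's released code; the function name, the whitespace tokenization, and the fixed seed are assumptions made for the example.

import random

def shuffle_word_order(sentence, seed=None):
    # Randomly reorder the whitespace-separated tokens of a sentence,
    # preserving its vocabulary but destroying its original word order.
    tokens = sentence.split()
    rng = random.Random(seed)
    rng.shuffle(tokens)
    return " ".join(tokens)

# Example: produce a perturbed counterpart of an input sentence for probing.
print(shuffle_word_order("the quick brown fox jumps over the lazy dog", seed=13))

Comparing a model's behavior on original and perturbed sentence pairs is the general pattern such perturbation-based probing follows.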

Cited by 5 publications (3 citation statements)
References 54 publications (32 reference statements)
“…There have been several different approaches to quantifying the linguistic information that is learned by multilingual models. One direction has performed layer-wise analyses to quantify what information is stored at different layers in the model (de Vries et al., 2020; Taktasheva et al., 2021; Papadimitriou et al., 2021). Others have examined the extent to which the different training languages are captured by the model, finding that some languages suffer in the multilingual setting despite overall good performance from the models (Conneau et al., 2020a; …).”
Section: Related Work: Linguistic Knowledge in Multilingual Models (citation type: mentioning)
confidence: 99%
“…This work follows the same experimental direction, where text perturbations serve to explore the sensitivity of language models to specific phenomena (Futrell et al., 2019; Ettinger, 2020; Taktasheva et al., 2021; Dankers et al., 2021). It has been shown, for example, that shuffling word order causes significant performance drops on a wide range of QA tasks (Si et al., 2019; Sugawara et al., 2019), but that state-of-the-art NLU models are not sensitive to word order (Pham et al., 2020; …).”
Section: Related Work (citation type: mentioning)
confidence: 99%
“…Existing work on cross-lingual probing has shown that the grammatical knowledge of Transformer LMs adapts to the downstream language; in the case of Russian, the results are not easily interpreted (Ravishankar et al., 2019). However, LMs are less sensitive to granular perturbations when processing texts in languages with free word order, such as Russian (Taktasheva et al., 2021). …”
Section: Introduction (citation type: mentioning)
confidence: 99%