2022
DOI: 10.48550/arxiv.2202.12312
Preprint

Oolong: Investigating What Makes Crosslingual Transfer Hard with Controlled Studies

Abstract: Little is known about what makes crosslingual transfer hard, since factors like tokenization, morphology, and syntax all change at once between languages. To disentangle the impact of these factors, we propose a set of controlled transfer studies: we systematically transform GLUE tasks to alter different factors one at a time, then measure the resulting drops in a pretrained model's downstream performance. In contrast to prior work suggesting little effect from syntax on knowledge transfer, we find significant…

Cited by 3 publications (6 citation statements)
References 22 publications (43 reference statements)
“…This gives us a general idea of how well we can expect models to do with this amount of data and without being able to start from a coherent embedding matrix. Wu et al (2022b) show that there is a ceiling that models hit in fine-tuning when the embedding matrix is reinitialized, and as such we cannot expect state-of-the-art perplexity from any of these models.…”
Section: Control Baseline
confidence: 95%
“…We initialize the embedding matrix by randomly sampling 500 rows, with replacement, from the old embedding matrix. We do this following Wu et al (2022b), who show that initializing an embedding matrix for transfer learning by sampling from the old embedding matrix leads to far better transfer learning than random reinitialization (see also Hewitt, 2021). This is likely because the vectors in the embedding matrix lie in subspaces of the embedding space that the first layer of the model is trained to expect.…”
Section: Fine-tuning
confidence: 99%
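As a concrete illustration of the sampling-based initialization described in the statement above, the following Python sketch builds a new embedding matrix by drawing rows with replacement from a pretrained model's embedding matrix. The model name, the 500-row target vocabulary, and the helper function are illustrative assumptions, not the cited papers' actual code.

# Minimal sketch (assumptions: PyTorch + Hugging Face Transformers; the model
# name and the 500-row target vocabulary are illustrative choices).
import torch
from transformers import AutoModel

def sample_rows_with_replacement(old_embeddings: torch.Tensor, new_vocab_size: int) -> torch.Tensor:
    """Build a new embedding matrix by sampling rows (with replacement) from an old one."""
    row_ids = torch.randint(old_embeddings.size(0), (new_vocab_size,))
    return old_embeddings[row_ids].clone()

model = AutoModel.from_pretrained("bert-base-uncased")
old_weights = model.get_input_embeddings().weight.detach()

# New 500-row embedding matrix whose vectors already lie in the subspaces the
# first transformer layer was trained to expect.
new_weights = sample_rows_with_replacement(old_weights, new_vocab_size=500)

model.resize_token_embeddings(500)
model.get_input_embeddings().weight.data.copy_(new_weights)

The alternative baseline, random reinitialization, would instead draw fresh vectors (e.g. from a normal distribution), which is what the cited work finds to transfer worse.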
“…Despite the common findings stated above, there are contradictions in the results of a number of studies in which different experimental settings are used. Wu et al (2022) and Deshpande et al (2022) investigated the impact of word order by isolating it from other factors. In both works, language variants were created by randomly permuting, reversing, or consistently adapting word order to a different language via a dependency tree.…”
Section: Linguistic Similarity
confidence: 99%
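For intuition, here is a minimal Python sketch of two of the word-order transformations mentioned in the statement above: random permutation and full reversal of a sentence's tokens. Whitespace tokenization is a simplifying assumption, and the dependency-tree-based reordering used in the cited papers is not reproduced here.

# Minimal sketch of synthetic word-order variants (whitespace tokenization is a
# simplifying assumption; the cited papers work with proper tokenizations and
# dependency trees).
import random

def permute_words(sentence: str, seed: int = 0) -> str:
    """Return the sentence with its tokens in a random order."""
    rng = random.Random(seed)
    tokens = sentence.split()
    rng.shuffle(tokens)
    return " ".join(tokens)

def reverse_words(sentence: str) -> str:
    """Return the sentence with its token order reversed."""
    return " ".join(reversed(sentence.split()))

print(permute_words("the model reads the whole sentence"))
print(reverse_words("the model reads the whole sentence"))

Applying such transformations to a downstream task while holding everything else fixed is what lets these studies isolate word order from tokenization and other confounds.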
“…This makes it hard to compare the aforementioned findings to results from Dufter and Schütze (2020) and K et al (2020), who solely evaluated on language variants with reversed or randomly permuted word order, respectively. Even though both of the latter works found evidence that word order impacts transfer performance, it is important to consider that Wu et al (2022) and Deshpande et al (2022) have comparable findings in similar settings but observed a less significant effect when switching to a more structured syntactic modification.…”
Section: Linguistic Similarity
confidence: 99%
“…Several analysis studies have put forth inconsistent conclusions about factors like subword overlap and typological similarity (Pires et al, 2019; Conneau et al, 2020b; Wu and Dredze, 2019; Hsu et al, 2019; Lin et al, 2019). Some recent studies (Deshpande et al, 2021; Wu et al, 2022; Dufter and Schütze, 2020; K et al, 2020) consider transfer in controlled settings, between natural languages and derived counterparts created by modifying specific linguistic aspects like script and word order. However, these methods only investigate the masked language-modeling (MLM) objective, whereas we additionally analyze newer pretraining methods like XLM and DICT-MLM.…”
Section: Related Work
confidence: 99%