2022
DOI: 10.48550/arxiv.2202.12312
Preprint

Oolong: Investigating What Makes Crosslingual Transfer Hard with Controlled Studies

Abstract: Little is known about what makes crosslingual transfer hard, since factors like tokenization, morphology, and syntax all change at once between languages. To disentangle the impact of these factors, we propose a set of controlled transfer studies: we systematically transform GLUE tasks to alter different factors one at a time, then measure the resulting drops in a pretrained model's downstream performance. In contrast to prior work suggesting little effect from syntax on knowledge transfer, we find significant…

Cited by 3 publications (6 citation statements)
References 22 publications (43 reference statements)
“…This gives us a general idea of how well we can expect models to do with this amount of data and without being able to start from a coherent embedding matrix. Wu et al (2022b) show that there is a ceiling that models hit in fine-tuning when the embedding matrix is reinitialized, and as such we cannot expect state-of-the-art perplexity from any of these models.…”
Section: Control Baseline
confidence: 95%
“…We initialize the embedding matrix by randomly sampling 500 rows, with replacement, from the old embedding matrix. We do this following Wu et al (2022b), who show that initializing an embedding matrix for transfer learning by sampling from the old embedding matrix leads to far better transfer learning than random reinitialization (see also Hewitt, 2021). This is likely because the vectors in the embedding matrix lie in subspaces of the embedding space that the first layer of the model is trained to expect.…”
Section: Fine-tuning
confidence: 99%
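As a concrete illustration of the sampling-based initialization described in the statement above, the following Python sketch builds a new embedding matrix by drawing rows with replacement from a pretrained model's embedding matrix. The model name, the 500-row target vocabulary, and the helper function are illustrative assumptions, not the cited papers' actual code.

# Minimal sketch (assumptions: PyTorch + Hugging Face Transformers; the model
# name and the 500-row target vocabulary are illustrative choices).
import torch
from transformers import AutoModel

def sample_rows_with_replacement(old_embeddings: torch.Tensor, new_vocab_size: int) -> torch.Tensor:
    """Build a new embedding matrix by sampling rows (with replacement) from an old one."""
    row_ids = torch.randint(old_embeddings.size(0), (new_vocab_size,))
    return old_embeddings[row_ids].clone()

model = AutoModel.from_pretrained("bert-base-uncased")
old_weights = model.get_input_embeddings().weight.detach()

# New 500-row embedding matrix whose vectors already lie in the subspaces the
# first transformer layer was trained to expect.
new_weights = sample_rows_with_replacement(old_weights, new_vocab_size=500)

model.resize_token_embeddings(500)
model.get_input_embeddings().weight.data.copy_(new_weights)

The alternative baseline, random reinitialization, would instead draw fresh vectors (e.g. from a normal distribution), which is what the cited work finds to transfer worse.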
“…Despite the common findings stated above, there are contradictions in the results of a number of studies in which different experimental settings are used. Wu et al (2022) and Deshpande et al (2022) investigated the impact of word order by isolating it from other factors. In both works, language variants were created by randomly permuting, reversing, or consistently adapting word order to a different language via a dependency tree.…”
Section: Linguistic Similarity
confidence: 99%
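For intuition, here is a minimal Python sketch of two of the word-order transformations mentioned in the statement above: random permutation and full reversal of a sentence's tokens. Whitespace tokenization is a simplifying assumption, and the dependency-tree-based reordering used in the cited papers is not reproduced here.

# Minimal sketch of synthetic word-order variants (whitespace tokenization is a
# simplifying assumption; the cited papers work with proper tokenizations and
# dependency trees).
import random

def permute_words(sentence: str, seed: int = 0) -> str:
    """Return the sentence with its tokens in a random order."""
    rng = random.Random(seed)
    tokens = sentence.split()
    rng.shuffle(tokens)
    return " ".join(tokens)

def reverse_words(sentence: str) -> str:
    """Return the sentence with its token order reversed."""
    return " ".join(reversed(sentence.split()))

print(permute_words("the model reads the whole sentence"))
print(reverse_words("the model reads the whole sentence"))

Applying such transformations to a downstream task while holding everything else fixed is what lets these studies isolate word order from tokenization and other confounds.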
“…This makes it hard to compare the aforementioned findings to results from Dufter and Schütze (2020) and K et al (2020), who solely evaluated on language variants with reversed or randomly permuted word order, respectively. Even though both of the latter works found evidence that word order impacts transfer performance, it is important to consider that Wu et al (2022) and Deshpande et al (2022) have comparable findings in similar settings but observed a less significant effect when switching to a more structured syntactic modification.…”
Section: Linguistic Similarity
confidence: 99%
“…Several analysis studies have put forth inconsistent conclusions about factors like subword overlap and typological similarity (Pires et al, 2019; Conneau et al, 2020b; Wu and Dredze, 2019; Hsu et al, 2019; Lin et al, 2019). Some recent studies (Deshpande et al, 2021; Wu et al, 2022; Dufter and Schütze, 2020; K et al, 2020) consider transfer in controlled settings, between natural languages and derived counterparts created by modifying specific linguistic aspects like script and word order. However, these methods only investigate the masked language-modeling (MLM) objective, whereas we additionally analyze newer pretraining methods like XLM and DICT-MLM.…”
Section: Related Work
confidence: 99%