“…A widely explored transfer scenario is zero-shot crosslingual transfer (Pires et al., 2019; Conneau and Lample, 2019; Artetxe and Schwenk, 2019), where a pretrained encoder is finetuned on abundant task data in the source language (e.g., English) and then directly evaluated on target-language test data, achieving surprisingly good performance (Wu and Dredze, 2019; Hu et al., 2020). However, there is evidence that the zero-shot performance reported in the literature has high variance and is often not reproducible (Keung et al., 2020a; Rios et al., 2020), and that results for languages distant from English fall far short of those for languages similar to English (Hu et al., 2020; Liang et al., 2020). Lauscher et al. (2020) instead stress the importance of few-shot crosslingual transfer, where the encoder is first finetuned on a source language and then further finetuned on a small number (10-100) of examples (few shots) in the target language.…”
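The two-stage few-shot transfer recipe described above can be sketched minimally. The following is an illustrative toy, not the authors' implementation: a NumPy logistic-regression head stands in for a pretrained multilingual encoder plus classifier, and the random features, label rules, epoch counts, and learning rates are all hypothetical placeholders for real task data and hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def finetune(w, X, y, epochs, lr):
    """Plain gradient descent on binary cross-entropy (toy stand-in
    for finetuning an encoder on a classification task)."""
    for _ in range(epochs):
        p = sigmoid(X @ w)
        w = w - lr * X.T @ (p - y) / len(y)
    return w

d = 16
w = np.zeros(d)

# Stage 1: finetune on abundant source-language (e.g., English) task data.
Xs = rng.normal(size=(1000, d))
ys = (Xs[:, 0] > 0).astype(float)          # hypothetical source label rule
w = finetune(w, Xs, ys, epochs=200, lr=0.5)

# Stage 2: further finetune on a few shots (here 32 examples) of the
# target language; a smaller learning rate helps keep what was learned
# from the source language.
Xt = rng.normal(size=(32, d))
yt = (Xt[:, 0] + 0.3 * Xt[:, 1] > 0).astype(float)  # slightly shifted rule
w = finetune(w, Xt, yt, epochs=50, lr=0.05)

acc = float(((sigmoid(Xt @ w) > 0.5) == yt).mean())
print(f"accuracy on the few target shots: {acc:.2f}")
```

The key point the sketch mirrors is that stage 2 starts from the stage-1 weights rather than from scratch, so the handful of target-language examples only needs to adapt an already task-trained model.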