Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.363

From Zero to Hero: On the Limitations of Zero-Shot Language Transfer with Multilingual Transformers

Abstract: Massively multilingual transformers (MMTs) pretrained via language modeling (e.g., mBERT, XLM-R) have become a default paradigm for zero-shot language transfer in NLP, offering unmatched transfer performance. Current evaluations, however, verify their efficacy in transfers (a) to languages with sufficiently large pretraining corpora, and (b) between close languages. In this work, we analyze the limitations of downstream language transfer with MMTs, showing that, much like cross-lingual word embeddings, they are substantially less effective in resource-lean scenarios and for distant languages.
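
As a concrete illustration of the zero-shot transfer setup the paper evaluates, the sketch below fine-tunes a multilingual transformer on English task data only and then evaluates it directly on a distant target language. This is a minimal sketch, assuming the Hugging Face transformers and datasets libraries and the public XNLI dataset; the choice of XLM-R, Swahili as the target language, and all hyperparameters are illustrative assumptions, not the paper's exact experimental configuration.

```python
# Zero-shot cross-lingual transfer sketch: fine-tune on English, evaluate on Swahili.
# Assumes the Hugging Face `transformers` and `datasets` libraries and the XNLI dataset.
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
from datasets import load_dataset

model_name = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

def encode(batch):
    # Tokenize premise/hypothesis pairs; fixed-length padding keeps batching simple.
    return tokenizer(batch["premise"], batch["hypothesis"],
                     truncation=True, padding="max_length", max_length=128)

# Fine-tune on the English portion only ...
train_en = load_dataset("xnli", "en", split="train").map(encode, batched=True)
# ... then evaluate directly on a lower-resource target language (Swahili here),
# with no target-language labels seen during training.
test_sw = load_dataset("xnli", "sw", split="test").map(encode, batched=True)

args = TrainingArguments(output_dir="xlmr-xnli-en",
                         per_device_train_batch_size=32,
                         num_train_epochs=2)
trainer = Trainer(model=model, args=args, train_dataset=train_en)
trainer.train()
print(trainer.evaluate(eval_dataset=test_sw))  # zero-shot target-language loss
```

The gap between source-language and target-language performance measured this way is, roughly, the quantity the paper relates to pretraining corpus size and linguistic proximity.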

Cited by 198 publications (273 citation statements)
References 51 publications (67 reference statements)
“…This effectively means that M-BERT's subword vocabulary contains plenty of CMN-specific and YUE-specific subwords that are exploited by the encoder when producing M-BERT-based representations. Simultaneously, higher scores with M-BERT (and XLM in Table 13) are reported for resource-rich languages such as French, Spanish, and English, which are better represented in M-BERT's training data, while we observe large performance losses for lower-resource languages: these artifacts of massively multilingual training with M-BERT and XLM, and the lower performance in low-resource languages, were further validated recently (Lauscher et al. 2020; Wu and Dredze 2020). We also observe lower absolute scores (and a larger number of OOVs) for languages with very rich and productive morphological systems such as the two Slavic languages (Polish and Russian) and Finnish.…”
Section: Table 13 (supporting)
confidence: 78%
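The fragmentation and OOV effects described in the statement above can be probed directly from a tokenizer. The following minimal sketch, assuming the Hugging Face transformers library and a few illustrative sample sentences (not the evaluation data of the cited work), reports tokens-per-word ratios and [UNK] counts for mBERT across languages; heavier fragmentation and more [UNK] pieces indicate poorer subword-vocabulary coverage.

```python
# Probe mBERT's shared subword vocabulary coverage across languages.
# Sample sentences below are illustrative placeholders, not evaluation data.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

samples = {
    "French":  "Le modèle multilingue partage un vocabulaire de sous-mots.",
    "Finnish": "Monikielinen malli jakaa osasanaston kaikkien kielten kesken.",
    "Yue":     "呢個多語言模型共用一個子詞詞表。",
}

for lang, sent in samples.items():
    tokens = tokenizer.tokenize(sent)
    words = sent.split()
    # More subword pieces per whitespace word and more [UNK] tokens suggest
    # weaker vocabulary coverage for that language.
    unk = tokens.count(tokenizer.unk_token)
    print(f"{lang:8s} tokens/word={len(tokens) / max(len(words), 1):.2f}  [UNK]={unk}")
```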
“…Wu and Dredze (2020) consider the performance on up to 99 languages for NER. In contrast, Lauscher et al (2020) show limitations of the zero-shot setting and Zhao et al (2020) observe poor performance of mBERT in reference-free machine translation evaluation. Prior work here focuses on investigating the degree of multilinguality, not the reasons for it.…”
Section: Related Work (mentioning)
confidence: 92%
“…For pretraining approaches where labeled data exists in a high-resource language and the information is transferred to a low-resource language, Hu et al. (2020) find a significant gap between performance on English and the cross-lingually transferred models. In a recent study, Lauscher et al. (2020) find that transfer with multilingual transformer models is less effective for resource-lean settings and distant languages. A popular technique to obtain labeled data quickly and cheaply is distant and weak supervision.…”
Section: Introduction (mentioning)
confidence: 99%
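The last statement mentions distant and weak supervision as a cheap way to obtain labeled data. The toy sketch below shows the basic idea for NER via gazetteer lookup; the entity lists and tag scheme are invented for illustration and are not taken from the cited work.

```python
# Toy distant supervision for NER: label tokens by gazetteer membership.
# Gazetteer entries and tags are hypothetical; real pipelines use larger
# resources, matching heuristics, and multi-token span handling.
GAZETTEER = {"paris": "LOC", "kenya": "LOC", "google": "ORG"}

def weak_label(tokens):
    # Tag tokens found in the gazetteer; everything else gets the outside tag "O".
    return [GAZETTEER.get(tok.lower(), "O") for tok in tokens]

print(weak_label("Google opened an office in Kenya .".split()))
# ['ORG', 'O', 'O', 'O', 'O', 'LOC', 'O']
```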