Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021
DOI: 10.18653/v1/2021.acl-long.351

How to Adapt Your Pretrained Multilingual Model to 1600 Languages

Abstract: Pretrained multilingual models (PMMs) enable zero-shot learning via cross-lingual transfer, performing best for languages seen during pretraining. While methods exist to improve performance for unseen languages, they have almost exclusively been evaluated using amounts of raw text only available for a small fraction of the world's languages. In this paper, we evaluate the performance of existing methods to adapt PMMs to new languages using a resource available for over 1600 languages: the New Testament. This i…

Cited by 19 publications (33 citation statements) | References 35 publications

“…2 Among the 7,000 languages in the world, mBERT only covers about 1% of the languages, while Wikipedia and CommonCrawl, the two most common resources used for pretraining and adaptation, only contain textual data from 4% of the languages (often in quite small quantities, partially because language IDs are difficult to obtain for low-resource languages (Caswell et al., 2020)). Ebrahimi and Kann (2021) show that continued pretraining of multilingual models on a small amount of Bible data can significantly improve the performance of uncovered languages. Although the Bible has much better language coverage of 23%, its relatively small data size and constrained domain limit its utility (see § 6), and 70% of the world's languages do not even have this resource.…”
Section: Introduction | Citation type: mentioning
confidence: 94%
“…MLM: Continued pretraining on monolingual text $D^{T}_{\mathrm{mono}} = \{x^{T}_{i}\}_{i}$ in the target language (Howard and Ruder, 2018; Gururangan et al., 2020) using a masked language modeling (MLM) objective has proven effective for adapting models to the target language (Pfeiffer et al., 2020). Notably, Ebrahimi and Kann (2021) show that using as little as several thousand sentences can significantly improve the model's performance on target languages not covered during pretraining.…”
Section: Adaptation With Text | Citation type: mentioning
confidence: 99%
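
The statement above describes the standard recipe of continued masked-language-model pretraining on a small amount of target-language text. The sketch below illustrates that recipe with HuggingFace Transformers; it is a minimal illustration in which the file name (target_lang_sentences.txt) and all hyperparameters are assumed placeholders, not the exact setup of Ebrahimi and Kann (2021).

```python
# A minimal sketch of continued MLM pretraining on target-language text with
# HuggingFace Transformers/Datasets. File name and hyperparameters are
# placeholders, not the setup used by Ebrahimi and Kann (2021).
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "bert-base-multilingual-cased"  # mBERT
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# A few thousand target-language sentences, one per line (hypothetical file).
raw = load_dataset("text", data_files={"train": "target_lang_sentences.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

train_set = raw["train"].map(tokenize, batched=True, remove_columns=["text"])

# Randomly mask 15% of tokens, as in standard MLM pretraining.
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="mlm-adapted",
    num_train_epochs=40,             # small corpora typically need many epochs
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    save_strategy="no",
)

Trainer(
    model=model,
    args=args,
    train_dataset=train_set,
    data_collator=collator,
).train()
```

After this step, the adapted model is fine-tuned on task data in a high-resource language and evaluated zero-shot on the target language, as in the cited work.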
“…If, on the other hand, retraining from scratch is not feasible, one option is to add new subword units for the under-resourced/over-segmented languages. Wang et al. (2020b) and Chau et al. (2020) both propose such additions with randomly initialized embeddings, but these approaches did not perform well when studied by Ebrahimi and Kann (2021); extending the idea, later work proposes to use information about existing subword units to estimate embeddings instead of initializing newly added units randomly (similar to Salesky et al. (2018)). A different option is proposed by Wang et al. (2021b), who instead force the model to use (already existing) smaller subword units in high-resource languages like English to make the segmentations across languages more similar and thus aid transfer, avoiding the complete retraining that comes with changing the segmentation method or allocation.…”
Section: Shared Vocabularies In Multilingual Models | Citation type: mentioning
confidence: 99%
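
As a companion to the vocabulary-extension discussion above, here is a minimal sketch of adding new subword units to an existing multilingual model and initializing each new unit from the pieces it previously segmented into, rather than randomly. The token list, model name, and the mean-of-pieces heuristic are illustrative assumptions, not the exact method of any of the works cited in the statement.

```python
# A minimal sketch of extending a multilingual model's vocabulary with new
# subword units and initializing each new unit from the pieces it previously
# segmented into, instead of random initialization. Token list, model name,
# and the mean-of-pieces heuristic are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Hypothetical units that the original vocabulary over-segments.
new_tokens = ["examplesubwordA", "examplesubwordB"]

# Record how the *old* tokenizer segments each new unit before extending it.
old_segmentations = {t: tokenizer.tokenize(t) for t in new_tokens}

tokenizer.add_tokens(new_tokens)
model.resize_token_embeddings(len(tokenizer))

embeddings = model.get_input_embeddings().weight
with torch.no_grad():
    for tok in new_tokens:
        piece_ids = tokenizer.convert_tokens_to_ids(old_segmentations[tok])
        new_id = tokenizer.convert_tokens_to_ids(tok)
        # Initialize the new unit as the mean of its former subword pieces.
        embeddings[new_id] = embeddings[piece_ids].mean(dim=0)
```

In practice the extended model would then be further adapted with MLM on target-language text (as in the previous sketch) so that the new embeddings are tuned rather than left at their initialized values.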
“…Furthermore, due to data sparsity, monolingual pretrained models are not likely to obtain good results for many low-resource languages. In those cases, multilingual models can zero-shot learn for unseen languages with above-chance performance, which can be further improved via model adaptation with target-language text (Wang et al., 2020a), even in limited amounts (Ebrahimi and Kann, 2021). However, it is poorly understood how the number of pretraining languages influences performance in those cases.…”
Section: Introduction | Citation type: mentioning
confidence: 99%