Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.204

Transfer Learning and Distant Supervision for Multilingual Transformer Models: A Study on African Languages

Abstract: Multilingual transformer models like mBERT and XLM-RoBERTa have obtained great improvements for many NLP tasks on a variety of languages. However, recent works also showed that results from high-resource languages could not be easily transferred to realistic, low-resource scenarios. In this work, we study trends in performance for different amounts of available resources for the three African languages Hausa, isiXhosa and Yorùbá on both NER and topic classification. We show that in combination with transfer le…



Cited by 42 publications (68 citation statements)
References 24 publications
“…A comparison concerning the number of shots (K), based on the few-shot results in Table 2 and Figure 2, reveals that the buckets largely improve model performance on a majority of tasks (MLDoc, MARC, POS, NER) over zero-shot results. This is in line with prior work (Lauscher et al., 2020; Hedderich et al., 2020) and follows the success of work on using bootstrapped data (Chaudhary et al., 2019). In general, we observe that: 1) 1-shot buckets bring the largest relative performance improvement over ZS-XLT; 2) the gains follow the increase of K, but with diminishing returns; 3) the performance variance across the 40 buckets decreases as K increases. These observations are more pronounced for POS and NER; e.g., 1-shot EN to Urdu (UR) POS transfer shows gains of ≈22 F1 points (52.40 with zero-shot, 74.95 with 1-shot).…”
Section: Target-adapting Results (supporting)
confidence: 88%
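To make the K-shot target adaptation discussed in the excerpt above more concrete, the following is a minimal sketch of few-shot fine-tuning of a multilingual encoder on labelled target-language examples. It is an illustrative assumption rather than code from the cited papers: the model name, the toy NER label set, and the single example sentence are hypothetical, and in practice one would start from a checkpoint already fine-tuned on the source-language (e.g. English) task before taking these few gradient steps.

```python
# Minimal sketch (assumptions, see above): adapt a multilingual encoder to a
# target language with K labelled sentences (here K = 1).
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_name = "xlm-roberta-base"            # assumed multilingual encoder
labels = ["O", "B-PER", "I-PER"]           # toy label set for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(
    model_name, num_labels=len(labels)
)

# One hypothetical labelled target-language sentence (the "1-shot" bucket).
words = ["Aisha", "ta", "tafi", "Kano"]
word_labels = [1, 0, 0, 0]                 # indices into `labels`

enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
# Align word-level labels to subword pieces; -100 masks special tokens and
# word-continuation pieces so the loss ignores them.
label_ids, prev = [], None
for word_id in enc.word_ids():
    if word_id is None or word_id == prev:
        label_ids.append(-100)
    else:
        label_ids.append(word_labels[word_id])
    prev = word_id
enc["labels"] = torch.tensor([label_ids])

# A few gradient steps on the shot(s); with larger K one would batch and shuffle.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):
    loss = model(**enc).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```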
“…Moreover, data quality for low-resource languages, even for unlabeled data, might not be comparable to data from high-resource languages. Alabi et al. (2020) found that word embeddings trained on larger amounts of unlabeled data from low-resource languages are not competitive with embeddings trained on smaller but curated data sources.…”
Section: Pre-trained Language Representations (mentioning)
confidence: 94%
“…This distant supervision using information from external knowledge sources can be seen as a subset of the more general approach of labeling rules. These also encompass other ideas such as regex rules or simple programming functions (Ratner et al., 2017; Zheng et al., 2019; Adelani et al., 2020; Hedderich et al., 2020; Lison et al., 2020; Ren et al., 2020; Karamanolakis et al., 2021).…”
Section: Distant and Weak Supervision (mentioning)
confidence: 99%
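As a concrete illustration of the "labeling rules" mentioned in the excerpt above, the sketch below shows a regex rule and a simple gazetteer function that assign noisy, distantly supervised labels to unlabelled tokens. The rule names, the tiny gazetteer, and the abstain-then-fall-back-to-"O" convention are illustrative assumptions, not the API of any of the cited systems.

```python
# Minimal sketch of labeling rules for distant supervision (assumptions, see above).
import re

ABSTAIN = None  # a rule may decline to label a token

def rule_year_regex(token):
    """Regex rule: four-digit numbers that look like years -> DATE."""
    return "DATE" if re.fullmatch(r"(19|20)\d{2}", token) else ABSTAIN

GAZETTEER = {"Lagos", "Kano", "Ibadan"}     # toy external knowledge source

def rule_gazetteer(token):
    """Knowledge-source rule: tokens found in a location list -> LOC."""
    return "LOC" if token in GAZETTEER else ABSTAIN

def distant_label(tokens):
    """Apply each rule in turn; fall back to 'O' when every rule abstains."""
    return [rule_year_regex(t) or rule_gazetteer(t) or "O" for t in tokens]

print(distant_label(["Floods", "hit", "Lagos", "in", "2020"]))
# -> ['O', 'O', 'LOC', 'O', 'DATE']
```

The noisy labels produced this way would then serve as weak training data, typically combined with a noise-handling method as in the distant-supervision work discussed above.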