Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.204

Transfer Learning and Distant Supervision for Multilingual Transformer Models: A Study on African Languages

Abstract: Multilingual transformer models like mBERT and XLM-RoBERTa have obtained great improvements for many NLP tasks on a variety of languages. However, recent works also showed that results from high-resource languages could not be easily transferred to realistic, low-resource scenarios. In this work, we study trends in performance for different amounts of available resources for the three African languages Hausa, isiXhosa and Yorùbá on both NER and topic classification. We show that in combination with transfer le…



Cited by 42 publications (68 citation statements)
References 24 publications
“…A comparison concerning the number of shots (K), based on the few-shot results in Table 2 and Figure 2, reveals that the buckets largely improve model performance on a majority of tasks (MLDoc, MARC, POS, NER) over zero-shot results. This is in line with prior work (Lauscher et al., 2020; Hedderich et al., 2020) and follows the success of work on using bootstrapped data (Chaudhary et al., 2019). In general, we observe that: 1) 1-shot buckets bring the largest relative performance improvement over ZS-XLT; 2) the gains follow the increase of K, but with diminishing returns; 3) the performance variance across the 40 buckets decreases as K increases. These observations are more pronounced for POS and NER; e.g., 1-shot EN to Urdu (UR) POS transfer shows gains of ≈22 F1 points (52.40 with zero-shot, 74.95 with 1-shot).…”
Section: Target-adapting Results (supporting)
confidence: 88%
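To make the K-shot target adaptation discussed in the excerpt above more concrete, the following is a minimal sketch of few-shot fine-tuning of a multilingual encoder on labelled target-language examples. It is an illustrative assumption rather than code from the cited papers: the model name, the toy NER label set, and the single example sentence are hypothetical, and in practice one would start from a checkpoint already fine-tuned on the source-language (e.g. English) task before taking these few gradient steps.

```python
# Minimal sketch (assumptions, see above): adapt a multilingual encoder to a
# target language with K labelled sentences (here K = 1).
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

model_name = "xlm-roberta-base"            # assumed multilingual encoder
labels = ["O", "B-PER", "I-PER"]           # toy label set for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(
    model_name, num_labels=len(labels)
)

# One hypothetical labelled target-language sentence (the "1-shot" bucket).
words = ["Aisha", "ta", "tafi", "Kano"]
word_labels = [1, 0, 0, 0]                 # indices into `labels`

enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")
# Align word-level labels to subword pieces; -100 masks special tokens and
# word-continuation pieces so the loss ignores them.
label_ids, prev = [], None
for word_id in enc.word_ids():
    if word_id is None or word_id == prev:
        label_ids.append(-100)
    else:
        label_ids.append(word_labels[word_id])
    prev = word_id
enc["labels"] = torch.tensor([label_ids])

# A few gradient steps on the shot(s); with larger K one would batch and shuffle.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):
    loss = model(**enc).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```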
“…Moreover, data quality for low-resource languages, even for unlabeled data, might not be comparable to data from high-resource languages. Alabi et al. (2020) found that word embeddings trained on larger amounts of unlabeled data from low-resource languages are not competitive with embeddings trained on smaller but curated data sources.…”
Section: Pre-trained Language Representations (mentioning)
confidence: 94%
“…This distant supervision using information from external knowledge sources can be seen as a subset of the more general approach of labeling rules. These also encompass other ideas such as regex rules or simple programming functions (Ratner et al., 2017; Zheng et al., 2019; Adelani et al., 2020; Hedderich et al., 2020; Lison et al., 2020; Ren et al., 2020; Karamanolakis et al., 2021).…”
Section: Distant and Weak Supervision (mentioning)
confidence: 99%
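As a concrete illustration of the "labeling rules" mentioned in the excerpt above, the sketch below shows a regex rule and a simple gazetteer function that assign noisy, distantly supervised labels to unlabelled tokens. The rule names, the tiny gazetteer, and the abstain-then-fall-back-to-"O" convention are illustrative assumptions, not the API of any of the cited systems.

```python
# Minimal sketch of labeling rules for distant supervision (assumptions, see above).
import re

ABSTAIN = None  # a rule may decline to label a token

def rule_year_regex(token):
    """Regex rule: four-digit numbers that look like years -> DATE."""
    return "DATE" if re.fullmatch(r"(19|20)\d{2}", token) else ABSTAIN

GAZETTEER = {"Lagos", "Kano", "Ibadan"}     # toy external knowledge source

def rule_gazetteer(token):
    """Knowledge-source rule: tokens found in a location list -> LOC."""
    return "LOC" if token in GAZETTEER else ABSTAIN

def distant_label(tokens):
    """Apply each rule in turn; fall back to 'O' when every rule abstains."""
    return [rule_year_regex(t) or rule_gazetteer(t) or "O" for t in tokens]

print(distant_label(["Floods", "hit", "Lagos", "in", "2020"]))
# -> ['O', 'O', 'LOC', 'O', 'DATE']
```

The noisy labels produced this way would then serve as weak training data, typically combined with a noise-handling method as in the distant-supervision work discussed above.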