Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)
DOI: 10.18653/v1/2020.emnlp-main.391

Unsupervised Cross-Lingual Part-of-Speech Tagging for Truly Low-Resource Scenarios

Abstract: We describe a fully unsupervised cross-lingual transfer approach for part-of-speech (POS) tagging under a truly low resource scenario. We assume access to parallel translations between the target language and one or more source languages for which POS taggers are available. We use the Bible as parallel data in our experiments: small size, out-of-domain and covering many diverse languages. Our approach innovates in three ways: 1) a robust approach of selecting training instances via cross-lingual annotation pro…

Cited by 19 publications (32 citation statements)
References 31 publications
“…While the above works deal with generally improving cross-lingual representations, task-specific cross-lingual systems often show strong performance in a zero-shot setting. For POS tagging, in a setting similar to ours, Eskander et al. (2020) achieve strong zero-shot results by using unsupervised projection (Yarowsky et al., 2001) with aligned Bibles. Recent work on cross-lingual NER includes Mayhew et al. (2017), who use dictionary translations to create target-language training data, as well as Xie et al. (2018), who use a bilingual dictionary in addition to self-attention.…”
Section: Introduction
confidence: 89%
“…This approach can, therefore, be seen as a form of distant supervision specific to obtaining labeled data for low-resource languages. Cross-lingual projections have been applied in low-resource settings for tasks such as POS tagging and parsing (Täckström et al., 2013; Wisniewski et al., 2014; Plank and Agić, 2018; Eskander et al., 2020). Sources of parallel text include the OPUS project (Tiedemann, 2012), Bible corpora (Mayer and Cysouw, 2014; Christodoulopoulos and Steedman, 2015), and the recent JW300 corpus (Agić and Vulić, 2019).…”
Section: Cross-lingual Annotation Projections
confidence: 99%
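The citation statements above describe cross-lingual annotation projection: POS tags predicted on the source side of a parallel corpus are carried over to target-language tokens through word alignments. The following is a minimal sketch of that core step, in the spirit of Yarowsky et al. (2001); the function and variable names are illustrative assumptions, not the cited authors' actual implementation, which additionally selects training instances robustly across multiple source languages.

```python
def project_tags(source_tags, alignments, target_len):
    """Project POS tags from a tagged source sentence onto an
    unlabeled target sentence via word alignments.

    source_tags: list of POS tags, one per source token
    alignments:  list of (src_idx, tgt_idx) word-alignment pairs
    target_len:  number of tokens in the target sentence
    Returns a list of projected tags; unaligned tokens get None.
    """
    projected = [None] * target_len
    for src_idx, tgt_idx in alignments:
        # Keep the first projection per target token; a robust system
        # would instead vote across alignments from several source
        # languages and filter low-confidence instances.
        if projected[tgt_idx] is None:
            projected[tgt_idx] = source_tags[src_idx]
    return projected

# Toy example: a three-token English source aligned one-to-one with a
# hypothetical target sentence of the same length.
tags = project_tags(
    source_tags=["DET", "NOUN", "VERB"],
    alignments=[(0, 0), (1, 1), (2, 2)],
    target_len=3,
)
```

The projected tags then serve as (noisy) supervision for training a target-language tagger, which is what makes the approach usable when no gold target-language annotations exist.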
“…While multi-lingual Transformer-based models, e.g. mBERT (Devlin et al., 2019) and XLM-R (Conneau et al., 2020), are widely applied in cross-lingual and multi-lingual NLP tasks (Keung et al., 2019; Eskander et al., 2020), no attempt has been made to extend the findings of the aforementioned mono-lingual research to this context. In this paper, we explore the roles of attention heads in cross-lingual and multi-lingual tasks for two reasons.…”
Section: Introduction
confidence: 99%